Designing Next Generation Massively Multithreaded Architectures for Irregular Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tumeo, Antonino; Secchi, Simone; Villa, Oreste
Irregular applications, such as data mining or graph-based computations, show unpredictable memory/network access patterns and control structures. Massively multi-threaded architectures with large node count, like the Cray XMT, have been shown to address their requirements better than commodity clusters. In this paper we present the approaches that we are currently pursuing to design future generations of these architectures. First, we introduce the Cray XMT and compare it to other multithreaded architectures. We then propose an evolution of the architecture, integrating multiple cores per node and next generation network interconnect. We advocate the use of hardware support for remote memory referencemore » aggregation to optimize network utilization. For this evaluation we developed a highly parallel, custom simulation infrastructure for multi-threaded systems. Our simulator executes unmodified XMT binaries with very large datasets, capturing effects due to contention and hot-spotting, while predicting execution times with greater than 90% accuracy. We also discuss the FPGA prototyping approach that we are employing to study efficient support for irregular applications in next generation manycore processors.« less
Multi-threading: A new dimension to massively parallel scientific computation
NASA Astrophysics Data System (ADS)
Nielsen, Ida M. B.; Janssen, Curtis L.
2000-06-01
Multi-threading is becoming widely available for Unix-like operating systems, and the application of multi-threading opens new ways for performing parallel computations with greater efficiency. We here briefly discuss the principles of multi-threading and illustrate the application of multi-threading for a massively parallel direct four-index transformation of electron repulsion integrals. Finally, other potential applications of multi-threading in scientific computing are outlined.
Verifying speculative multithreading in an application
Felton, Mitchell D
2014-12-09
Verifying speculative multithreading in an application executing in a computing system, including: executing one or more test instructions serially thereby producing a serial result, including insuring that all data dependencies among the test instructions are satisfied; executing the test instructions speculatively in a plurality of threads thereby producing a speculative result; and determining whether a speculative multithreading error exists including: comparing the serial result to the speculative result and, if the serial result does not match the speculative result, determining that a speculative multithreading error exists.
Verifying speculative multithreading in an application
Felton, Mitchell D
2014-11-18
Verifying speculative multithreading in an application executing in a computing system, including: executing one or more test instructions serially thereby producing a serial result, including insuring that all data dependencies among the test instructions are satisfied; executing the test instructions speculatively in a plurality of threads thereby producing a speculative result; and determining whether a speculative multithreading error exists including: comparing the serial result to the speculative result and, if the serial result does not match the speculative result, determining that a speculative multithreading error exists.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Earl, Christopher; Might, Matthew; Bagusetty, Abhishek
This study presents Nebo, a declarative domain-specific language embedded in C++ for discretizing partial differential equations for transport phenomena on multiple architectures. Application programmers use Nebo to write code that appears sequential but can be run in parallel, without editing the code. Currently Nebo supports single-thread execution, multi-thread execution, and many-core (GPU-based) execution. With single-thread execution, Nebo performs on par with code written by domain experts. With multi-thread execution, Nebo can linearly scale (with roughly 90% efficiency) up to 12 cores, compared to its single-thread execution. Moreover, Nebo’s many-core execution can be over 140x faster than its single-thread execution.
Earl, Christopher; Might, Matthew; Bagusetty, Abhishek; ...
2016-01-26
This study presents Nebo, a declarative domain-specific language embedded in C++ for discretizing partial differential equations for transport phenomena on multiple architectures. Application programmers use Nebo to write code that appears sequential but can be run in parallel, without editing the code. Currently Nebo supports single-thread execution, multi-thread execution, and many-core (GPU-based) execution. With single-thread execution, Nebo performs on par with code written by domain experts. With multi-thread execution, Nebo can linearly scale (with roughly 90% efficiency) up to 12 cores, compared to its single-thread execution. Moreover, Nebo’s many-core execution can be over 140x faster than its single-thread execution.
Implementation of a multi-threaded framework for large-scale scientific applications
Sexton-Kennedy, E.; Gartung, Patrick; Jones, C. D.; ...
2015-05-22
The CMS experiment has recently completed the development of a multi-threaded capable application framework. In this paper, we will discuss the design, implementation and application of this framework to production applications in CMS. For the 2015 LHC run, this functionality is particularly critical for both our online and offline production applications, which depend on faster turn-around times and a reduced memory footprint relative to before. These applications are complex codes, each including a large number of physics-driven algorithms. While the framework is capable of running a mix of thread-safe and 'legacy' modules, algorithms running in our production applications need tomore » be thread-safe for optimal use of this multi-threaded framework at a large scale. Towards this end, we discuss the types of changes, which were necessary for our algorithms to achieve good performance of our multithreaded applications in a full-scale application. Lastly performance numbers for what has been achieved for the 2015 run are presented.« less
Multi-threaded Event Processing with DANA
DOE Office of Scientific and Technical Information (OSTI.GOV)
David Lawrence; Elliott Wolin
2007-05-14
The C++ data analysis framework DANA has been written to support the next generation of Nuclear Physics experiments at Jefferson Lab commensurate with the anticipated 12GeV upgrade. The DANA framework was designed to allow multi-threaded event processing with a minimal impact on developers of reconstruction software. This document describes how DANA implements multi-threaded event processing and compares it to simply running multiple instances of a program. Also presented are relative reconstruction rates for Pentium4, Xeon, and Opteron based machines.
NASA Technical Reports Server (NTRS)
Dominguez, Jesus A.; Victor, Elias; Vasquez, Angel L.; Urbina, Alfredo R.
2017-01-01
A multi-threaded software application has been developed in-house by the Ground Special Power (GSP) team at NASA Kennedy Space Center (KSC) to separately simulate and fully emulate all units that supply VDC power and battery-based power backup to multiple KSC launch ground support systems for NASA Space Launch Systems (SLS) rocket.
Multithreaded hybrid feature tracking for markerless augmented reality.
Lee, Taehee; Höllerer, Tobias
2009-01-01
We describe a novel markerless camera tracking approach and user interaction methodology for augmented reality (AR) on unprepared tabletop environments. We propose a real-time system architecture that combines two types of feature tracking. Distinctive image features of the scene are detected and tracked frame-to-frame by computing optical flow. In order to achieve real-time performance, multiple operations are processed in a synchronized multi-threaded manner: capturing a video frame, tracking features using optical flow, detecting distinctive invariant features, and rendering an output frame. We also introduce user interaction methodology for establishing a global coordinate system and for placing virtual objects in the AR environment by tracking a user's outstretched hand and estimating a camera pose relative to it. We evaluate the speed and accuracy of our hybrid feature tracking approach, and demonstrate a proof-of-concept application for enabling AR in unprepared tabletop environments, using bare hands for interaction.
Shadow-Bitcoin: Scalable Simulation via Direct Execution of Multi-Threaded Applications
2015-08-10
Shadow- Bitcoin : Scalable Simulation via Direct Execution of Multi-threaded Applications Andrew Miller University of Maryland amiller@cs.umd.edu Rob...Shadow plug-in that directly executes the Bitcoin reference client software. To demonstrate the usefulness of this tool, we present novel denial-of...service attacks against the Bit- coin software that exploit low-level implementation ar- tifacts in the Bitcoin reference client; our determinis- tic
Multi-threaded integration of HTC-Vive and MeVisLab
NASA Astrophysics Data System (ADS)
Gunacker, Simon; Gall, Markus; Schmalstieg, Dieter; Egger, Jan
2018-03-01
This work presents how Virtual Reality (VR) can easily be integrated into medical applications via a plugin for a medical image processing framework called MeVisLab. A multi-threaded plugin has been developed using OpenVR, a VR library that can be used for developing vendor and platform independent VR applications. The plugin is tested using the HTC Vive, a head-mounted display developed by HTC and Valve Corporation.
Multi-threaded ATLAS simulation on Intel Knights Landing processors
NASA Astrophysics Data System (ADS)
Farrell, Steven; Calafiura, Paolo; Leggett, Charles; Tsulaia, Vakhtang; Dotti, Andrea; ATLAS Collaboration
2017-10-01
The Knights Landing (KNL) release of the Intel Many Integrated Core (MIC) Xeon Phi line of processors is a potential game changer for HEP computing. With 72 cores and deep vector registers, the KNL cards promise significant performance benefits for highly-parallel, compute-heavy applications. Cori, the newest supercomputer at the National Energy Research Scientific Computing Center (NERSC), was delivered to its users in two phases with the first phase online at the end of 2015 and the second phase now online at the end of 2016. Cori Phase 2 is based on the KNL architecture and contains over 9000 compute nodes with 96GB DDR4 memory. ATLAS simulation with the multithreaded Athena Framework (AthenaMT) is a good potential use-case for the KNL architecture and supercomputers like Cori. ATLAS simulation jobs have a high ratio of CPU computation to disk I/O and have been shown to scale well in multi-threading and across many nodes. In this paper we will give an overview of the ATLAS simulation application with details on its multi-threaded design. Then, we will present a performance analysis of the application on KNL devices and compare it to a traditional x86 platform to demonstrate the capabilities of the architecture and evaluate the benefits of utilizing KNL platforms like Cori for ATLAS production.
Geant4 Computing Performance Benchmarking and Monitoring
Dotti, Andrea; Elvira, V. Daniel; Folger, Gunter; ...
2015-12-23
Performance evaluation and analysis of large scale computing applications is essential for optimal use of resources. As detector simulation is one of the most compute intensive tasks and Geant4 is the simulation toolkit most widely used in contemporary high energy physics (HEP) experiments, it is important to monitor Geant4 through its development cycle for changes in computing performance and to identify problems and opportunities for code improvements. All Geant4 development and public releases are being profiled with a set of applications that utilize different input event samples, physics parameters, and detector configurations. Results from multiple benchmarking runs are compared tomore » previous public and development reference releases to monitor CPU and memory usage. Observed changes are evaluated and correlated with code modifications. Besides the full summary of call stack and memory footprint, a detailed call graph analysis is available to Geant4 developers for further analysis. The set of software tools used in the performance evaluation procedure, both in sequential and multi-threaded modes, include FAST, IgProf and Open|Speedshop. In conclusion, the scalability of the CPU time and memory performance in multi-threaded application is evaluated by measuring event throughput and memory gain as a function of the number of threads for selected event samples.« less
Using Multithreading for the Automatic Load Balancing of 2D Adaptive Finite Element Meshes
NASA Technical Reports Server (NTRS)
Heber, Gerd; Biswas, Rupak; Thulasiraman, Parimala; Gao, Guang R.; Bailey, David H. (Technical Monitor)
1998-01-01
In this paper, we present a multi-threaded approach for the automatic load balancing of adaptive finite element (FE) meshes. The platform of our choice is the EARTH multi-threaded system which offers sufficient capabilities to tackle this problem. We implement the question phase of FE applications on triangular meshes, and exploit the EARTH token mechanism to automatically balance the resulting irregular and highly nonuniform workload. We discuss the results of our experiments on EARTH-SP2, an implementation of EARTH on the IBM SP2, with different load balancing strategies that are built into the runtime system.
Using Multi-threading for the Automatic Load Balancing of 2D Adaptive Finite Element Meshes
NASA Technical Reports Server (NTRS)
Heber, Gerd; Biswas, Rupak; Thulasiraman, Parimala; Gao, Guang R.; Saini, Subhash (Technical Monitor)
1998-01-01
In this paper, we present a multi-threaded approach for the automatic load balancing of adaptive finite element (FE) meshes The platform of our choice is the EARTH multi-threaded system which offers sufficient capabilities to tackle this problem. We implement the adaption phase of FE applications oil triangular meshes and exploit the EARTH token mechanism to automatically balance the resulting irregular and highly nonuniform workload. We discuss the results of our experiments oil EARTH-SP2, on implementation of EARTH on the IBM SP2 with different load balancing strategies that are built into the runtime system.
Optics Program Modified for Multithreaded Parallel Computing
NASA Technical Reports Server (NTRS)
Lou, John; Bedding, Dave; Basinger, Scott
2006-01-01
A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application interface software, that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading is also used in the diffraction propagation of light in MACOS based on pthreads [POSIX Thread, (where "POSIX" signifies a portable operating system for UNIX)]. In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.
Hukerikar, Saurabh; Teranishi, Keita; Diniz, Pedro C.; ...
2017-02-11
In the presence of accelerated fault rates, which are projected to be the norm on future exascale systems, it will become increasingly difficult for high-performance computing (HPC) applications to accomplish useful computation. Due to the fault-oblivious nature of current HPC programming paradigms and execution environments, HPC applications are insufficiently equipped to deal with errors. We believe that HPC applications should be enabled with capabilities to actively search for and correct errors in their computations. The redundant multithreading (RMT) approach offers lightweight replicated execution streams of program instructions within the context of a single application process. Furthermore, the use of completemore » redundancy incurs significant overhead to the application performance.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hukerikar, Saurabh; Teranishi, Keita; Diniz, Pedro C.
In the presence of accelerated fault rates, which are projected to be the norm on future exascale systems, it will become increasingly difficult for high-performance computing (HPC) applications to accomplish useful computation. Due to the fault-oblivious nature of current HPC programming paradigms and execution environments, HPC applications are insufficiently equipped to deal with errors. We believe that HPC applications should be enabled with capabilities to actively search for and correct errors in their computations. The redundant multithreading (RMT) approach offers lightweight replicated execution streams of program instructions within the context of a single application process. Furthermore, the use of completemore » redundancy incurs significant overhead to the application performance.« less
SMT-Aware Instantaneous Footprint Optimization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Roy, Probir; Liu, Xu; Song, Shuaiwen
Modern architectures employ simultaneous multithreading (SMT) to increase thread-level parallelism. SMT threads share many functional units and the whole memory hierarchy of a physical core. Without a careful code design, SMT threads can easily contend with each other for these shared resources, causing severe performance degradation. Minimizing SMT thread contention for HPC applications running on dedicated platforms is very challenging, because they usually spawn threads within Single Program Multiple Data (SPMD) models. To address this important issue, we introduce a simple scheme for SMT-aware code optimization, which aims to reduce the memory contention across SMT threads.
NASA Astrophysics Data System (ADS)
Liao, S.; Chen, L.; Li, J.; Xiong, W.; Wu, Q.
2015-07-01
Existing spatiotemporal database supports spatiotemporal aggregation query over massive moving objects datasets. Due to the large amounts of data and single-thread processing method, the query speed cannot meet the application requirements. On the other hand, the query efficiency is more sensitive to spatial variation then temporal variation. In this paper, we proposed a spatiotemporal aggregation query method using multi-thread parallel technique based on regional divison and implemented it on the server. Concretely, we divided the spatiotemporal domain into several spatiotemporal cubes, computed spatiotemporal aggregation on all cubes using the technique of multi-thread parallel processing, and then integrated the query results. By testing and analyzing on the real datasets, this method has improved the query speed significantly.
Global Futures: a multithreaded execution model for Global Arrays-based applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chavarría-Miranda, Daniel; Krishnamoorthy, Sriram; Vishnu, Abhinav
2012-05-31
We present Global Futures (GF), an execution model extension to Global Arrays, which is based on a PGAS-compatible Active Message-based paradigm. We describe the design and implementation of Global Futures and illustrate its use in a computational chemistry application benchmark (Hartree-Fock matrix construction using the Self-Consistent Field method). Our results show how we used GF to increase the scalability of the Hartree-Fock matrix build to up to 6,144 cores of an Infiniband cluster. We also show how GF's multithreaded execution has comparable performance to the traditional process-based SPMD model.
Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak
1999-01-01
The success of parallel computing in solving real-life computationally-intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2OOO, a shared-memory implementation using cache coherent nonuniform memory access (CC-NUMA) of the Origin2OOO, and a multi-threaded version on the newly-released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.
CMS event processing multi-core efficiency status
NASA Astrophysics Data System (ADS)
Jones, C. D.; CMS Collaboration
2017-10-01
In 2015, CMS was the first LHC experiment to begin using a multi-threaded framework for doing event processing. This new framework utilizes Intel’s Thread Building Block library to manage concurrency via a task based processing model. During the 2015 LHC run period, CMS only ran reconstruction jobs using multiple threads because only those jobs were sufficiently thread efficient. Recent work now allows simulation and digitization to be thread efficient. In addition, during 2015 the multi-threaded framework could run events in parallel but could only use one thread per event. Work done in 2016 now allows multiple threads to be used while processing one event. In this presentation we will show how these recent changes have improved CMS’s overall threading and memory efficiency and we will discuss work to be done to further increase those efficiencies.
Applying Jlint to Space Exploration Software
NASA Technical Reports Server (NTRS)
Artho, Cyrille; Havelund, Klaus
2004-01-01
Java is a very successful programming language which is also becoming widespread in embedded systems, where software correctness is critical. Jlint is a simple but highly efficient static analyzer that checks a Java program for several common errors, such as null pointer exceptions, and overflow errors. It also includes checks for multi-threading problems, such as deadlocks and data races. The case study described here shows the effectiveness of Jlint in find-false positives in the multi-threading warnings gives an insight into design patterns commonly used in multi-threaded code. The results show that a few analysis techniques are sufficient to avoid almost all false positives. These techniques include investigating all possible callers and a few code idioms. Verifying the correct application of these patterns is still crucial, because their correct usage is not trivial.
Multi-threaded Sparse Matrix Sparse Matrix Multiplication for Many-Core and GPU Architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deveci, Mehmet; Trott, Christian Robert; Rajamanickam, Sivasankaran
Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix- matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depend on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and datamore » structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.« less
Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deveci, Mehmet; Rajamanickam, Sivasankaran; Trott, Christian Robert
Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scienti c computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depend on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and datamore » structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.« less
Fast and Accurate Simulation of the Cray XMT Multithreaded Supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Villa, Oreste; Tumeo, Antonino; Secchi, Simone
Irregular applications, such as data mining and analysis or graph-based computations, show unpredictable memory/network access patterns and control structures. Highly multithreaded architectures with large processor counts, like the Cray MTA-1, MTA-2 and XMT, appear to address their requirements better than commodity clusters. However, the research on highly multithreaded systems is currently limited by the lack of adequate architectural simulation infrastructures due to issues such as size of the machines, memory footprint, simulation speed, accuracy and customization. At the same time, Shared-memory MultiProcessors (SMPs) with multi-core processors have become an attractive platform to simulate large scale machines. In this paper, wemore » introduce a cycle-level simulator of the highly multithreaded Cray XMT supercomputer. The simulator runs unmodified XMT applications. We discuss how we tackled the challenges posed by its development, detailing the techniques introduced to make the simulation as fast as possible while maintaining a high accuracy. By mapping XMT processors (ThreadStorm with 128 hardware threads) to host computing cores, the simulation speed remains constant as the number of simulated processors increases, up to the number of available host cores. The simulator supports zero-overhead switching among different accuracy levels at run-time and includes a network model that takes into account contention. On a modern 48-core SMP host, our infrastructure simulates a large set of irregular applications 500 to 2000 times slower than real time when compared to a 128-processor XMT, while remaining within 10\\% of accuracy. Emulation is only from 25 to 200 times slower than real time.« less
Hybrid Optimization Parallel Search PACKage
DOE Office of Scientific and Technical Information (OSTI.GOV)
2009-11-10
HOPSPACK is open source software for solving optimization problems without derivatives. Application problems may have a fully nonlinear objective function, bound constraints, and linear and nonlinear constraints. Problem variables may be continuous, integer-valued, or a mixture of both. The software provides a framework that supports any derivative-free type of solver algorithm. Through the framework, solvers request parallel function evaluation, which may use MPI (multiple machines) or multithreading (multiple processors/cores on one machine). The framework provides a Cache and Pending Cache of saved evaluations that reduces execution time and facilitates restarts. Solvers can dynamically create other algorithms to solve subproblems, amore » useful technique for handling multiple start points and integer-valued variables. HOPSPACK ships with the Generating Set Search (GSS) algorithm, developed at Sandia as part of the APPSPACK open source software project.« less
Scaling Irregular Applications through Data Aggregation and Software Multithreading
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morari, Alessandro; Tumeo, Antonino; Chavarría-Miranda, Daniel
Bioinformatics, data analytics, semantic databases, knowledge discovery are emerging high performance application areas that exploit dynamic, linked data structures such as graphs, unbalanced trees or unstructured grids. These data structures usually are very large, requiring significantly more memory than available on single shared memory systems. Additionally, these data structures are difficult to partition on distributed memory systems. They also present poor spatial and temporal locality, thus generating unpredictable memory and network accesses. The Partitioned Global Address Space (PGAS) programming model seems suitable for these applications, because it allows using a shared memory abstraction across distributed-memory clusters. However, current PGAS languagesmore » and libraries are built to target regular remote data accesses and block transfers. Furthermore, they usually rely on the Single Program Multiple Data (SPMD) parallel control model, which is not well suited to the fine grained, dynamic and unbalanced parallelism of irregular applications. In this paper we present {\\bf GMT} (Global Memory and Threading library), a custom runtime library that enables efficient execution of irregular applications on commodity clusters. GMT integrates a PGAS data substrate with simple fork/join parallelism and provides automatic load balancing on a per node basis. It implements multi-level aggregation and lightweight multithreading to maximize memory and network bandwidth with fine-grained data accesses and tolerate long data access latencies. A key innovation in the GMT runtime is its thread specialization (workers, helpers and communication threads) that realize the overall functionality. We compare our approach with other PGAS models, such as UPC running using GASNet, and hand-optimized MPI code on a set of typical large-scale irregular applications, demonstrating speedups of an order of magnitude.« less
Exploring Manycore Multinode Systems for Irregular Applications with FPGA Prototyping
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ceriani, Marco; Palermo, Gianluca; Secchi, Simone
We present a prototype of a multi-core architecture implemented on FPGA, designed to enable efficient execution of irregular applications on distributed shared memory machines, while maintaining high performance on regular workloads. The architecture is composed of off-the-shelf soft-core cores, local interconnection and memory interface, integrated with custom components that optimize it for irregular applications. It relies on three key elements: a global address space, multithreading, and fine-grained synchronization. Global addresses are scrambled to reduce the formation of network hot-spots, while the latency of the transactions is covered by integrating an hardware scheduler within the custom load/store buffers to take advantagemore » from the availability of multiple executions threads, increasing the efficiency in a transparent way to the application. We evaluated a dual node system irregular kernels showing scalability in the number of cores and threads.« less
Large Scale Document Inversion using a Multi-threaded Computing System
Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won
2018-01-01
Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a lot of information has been flooded into the digital domain around the world. Huge volume of data, such as digital libraries, social networking services, e-commerce product data, and reviews, etc., is produced or collected every moment with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full text searches or document retrieval, a large number of documents will require a tremendous amount of time to create the index. The performance of document inversion can be improved by multi-thread or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD), document inversion algorithm on the NVIDIA GPU/CUDA programming platform utilizing the huge computational power of the GPU, to develop high performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstract and e-commerce product reviews. CCS Concepts •Information systems➝Information retrieval • Computing methodologies➝Massively parallel and high-performance simulations. PMID:29861701
Large Scale Document Inversion using a Multi-threaded Computing System.
Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won
2017-06-01
Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a lot of information has been flooded into the digital domain around the world. Huge volume of data, such as digital libraries, social networking services, e-commerce product data, and reviews, etc., is produced or collected every moment with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full text searches or document retrieval, a large number of documents will require a tremendous amount of time to create the index. The performance of document inversion can be improved by multi-thread or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD), document inversion algorithm on the NVIDIA GPU/CUDA programming platform utilizing the huge computational power of the GPU, to develop high performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstract and e-commerce product reviews. •Information systems➝Information retrieval • Computing methodologies➝Massively parallel and high-performance simulations.
Multithreaded Model for Dynamic Load Balancing Parallel Adaptive PDE Computations
NASA Technical Reports Server (NTRS)
Chrisochoides, Nikos
1995-01-01
We present a multithreaded model for the dynamic load-balancing of numerical, adaptive computations required for the solution of Partial Differential Equations (PDE's) on multiprocessors. Multithreading is used as a means of exploring concurrency in the processor level in order to tolerate synchronization costs inherent to traditional (non-threaded) parallel adaptive PDE solvers. Our preliminary analysis for parallel, adaptive PDE solvers indicates that multithreading can be used an a mechanism to mask overheads required for the dynamic balancing of processor workloads with computations required for the actual numerical solution of the PDE's. Also, multithreading can simplify the implementation of dynamic load-balancing algorithms, a task that is very difficult for traditional data parallel adaptive PDE computations. Unfortunately, multithreading does not always simplify program complexity, often makes code re-usability not an easy task, and increases software complexity.
Jagged Tiling for Intra-tile Parallelism and Fine-Grain Multithreading
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shrestha, Sunil; Manzano Franco, Joseph B.; Marquez, Andres
In this paper, we have developed a novel methodology that takes into consideration multithreaded many-core designs to better utilize memory/processing resources and improve memory residence on tileable applications. It takes advantage of polyhedral analysis and transformation in the form of PLUTO, combined with a highly optimized finegrain tile runtime to exploit parallelism at all levels. The main contributions of this paper include the introduction of multi-hierarchical tiling techniques that increases intra tile parallelism; and a data-flow inspired runtime library that allows the expression of parallel tiles with an efficient synchronization registry. Our current implementation shows performance improvements on an Intelmore » Xeon Phi board up to 32.25% against instances produced by state-of-the-art compiler frameworks for selected stencil applications.« less
Moessfit. A free Mössbauer fitting program
NASA Astrophysics Data System (ADS)
Kamusella, Sirko; Klauss, Hans-Henning
2016-12-01
A free data analysis program for Mössbauer spectroscopy was developed to solve commonly faced problems such as simultaneous fitting of multiple data sets, Maximum Entropy Method and a proper error estimation. The program is written in C++ using the Qt application framework and the Gnu Scientific Library. Moessfit makes use of multithreading to reasonably apply the multi core CPU capacities of modern PC. The whole fit is specified in a text input file issued to simplify work flow for the user and provide a simple start in the Mössbauer data analysis for beginners. However, the possibility to define arbitrary parameter dependencies and distributions as well as relaxation spectra makes Moessfit interesting for advanced user as well.
González-Domínguez, Jorge; Remeseiro, Beatriz; Martín, María J
2017-02-01
The analysis of the interference patterns on the tear film lipid layer is a useful clinical test to diagnose dry eye syndrome. This task can be automated with a high degree of accuracy by means of the use of tear film maps. However, the time required by the existing applications to generate them prevents a wider acceptance of this method by medical experts. Multithreading has been previously successfully employed by the authors to accelerate the tear film map definition on multicore single-node machines. In this work, we propose a hybrid message-passing and multithreading parallel approach that further accelerates the generation of tear film maps by exploiting the computational capabilities of distributed-memory systems such as multicore clusters and supercomputers. The algorithm for drawing tear film maps is parallelized using Message Passing Interface (MPI) for inter-node communications and the multithreading support available in the C++11 standard for intra-node parallelization. The original algorithm is modified to reduce the communications and increase the scalability. The hybrid method has been tested on 32 nodes of an Intel cluster (with two 12-core Haswell 2680v3 processors per node) using 50 representative images. Results show that maximum runtime is reduced from almost two minutes using the previous only-multithreaded approach to less than ten seconds using the hybrid method. The hybrid MPI/multithreaded implementation can be used by medical experts to obtain tear film maps in only a few seconds, which will significantly accelerate and facilitate the diagnosis of the dry eye syndrome. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Exploiting multicore compute resources in the CMS experiment
NASA Astrophysics Data System (ADS)
Ramírez, J. E.; Pérez-Calero Yzquierdo, A.; Hernández, J. M.; CMS Collaboration
2016-10-01
CMS has developed a strategy to efficiently exploit the multicore architecture of the compute resources accessible to the experiment. A coherent use of the multiple cores available in a compute node yields substantial gains in terms of resource utilization. The implemented approach makes use of the multithreading support of the event processing framework and the multicore scheduling capabilities of the resource provisioning system. Multicore slots are acquired and provisioned by means of multicore pilot agents which internally schedule and execute single and multicore payloads. Multicore scheduling and multithreaded processing are currently used in production for online event selection and prompt data reconstruction. More workflows are being adapted to run in multicore mode. This paper presents a review of the experience gained in the deployment and operation of the multicore scheduling and processing system, the current status and future plans.
Multithreaded implicitly dealiased convolutions
NASA Astrophysics Data System (ADS)
Roberts, Malcolm; Bowman, John C.
2018-03-01
Implicit dealiasing is a method for computing in-place linear convolutions via fast Fourier transforms that decouples work memory from input data. It offers easier memory management and, for long one-dimensional input sequences, greater efficiency than conventional zero-padding. Furthermore, for convolutions of multidimensional data, the segregation of data and work buffers can be exploited to reduce memory usage and execution time significantly. This is accomplished by processing and discarding data as it is generated, allowing work memory to be reused, for greater data locality and performance. A multithreaded implementation of implicit dealiasing that accepts an arbitrary number of input and output vectors and a general multiplication operator is presented, along with an improved one-dimensional Hermitian convolution that avoids the loop dependency inherent in previous work. An alternate data format that can accommodate a Nyquist mode and enhance cache efficiency is also proposed.
ALFA: The new ALICE-FAIR software framework
NASA Astrophysics Data System (ADS)
Al-Turany, M.; Buncic, P.; Hristov, P.; Kollegger, T.; Kouzinopoulos, C.; Lebedev, A.; Lindenstruth, V.; Manafov, A.; Richter, M.; Rybalchenko, A.; Vande Vyvre, P.; Winckler, N.
2015-12-01
The commonalities between the ALICE and FAIR experiments and their computing requirements led to the development of large parts of a common software framework in an experiment independent way. The FairRoot project has already shown the feasibility of such an approach for the FAIR experiments and extending it beyond FAIR to experiments at other facilities[1, 2]. The ALFA framework is a joint development between ALICE Online- Offline (O2) and FairRoot teams. ALFA is designed as a flexible, elastic system, which balances reliability and ease of development with performance using multi-processing and multithreading. A message- based approach has been adopted; such an approach will support the use of the software on different hardware platforms, including heterogeneous systems. Each process in ALFA assumes limited communication and reliance on other processes. Such a design will add horizontal scaling (multiple processes) to vertical scaling provided by multiple threads to meet computing and throughput demands. ALFA does not dictate any application protocols. Potentially, any content-based processor or any source can change the application protocol. The framework supports different serialization standards for data exchange between different hardware and software languages.
Multilevel Parallelization of AutoDock 4.2.
Norgan, Andrew P; Coffman, Paul K; Kocher, Jean-Pierre A; Katzmann, David J; Sosa, Carlos P
2011-04-28
Virtual (computational) screening is an increasingly important tool for drug discovery. AutoDock is a popular open-source application for performing molecular docking, the prediction of ligand-receptor interactions. AutoDock is a serial application, though several previous efforts have parallelized various aspects of the program. In this paper, we report on a multi-level parallelization of AutoDock 4.2 (mpAD4). Using MPI and OpenMP, AutoDock 4.2 was parallelized for use on MPI-enabled systems and to multithread the execution of individual docking jobs. In addition, code was implemented to reduce input/output (I/O) traffic by reusing grid maps at each node from docking to docking. Performance of mpAD4 was examined on two multiprocessor computers. Using MPI with OpenMP multithreading, mpAD4 scales with near linearity on the multiprocessor systems tested. In situations where I/O is limiting, reuse of grid maps reduces both system I/O and overall screening time. Multithreading of AutoDock's Lamarkian Genetic Algorithm with OpenMP increases the speed of execution of individual docking jobs, and when combined with MPI parallelization can significantly reduce the execution time of virtual screens. This work is significant in that mpAD4 speeds the execution of certain molecular docking workloads and allows the user to optimize the degree of system-level (MPI) and node-level (OpenMP) parallelization to best fit both workloads and computational resources.
BCYCLIC: A parallel block tridiagonal matrix cyclic solver
NASA Astrophysics Data System (ADS)
Hirshman, S. P.; Perumalla, K. S.; Lynch, V. E.; Sanchez, R.
2010-09-01
A block tridiagonal matrix is factored with minimal fill-in using a cyclic reduction algorithm that is easily parallelized. Storage of the factored blocks allows the application of the inverse to multiple right-hand sides which may not be known at factorization time. Scalability with the number of block rows is achieved with cyclic reduction, while scalability with the block size is achieved using multithreaded routines (OpenMP, GotoBLAS) for block matrix manipulation. This dual scalability is a noteworthy feature of this new solver, as well as its ability to efficiently handle arbitrary (non-powers-of-2) block row and processor numbers. Comparison with a state-of-the art parallel sparse solver is presented. It is expected that this new solver will allow many physical applications to optimally use the parallel resources on current supercomputers. Example usage of the solver in magneto-hydrodynamic (MHD), three-dimensional equilibrium solvers for high-temperature fusion plasmas is cited.
Das, Narendra; Stampoulis, Dimitrios; Ines, Amor; Fisher, Joshua B.; Granger, Stephanie; Kawata, Jessie; Han, Eunjin; Behrangi, Ali
2017-01-01
The Regional Hydrologic Extremes Assessment System (RHEAS) is a prototype software framework for hydrologic modeling and data assimilation that automates the deployment of water resources nowcasting and forecasting applications. A spatially-enabled database is a key component of the software that can ingest a suite of satellite and model datasets while facilitating the interfacing with Geographic Information System (GIS) applications. The datasets ingested are obtained from numerous space-borne sensors and represent multiple components of the water cycle. The object-oriented design of the software allows for modularity and extensibility, showcased here with the coupling of the core hydrologic model with a crop growth model. RHEAS can exploit multi-threading to scale with increasing number of processors, while the database allows delivery of data products and associated uncertainty through a variety of GIS platforms. A set of three example implementations of RHEAS in the United States and Kenya are described to demonstrate the different features of the system in real-world applications. PMID:28545077
Andreadis, Konstantinos M; Das, Narendra; Stampoulis, Dimitrios; Ines, Amor; Fisher, Joshua B; Granger, Stephanie; Kawata, Jessie; Han, Eunjin; Behrangi, Ali
2017-01-01
The Regional Hydrologic Extremes Assessment System (RHEAS) is a prototype software framework for hydrologic modeling and data assimilation that automates the deployment of water resources nowcasting and forecasting applications. A spatially-enabled database is a key component of the software that can ingest a suite of satellite and model datasets while facilitating the interfacing with Geographic Information System (GIS) applications. The datasets ingested are obtained from numerous space-borne sensors and represent multiple components of the water cycle. The object-oriented design of the software allows for modularity and extensibility, showcased here with the coupling of the core hydrologic model with a crop growth model. RHEAS can exploit multi-threading to scale with increasing number of processors, while the database allows delivery of data products and associated uncertainty through a variety of GIS platforms. A set of three example implementations of RHEAS in the United States and Kenya are described to demonstrate the different features of the system in real-world applications.
NASA Astrophysics Data System (ADS)
Saruwatari, Shunsuke; Suzuki, Makoto; Morikawa, Hiroyuki
The paper shows a compact hard real-time operating system for wireless sensor nodes called PAVENET OS. PAVENET OS provides hybrid multithreading: preemptive multithreading and cooperative multithreading. Both of the multithreading are optimized for two kinds of tasks on wireless sensor networks, and those are real-time tasks and best-effort ones. PAVENET OS can efficiently perform hard real-time tasks that cannot be performed by TinyOS. The paper demonstrates the hybrid multithreading realizes compactness and low overheads, which are comparable to those of TinyOS, through quantitative evaluation. The evaluation results show PAVENET OS performs 100 Hz sensor sampling with 0.01% jitter while performing wireless communication tasks, whereas optimized TinyOS has 0.62% jitter. In addition, PAVENET OS has a small footprint and low overheads (minimum RAM size: 29 bytes, minimum ROM size: 490 bytes, minimum task switch time: 23 cycles).
Thread scheduling for GPU-based OPC simulation on multi-thread
NASA Astrophysics Data System (ADS)
Lee, Heejun; Kim, Sangwook; Hong, Jisuk; Lee, Sooryong; Han, Hwansoo
2018-03-01
As semiconductor product development based on shrinkage continues, the accuracy and difficulty required for the model based optical proximity correction (MBOPC) is increasing. OPC simulation time, which is the most timeconsuming part of MBOPC, is rapidly increasing due to high pattern density in a layout and complex OPC model. To reduce OPC simulation time, we attempt to apply graphic processing unit (GPU) to MBOPC because OPC process is good to be programmed in parallel. We address some issues that may typically happen during GPU-based OPC simulation in multi thread system, such as "out of memory" and "GPU idle time". To overcome these problems, we propose a thread scheduling method, which manages OPC jobs in multiple threads in such a way that simulations jobs from multiple threads are alternatively executed on GPU while correction jobs are executed at the same time in each CPU cores. It was observed that the amount of GPU peak memory usage decreases by up to 35%, and MBOPC runtime also decreases by 4%. In cases where out of memory issues occur in a multi-threaded environment, the thread scheduler was used to improve MBOPC runtime up to 23%.
Factors controlling large-wood transport in a mountain river
NASA Astrophysics Data System (ADS)
Ruiz-Villanueva, Virginia; Wyżga, Bartłomiej; Zawiejska, Joanna; Hajdukiewicz, Maciej; Stoffel, Markus
2016-11-01
As with bedload transport, wood transport in rivers is governed by several factors such as flow regime, geomorphic configuration of the channel and floodplain, or wood size and shape. Because large-wood tends to be transported during floods, safety and logistical constraints make field measurements difficult. As a result, direct observation and measurements of the conditions of wood transport are scarce. This lack of direct observations and the complexity of the processes involved in wood transport may result in an incomplete understanding of wood transport processes. Numerical modelling provides an alternative approach to addressing some of the unknowns in the dynamics of large-wood in rivers. The aim of this study is to improve the understanding of controls governing wood transport in mountain rivers, combining numerical modelling and direct field observations. By defining different scenarios, we illustrate relationships between the rate of wood transport and discharge, wood size, and river morphology. We test these relationships for a wide, multithread reach and a narrower, partially channelized single-thread reach of the Czarny Dunajec River in the Polish Carpathians. Results indicate that a wide range of quantitative information about wood transport can be obtained from a combination of numerical modelling and field observations and from document contrasting patterns of wood transport in single- and multithread river reaches. On the one hand, log diameter seems to have a greater importance for wood transport in the multithread channel because of shallower flow, lower flow velocity, and lower stream power. Hydrodynamic conditions in the single-thread channel allow transport of large-wood pieces, whereas in the multithread reach, logs with diameters similar to water depth are not being moved. On the other hand, log length also exerts strong control on wood transport, more so in the single-thread than in the multithread reach. In any case, wood transport strongly decreases with increasing piece volume, although this relation is not linear. We also document a nonlinear relationship between wood transport and flood magnitude. A threshold discharge was identified below which wood transport is negligible. This threshold is higher in the multithread reach, while in the single-thread reach floods of lower magnitude are able to transport wood downstream. Wood transport ratio increases with discharge until it reaches an upper threshold or tipping point, and then decreases or increases much more slowly. This threshold is clearly related to bankfull discharge, but it is much higher for the multithread reach than for the single-thread one. Although modelling input and field observations were taken from a specific river, our findings and conclusions are likely to be applicable to a much larger suite of (mountain) rivers.
Real-time SHVC software decoding with multi-threaded parallel processing
NASA Astrophysics Data System (ADS)
Gudumasu, Srinivas; He, Yuwen; Ye, Yan; He, Yong; Ryu, Eun-Seok; Dong, Jie; Xiu, Xiaoyu
2014-09-01
This paper proposes a parallel decoding framework for scalable HEVC (SHVC). Various optimization technologies are implemented on the basis of SHVC reference software SHM-2.0 to achieve real-time decoding speed for the two layer spatial scalability configuration. SHVC decoder complexity is analyzed with profiling information. The decoding process at each layer and the up-sampling process are designed in parallel and scheduled by a high level application task manager. Within each layer, multi-threaded decoding is applied to accelerate the layer decoding speed. Entropy decoding, reconstruction, and in-loop processing are pipeline designed with multiple threads based on groups of coding tree units (CTU). A group of CTUs is treated as a processing unit in each pipeline stage to achieve a better trade-off between parallelism and synchronization. Motion compensation, inverse quantization, and inverse transform modules are further optimized with SSE4 SIMD instructions. Simulations on a desktop with an Intel i7 processor 2600 running at 3.4 GHz show that the parallel SHVC software decoder is able to decode 1080p spatial 2x at up to 60 fps (frames per second) and 1080p spatial 1.5x at up to 50 fps for those bitstreams generated with SHVC common test conditions in the JCT-VC standardization group. The decoding performance at various bitrates with different optimization technologies and different numbers of threads are compared in terms of decoding speed and resource usage, including processor and memory.
NavP: Structured and Multithreaded Distributed Parallel Programming
NASA Technical Reports Server (NTRS)
Pan, Lei; Xu, Jingling
2006-01-01
This slide presentation reviews some of the issues around distributed parallel programming. It compares and contrast two methods of programming: Single Program Multiple Data (SPMD) with the Navigational Programming (NAVP). It then reviews the distributed sequential computing (DSC) method and the methodology of NavP. Case studies are presented. It also reviews the work that is being done to enable the NavP system.
JaSTA-2: Second version of the Java Superposition T-matrix Application
NASA Astrophysics Data System (ADS)
Halder, Prithish; Das, Himadri Sekhar
2017-12-01
In this article, we announce the development of a new version of the Java Superposition T-matrix App (JaSTA-2), to study the light scattering properties of porous aggregate particles. It has been developed using Netbeans 7.1.2, which is a java integrated development environment (IDE). The JaSTA uses double precision superposition T-matrix codes for multi-sphere clusters in random orientation, developed by Mackowski and Mischenko (1996). The new version consists of two options as part of the input parameters: (i) single wavelength and (ii) multiple wavelengths. The first option (which retains the applicability of older version of JaSTA) calculates the light scattering properties of aggregates of spheres for a single wavelength at a given instant of time whereas the second option can execute the code for a multiple numbers of wavelengths in a single run. JaSTA-2 provides convenient and quicker data analysis which can be used in diverse fields like Planetary Science, Atmospheric Physics, Nanoscience, etc. This version of the software is developed for Linux platform only, and it can be operated over all the cores of a processor using the multi-threading option.
High Accuracy Monocular SFM and Scale Correction for Autonomous Driving.
Song, Shiyu; Chandraker, Manmohan; Guest, Clark C
2016-04-01
We present a real-time monocular visual odometry system that achieves high accuracy in real-world autonomous driving applications. First, we demonstrate robust monocular SFM that exploits multithreading to handle driving scenes with large motions and rapidly changing imagery. To correct for scale drift, we use known height of the camera from the ground plane. Our second contribution is a novel data-driven mechanism for cue combination that allows highly accurate ground plane estimation by adapting observation covariances of multiple cues, such as sparse feature matching and dense inter-frame stereo, based on their relative confidences inferred from visual data on a per-frame basis. Finally, we demonstrate extensive benchmark performance and comparisons on the challenging KITTI dataset, achieving accuracy comparable to stereo and exceeding prior monocular systems. Our SFM system is optimized to output pose within 50 ms in the worst case, while average case operation is over 30 fps. Our framework also significantly boosts the accuracy of applications like object localization that rely on the ground plane.
Accelerating Demand Paging for Local and Remote Out-of-Core Visualization
NASA Technical Reports Server (NTRS)
Ellsworth, David
2001-01-01
This paper describes a new algorithm that improves the performance of application-controlled demand paging for the out-of-core visualization of data sets that are on either local disks or disks on remote servers. The performance improvements come from better overlapping the computation with the page reading process, and by performing multiple page reads in parallel. The new algorithm can be applied to many different visualization algorithms since application-controlled demand paging is not specific to any visualization algorithm. The paper includes measurements that show that the new multi-threaded paging algorithm decreases the time needed to compute visualizations by one third when using one processor and reading data from local disk. The time needed when using one processor and reading data from remote disk decreased by up to 60%. Visualization runs using data from remote disk ran about as fast as ones using data from local disk because the remote runs were able to make use of the remote server's high performance disk array.
DAQ: Software Architecture for Data Acquisition in Sounding Rockets
NASA Technical Reports Server (NTRS)
Ahmad, Mohammad; Tran, Thanh; Nichols, Heidi; Bowles-Martinez, Jessica N.
2011-01-01
A multithreaded software application was developed by Jet Propulsion Lab (JPL) to collect a set of correlated imagery, Inertial Measurement Unit (IMU) and GPS data for a Wallops Flight Facility (WFF) sounding rocket flight. The data set will be used to advance Terrain Relative Navigation (TRN) technology algorithms being researched at JPL. This paper describes the software architecture and the tests used to meet the timing and data rate requirements for the software used to collect the dataset. Also discussed are the challenges of using commercial off the shelf (COTS) flight hardware and open source software. This includes multiple Camera Link (C-link) based cameras, a Pentium-M based computer, and Linux Fedora 11 operating system. Additionally, the paper talks about the history of the software architecture's usage in other JPL projects and its applicability for future missions, such as cubesats, UAVs, and research planes/balloons. Also talked about will be the human aspect of project especially JPL's Phaeton program and the results of the launch.
Hiding the Disk and Network Latency of Out-of-Core Visualization
NASA Technical Reports Server (NTRS)
Ellsworth, David
2001-01-01
This paper describes an algorithm that improves the performance of application-controlled demand paging for out-of-core visualization by hiding the latency of reading data from both local disks or disks on remote servers. The performance improvements come from better overlapping the computation with the page reading process, and by performing multiple page reads in parallel. The paper includes measurements that show that the new multithreaded paging algorithm decreases the time needed to compute visualizations by one third when using one processor and reading data from local disk. The time needed when using one processor and reading data from remote disk decreased by two thirds. Visualization runs using data from remote disk actually ran faster than ones using data from local disk because the remote runs were able to make use of the remote server's high performance disk array.
Application of Advanced Multi-Core Processor Technologies to Oceanographic Research
2013-09-30
STM32 NXP LPC series No Proprietary Microchip PIC32/DSPIC No > 500 mW; < 5 W ARM Cortex TI OMAP TI Sitara Broadcom BCM2835 Varies FPGA...1 DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. Application of Advanced Multi-Core Processor Technologies...state-of-the-art information processing architectures. OBJECTIVES Next-generation processor architectures (multi-core, multi-threaded) hold the
A Tool for Intersecting Context-Free Grammars and Its Applications
NASA Technical Reports Server (NTRS)
Gange, Graeme; Navas, Jorge A.; Schachte, Peter; Sondergaard, Harald; Stuckey, Peter J.
2015-01-01
This paper describes a tool for intersecting context-free grammars. Since this problem is undecidable the tool follows a refinement-based approach and implements a novel refinement which is complete for regularly separable grammars. We show its effectiveness for safety verification of recursive multi-threaded programs.
A parallel approach of COFFEE objective function to multiple sequence alignment
NASA Astrophysics Data System (ADS)
Zafalon, G. F. D.; Visotaky, J. M. V.; Amorim, A. R.; Valêncio, C. R.; Neves, L. A.; de Souza, R. C. G.; Machado, J. M.
2015-09-01
The computational tools to assist genomic analyzes show even more necessary due to fast increasing of data amount available. With high computational costs of deterministic algorithms for sequence alignments, many works concentrate their efforts in the development of heuristic approaches to multiple sequence alignments. However, the selection of an approach, which offers solutions with good biological significance and feasible execution time, is a great challenge. Thus, this work aims to show the parallelization of the processing steps of MSA-GA tool using multithread paradigm in the execution of COFFEE objective function. The standard objective function implemented in the tool is the Weighted Sum of Pairs (WSP), which produces some distortions in the final alignments when sequences sets with low similarity are aligned. Then, in studies previously performed we implemented the COFFEE objective function in the tool to smooth these distortions. Although the nature of COFFEE objective function implies in the increasing of execution time, this approach presents points, which can be executed in parallel. With the improvements implemented in this work, we can verify the execution time of new approach is 24% faster than the sequential approach with COFFEE. Moreover, the COFFEE multithreaded approach is more efficient than WSP, because besides it is slightly fast, its biological results are better.
Electron-beam lithography data preparation based on multithreading MGS/PROXECCO
NASA Astrophysics Data System (ADS)
Eichhorn, Hans; Lemke, Melchior; Gramss, Juergen; Buerger, B.; Baetz, Uwe; Belic, Nikola; Eisenmann, Hans
2001-04-01
This paper will highlight an enhanced MGS layout data post processor and the results of its industrial application. Besides the preparation of hierarchical GDS layout data, the processing of flat data has been drastically accelerated. The application of the Proximity Correction in conjunction with the OEM version of the PROXECCO was crowned with success for data preparation of mask sets featuring 0.25 micrometers /0.18 micrometers integration levels.
Multithreading in vector processors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Evangelinos, Constantinos; Kim, Changhoan; Nair, Ravi
In one embodiment, a system includes a processor having a vector processing mode and a multithreading mode. The processor is configured to operate on one thread per cycle in the multithreading mode. The processor includes a program counter register having a plurality of program counters, and the program counter register is vectorized. Each program counter in the program counter register represents a distinct corresponding thread of a plurality of threads. The processor is configured to execute the plurality of threads by activating the plurality of program counters in a round robin cycle.
On Parallel Push-Relabel based Algorithms for Bipartite Maximum Matching
DOE Office of Scientific and Technical Information (OSTI.GOV)
Langguth, Johannes; Azad, Md Ariful; Halappanavar, Mahantesh
2014-07-01
We study multithreaded push-relabel based algorithms for computing maximum cardinality matching in bipartite graphs. Matching is a fundamental combinatorial (graph) problem with applications in a wide variety of problems in science and engineering. We are motivated by its use in the context of sparse linear solvers for computing maximum transversal of a matrix. We implement and test our algorithms on several multi-socket multicore systems and compare their performance to state-of-the-art augmenting path-based serial and parallel algorithms using a testset comprised of a wide range of real-world instances. Building on several heuristics for enhancing performance, we demonstrate good scaling for themore » parallel push-relabel algorithm. We show that it is comparable to the best augmenting path-based algorithms for bipartite matching. To the best of our knowledge, this is the first extensive study of multithreaded push-relabel based algorithms. In addition to a direct impact on the applications using matching, the proposed algorithmic techniques can be extended to preflow-push based algorithms for computing maximum flow in graphs.« less
NASA Astrophysics Data System (ADS)
Wegener, Pam; Covino, Tim; Wohl, Ellen
2017-06-01
River networks that drain mountain landscapes alternate between narrow and wide valley segments. Within the wide segments, beaver activity can facilitate the development and maintenance of complex, multithread planform. Because the narrow segments have limited ability to retain water, carbon, and nutrients, the wide, multithread segments are likely important locations of retention. We evaluated hydrologic dynamics, nutrient flux, and aquatic ecosystem metabolism along two adjacent segments of a river network in the Rocky Mountains, Colorado: (1) a wide, multithread segment with beaver activity; and, (2) an adjacent (directly upstream) narrow, single-thread segment without beaver activity. We used a mass balance approach to determine the water, carbon, and nutrient source-sink behavior of each river segment across a range of flows. While the single-thread segment was consistently a source of water, carbon, and nitrogen, the beaver impacted multithread segment exhibited variable source-sink dynamics as a function of flow. Specifically, the multithread segment was a sink for water, carbon, and nutrients during high flows, and subsequently became a source as flows decreased. Shifts in river-floodplain hydrologic connectivity across flows related to higher and more variable aquatic ecosystem metabolism rates along the multithread relative to the single-thread segment. Our data suggest that beaver activity in wide valleys can create a physically complex hydrologic environment that can enhance hydrologic and biogeochemical buffering, and promote high rates of aquatic ecosystem metabolism. Given the widespread removal of beaver, determining the cumulative effects of these changes is a critical next step in restoring function in altered river networks.
NASA Technical Reports Server (NTRS)
Artho, Cyrille; Havelund, Klaus; Biere, Armin; Koga, Dennis (Technical Monitor)
2003-01-01
Data races are a common problem in concurrent and multi-threaded programming. They are hard to detect without proper tool support. Despite the successful application of these tools, experience shows that the notion of data race is not powerful enough to capture certain types of inconsistencies occurring in practice. In this paper we investigate data races on a higher abstraction layer. This enables us to detect inconsistent uses of shared variables, even if no classical race condition occurs. For example, a data structure representing a coordinate pair may have to be treated atomically. By lifting the meaning of a data race to a higher level, such problems can now be covered. The paper defines the concepts view and view consistency to give a notation for this novel kind of property. It describes what kinds of errors can be detected with this new definition, and where its limitations are. It also gives a formal guideline for using data structures in a multi-threading environment.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Madduri, Kamesh; Ediger, David; Jiang, Karl
2009-02-15
We present a new lock-free parallel algorithm for computing betweenness centralityof massive small-world networks. With minor changes to the data structures, ouralgorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 millionmore » vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Madduri, Kamesh; Ediger, David; Jiang, Karl
2009-05-29
We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in the HPCS SSCA#2 Graph Analysis benchmark, which has been extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the ThreadStorm processor, and a single-socket Sun multicore server with the UltraSparc T2 processor.more » For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.« less
Transformation Systems at NASA Ames
NASA Technical Reports Server (NTRS)
Buntine, Wray; Fischer, Bernd; Havelund, Klaus; Lowry, Michael; Pressburger, TOm; Roach, Steve; Robinson, Peter; VanBaalen, Jeffrey
1999-01-01
In this paper, we describe the experiences of the Automated Software Engineering Group at the NASA Ames Research Center in the development and application of three different transformation systems. The systems span the entire technology range, from deductive synthesis, to logic-based transformation, to almost compiler-like source-to-source transformation. These systems also span a range of NASA applications, including solving solar system geometry problems, generating data analysis software, and analyzing multi-threaded Java code.
Multi-Threaded DNA Tag/Anti-Tag Library Generator for Multi-Core Platforms
2009-05-01
base pair) Watson ‐ Crick strand pairs that bind perfectly within pairs, but poorly across pairs. A variety of DNA strand hybridization metrics...AFRL-RI-RS-TR-2009-131 Final Technical Report May 2009 MULTI-THREADED DNA TAG/ANTI-TAG LIBRARY GENERATOR FOR MULTI-CORE PLATFORMS...TYPE Final 3. DATES COVERED (From - To) Jun 08 – Feb 09 4. TITLE AND SUBTITLE MULTI-THREADED DNA TAG/ANTI-TAG LIBRARY GENERATOR FOR MULTI-CORE
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hedstrom, Gerald; Beck, Bret; Mattoon, Caleb
2016-10-01
Merced performs a multi-dimensional integral tl generate so-called 'transfer matrices' for use in deterministic radiation transport applications. It produces transfer matrices on the user-defind energy grid. The angular dependence of outgoing products is captured in a Legendre expansion, up to a user-specified maximun Legendre order. Merced calculations can use multi-threading for enhanced performance on a single compute node.
NASA Astrophysics Data System (ADS)
Baregheh, Mandana; Mezentsev, Vladimir; Schmitz, Holger
2011-06-01
We describe a parallel multi-threaded approach for high performance modelling of wide class of phenomena in ultrafast nonlinear optics. Specific implementation has been performed using the highly parallel capabilities of a programmable graphics processor.
A Hybrid Task Graph Scheduler for High Performance Image Processing Workflows.
Blattner, Timothy; Keyrouz, Walid; Bhattacharyya, Shuvra S; Halem, Milton; Brady, Mary
2017-12-01
Designing applications for scalability is key to improving their performance in hybrid and cluster computing. Scheduling code to utilize parallelism is difficult, particularly when dealing with data dependencies, memory management, data motion, and processor occupancy. The Hybrid Task Graph Scheduler (HTGS) improves programmer productivity when implementing hybrid workflows for multi-core and multi-GPU systems. The Hybrid Task Graph Scheduler (HTGS) is an abstract execution model, framework, and API that increases programmer productivity when implementing hybrid workflows for such systems. HTGS manages dependencies between tasks, represents CPU and GPU memories independently, overlaps computations with disk I/O and memory transfers, keeps multiple GPUs occupied, and uses all available compute resources. Through these abstractions, data motion and memory are explicit; this makes data locality decisions more accessible. To demonstrate the HTGS application program interface (API), we present implementations of two example algorithms: (1) a matrix multiplication that shows how easily task graphs can be used; and (2) a hybrid implementation of microscopy image stitching that reduces code size by ≈ 43% compared to a manually coded hybrid workflow implementation and showcases the minimal overhead of task graphs in HTGS. Both of the HTGS-based implementations show good performance. In image stitching the HTGS implementation achieves similar performance to the hybrid workflow implementation. Matrix multiplication with HTGS achieves 1.3× and 1.8× speedup over the multi-threaded OpenBLAS library for 16k × 16k and 32k × 32k size matrices, respectively.
Near Theoretical Gigabit Link Efficiency for Distributed Data Acquisition Systems
Abu-Nimeh, Faisal T.; Choong, Woon-Seng
2017-01-01
Link efficiency, data integrity, and continuity for high-throughput and real-time systems is crucial. Most of these applications require specialized hardware and operating systems as well as extensive tuning in order to achieve high efficiency. Here, we present an implementation of gigabit Ethernet data streaming which can achieve 99.26% link efficiency while maintaining no packet losses. The design and implementation are built on OpenPET, an opensource data acquisition platform for nuclear medical imaging, where (a) a crate hosting multiple OpenPET detector boards uses a User Datagram Protocol over Internet Protocol (UDP/IP) Ethernet soft-core, that is capable of understanding PAUSE frames, to stream data out to a computer workstation; (b) the receiving computer uses Netmap to allow the processing software (i.e., user space), which is written in Python, to directly receive and manage the network card’s ring buffers, bypassing the operating system kernel’s networking stack; and (c) a multi-threaded application using synchronized queues is implemented in the processing software (Python) to free up the ring buffers as quickly as possible while preserving data integrity and flow continuity. PMID:28630948
Near Theoretical Gigabit Link Efficiency for Distributed Data Acquisition Systems.
Abu-Nimeh, Faisal T; Choong, Woon-Seng
2017-03-01
Link efficiency, data integrity, and continuity for high-throughput and real-time systems is crucial. Most of these applications require specialized hardware and operating systems as well as extensive tuning in order to achieve high efficiency. Here, we present an implementation of gigabit Ethernet data streaming which can achieve 99.26% link efficiency while maintaining no packet losses. The design and implementation are built on OpenPET, an opensource data acquisition platform for nuclear medical imaging, where (a) a crate hosting multiple OpenPET detector boards uses a User Datagram Protocol over Internet Protocol (UDP/IP) Ethernet soft-core, that is capable of understanding PAUSE frames, to stream data out to a computer workstation; (b) the receiving computer uses Netmap to allow the processing software (i.e., user space), which is written in Python, to directly receive and manage the network card's ring buffers, bypassing the operating system kernel's networking stack; and (c) a multi-threaded application using synchronized queues is implemented in the processing software (Python) to free up the ring buffers as quickly as possible while preserving data integrity and flow continuity.
Using all of your CPU's in HIPE
NASA Astrophysics Data System (ADS)
Jacobson, J. D.; Fadda, D.
2012-09-01
Modern computer architectures increasingly feature multi-core CPU's. For example, the MacbookPro features the Intel quad-core i7 processors. Through the use of hyper-threading, where each core can execute two threads simultaneously, the quad-core i7 can support eight simultaneous processing threads. All this on your laptop! This CPU power can now be put into service by scientists to perform data reduction tasks, but only if the software has been designed to take advantage of the multiple processor architectures. Up to now, software written for Herschel data reduction (HIPE), written in Jython and JAVA, is single-threaded and can only utilize a single processor. Users of HIPE do not get any advantage from the additional processors. Why not put all of the CPU resources to work reducing your data? We present a multi-threaded software application that corrects long-term transients in the signal from the PACS unchopped spectroscopy line scan mode. In this poster, we present a multi-threaded software framework to achieve performance improvements from parallel execution. We will show how a task to correct transients in the PACS Spectroscopy Pipeline for the un-chopped line scan mode, has been threaded. This computation-intensive task uses either a one-parameter or a three parameter exponential function, to characterize the transient. The task uses a JAVA implementation of Minpack, translated from the C (Moshier) and IDL (Markwardt) by the authors, to optimize the correction parameters. We also explain how to determine if a task can benefit from threading (Amdahl's Law), and if it is safe to thread. The design and implementation, using the JAVA concurrency package completions service is described. Pitfalls, timing bugs, thread safety, resource control, testing and performance improvements are described and plotted.
Parallel heuristics for scalable community detection
Lu, Hao; Halappanavar, Mahantesh; Kalyanaraman, Ananth
2015-08-14
Community detection has become a fundamental operation in numerous graph-theoretic applications. Despite its potential for application, there is only limited support for community detection on large-scale parallel computers, largely owing to the irregular and inherently sequential nature of the underlying heuristics. In this paper, we present parallelization heuristics for fast community detection using the Louvain method as the serial template. The Louvain method is an iterative heuristic for modularity optimization. Originally developed in 2008, the method has become increasingly popular owing to its ability to detect high modularity community partitions in a fast and memory-efficient manner. However, the method ismore » also inherently sequential, thereby limiting its scalability. Here, we observe certain key properties of this method that present challenges for its parallelization, and consequently propose heuristics that are designed to break the sequential barrier. For evaluation purposes, we implemented our heuristics using OpenMP multithreading, and tested them over real world graphs derived from multiple application domains. Compared to the serial Louvain implementation, our parallel implementation is able to produce community outputs with a higher modularity for most of the inputs tested, in comparable number or fewer iterations, while providing real speedups of up to 16x using 32 threads.« less
FODEM: A Multi-Threaded Research and Development Method for Educational Technology
ERIC Educational Resources Information Center
Suhonen, Jarkko; de Villiers, M. Ruth; Sutinen, Erkki
2012-01-01
Formative development method (FODEM) is a multithreaded design approach that was originated to support the design and development of various types of educational technology innovations, such as learning tools, and online study programmes. The threaded and agile structure of the approach provides flexibility to the design process. Intensive…
Development of a Next Generation Concurrent Framework for the ATLAS Experiment
NASA Astrophysics Data System (ADS)
Calafiura, P.; Lampl, W.; Leggett, C.; Malon, D.; Stewart, G.; Wynne, B.
2015-12-01
The ATLAS experiment has successfully used its Gaudi/Athena software framework for data taking and analysis during the first LHC run, with billions of events successfully processed. However, the design of Gaudi/Athena dates from early 2000 and the software and the physics code has been written using a single threaded, serial design. This programming model has increasing difficulty in exploiting the potential of current CPUs, which offer their best performance only through taking full advantage of multiple cores and wide vector registers. Future CPU evolution will intensify this trend, with core counts increasing and memory per core falling. With current memory consumption for 64 bit ATLAS reconstruction in a high luminosity environment approaching 4GB, it will become impossible to fully occupy all cores in a machine without exhausting available memory. However, since maximizing performance per watt will be a key metric, a mechanism must be found to use all cores as efficiently as possible. In this paper we report on our progress with a practical demonstration of the use of multithreading in the ATLAS reconstruction software, using the GaudiHive framework. We have expanded support to Calorimeter, Inner Detector, and Tracking code, discussing what changes were necessary in order to allow the serially designed ATLAS code to run, both to the framework and to the tools and algorithms used. We report on both the performance gains, and what general lessons were learned about the code patterns that had been employed in the software and which patterns were identified as particularly problematic for multi-threading. We also present our findings on implementing a hybrid multi-threaded / multi-process framework, to take advantage of the strengths of each type of concurrency, while avoiding some of their corresponding limitations.
Zhang, Xiaohua; Wong, Sergio E; Lightstone, Felice C
2013-04-30
A mixed parallel scheme that combines message passing interface (MPI) and multithreading was implemented in the AutoDock Vina molecular docking program. The resulting program, named VinaLC, was tested on the petascale high performance computing (HPC) machines at Lawrence Livermore National Laboratory. To exploit the typical cluster-type supercomputers, thousands of docking calculations were dispatched by the master process to run simultaneously on thousands of slave processes, where each docking calculation takes one slave process on one node, and within the node each docking calculation runs via multithreading on multiple CPU cores and shared memory. Input and output of the program and the data handling within the program were carefully designed to deal with large databases and ultimately achieve HPC on a large number of CPU cores. Parallel performance analysis of the VinaLC program shows that the code scales up to more than 15K CPUs with a very low overhead cost of 3.94%. One million flexible compound docking calculations took only 1.4 h to finish on about 15K CPUs. The docking accuracy of VinaLC has been validated against the DUD data set by the re-docking of X-ray ligands and an enrichment study, 64.4% of the top scoring poses have RMSD values under 2.0 Å. The program has been demonstrated to have good enrichment performance on 70% of the targets in the DUD data set. An analysis of the enrichment factors calculated at various percentages of the screening database indicates VinaLC has very good early recovery of actives. Copyright © 2013 Wiley Periodicals, Inc.
Results of SEI Independent Research and Development Projects
2008-12-01
contained there. When laptops with a dual-core processor came out, ITunes fails crashed. ITunes was designed as multi-threaded application, but until...involving product portfolio, in-bound technical marketing, research and development, product engineering, supply chain, and out-bound sales and marketing...of quality and process improvement professionals to the marketing, product engineering, supply chain, product test and sales professionals. 3
Playback system designed for X-Band SAR
NASA Astrophysics Data System (ADS)
Yuquan, Liu; Changyong, Dou
2014-03-01
SAR(Synthetic Aperture Radar) has extensive application because it is daylight and weather independent. In particular, X-Band SAR strip map, designed by Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, provides high ground resolution images, at the same time it has a large spatial coverage and a short acquisition time, so it is promising in multi-applications. When sudden disaster comes, the emergency situation acquires radar signal data and image as soon as possible, in order to take action to reduce loss and save lives in the first time. This paper summarizes a type of X-Band SAR playback processing system designed for disaster response and scientific needs. It describes SAR data workflow includes the payload data transmission and reception process. Playback processing system completes signal analysis on the original data, providing SAR level 0 products and quick image. Gigabit network promises radar signal transmission efficiency from recorder to calculation unit. Multi-thread parallel computing and ping pong operation can ensure computation speed. Through gigabit network, multi-thread parallel computing and ping pong operation, high speed data transmission and processing meet the SAR radar data playback real time requirement.
BowMapCL: Burrows-Wheeler Mapping on Multiple Heterogeneous Accelerators.
Nogueira, David; Tomas, Pedro; Roma, Nuno
2016-01-01
The computational demand of exact-search procedures has pressed the exploitation of parallel processing accelerators to reduce the execution time of many applications. However, this often imposes strict restrictions in terms of the problem size and implementation efforts, mainly due to their possibly distinct architectures. To circumvent this limitation, a new exact-search alignment tool (BowMapCL) based on the Burrows-Wheeler Transform and FM-Index is presented. Contrasting to other alternatives, BowMapCL is based on a unified implementation using OpenCL, allowing the exploitation of multiple and possibly different devices (e.g., NVIDIA, AMD/ATI, and Intel GPUs/APUs). Furthermore, to efficiently exploit such heterogeneous architectures, BowMapCL incorporates several techniques to promote its performance and scalability, including multiple buffering, work-queue task-distribution, and dynamic load-balancing, together with index partitioning, bit-encoding, and sampling. When compared with state-of-the-art tools, the attained results showed that BowMapCL (using a single GPU) is 2 × to 7.5 × faster than mainstream multi-threaded CPU BWT-based aligners, like Bowtie, BWA, and SOAP2; and up to 4 × faster than the best performing state-of-the-art GPU implementations (namely, SOAP3 and HPG-BWT). When multiple and completely distinct devices are considered, BowMapCL efficiently scales the offered throughput, ensuring a convenient load-balance of the involved processing in the several distinct devices.
HPC Profiling with the Sun Studio™ Performance Tools
NASA Astrophysics Data System (ADS)
Itzkowitz, Marty; Maruyama, Yukon
In this paper, we describe how to use the Sun Studio Performance Tools to understand the nature and causes of application performance problems. We first explore CPU and memory performance problems for single-threaded applications, giving some simple examples. Then, we discuss multi-threaded performance issues, such as locking and false-sharing of cache lines, in each case showing how the tools can help. We go on to describe OpenMP applications and the support for them in the performance tools. Then we discuss MPI applications, and the techniques used to profile them. Finally, we present our conclusions.
Platform-Independence and Scheduling In a Multi-Threaded Real-Time Simulation
NASA Technical Reports Server (NTRS)
Sugden, Paul P.; Rau, Melissa A.; Kenney, P. Sean
2001-01-01
Aviation research often relies on real-time, pilot-in-the-loop flight simulation as a means to develop new flight software, flight hardware, or pilot procedures. Often these simulations become so complex that a single processor is incapable of performing the necessary computations within a fixed time-step. Threads are an elegant means to distribute the computational work-load when running on a symmetric multi-processor machine. However, programming with threads often requires operating system specific calls that reduce code portability and maintainability. While a multi-threaded simulation allows a significant increase in the simulation complexity, it also increases the workload of a simulation operator by requiring that the operator determine which models run on which thread. To address these concerns an object-oriented design was implemented in the NASA Langley Standard Real-Time Simulation in C++ (LaSRS++) application framework. The design provides a portable and maintainable means to use threads and also provides a mechanism to automatically load balance the simulation models.
NASA Astrophysics Data System (ADS)
Woelfle-Erskine, C. A.; Wilcox, A. C.
2009-12-01
Active restoration approaches such as channel reconstruction have moved beyond the realm of small streams and are being applied to larger rivers. Uncertainties arising from limited knowledge, fluvial and ecosystem variability, and contaminants are especially significant in restoration of large rivers, where project costs and the social, infrastructural, and ecological costs of failure are high. We use the case of Milltown Dam removal on the Clark Fork River, Montana and subsequent channel reconstruction in the former reservoir to examine the use of historical research and uncertainty analysis in river restoration. At a cost of approximately $120 million, the Milltown Dam removal involves the mechanical removal of approximately 2 million cubic meters of sediments contaminated by upstream mining, followed by restoration of the former reservoir reach in which a single-thread meandering channel is being constructed. Historical maps, surveys, photographs, and accounts suggest a conceptual model of a multi-thread, anastomosing river in the reach targeted for channel reconstruction, upstream of the confluence of the Clark Fork and Blackfoot Rivers. We supplemented historical research with analysis of aerial photographs, topographic data, and USGS stage-discharge measurements in a lotic but reservoir-influenced reach of the Clark Fork River within our study area to estimate avulsion frequency (0.8 avulsions/year over a 70-year period) and average rates of lateral migration and aggradation. These were used to calculate the mobility number, a dimensionless relationship between channel filling and lateral migration timescales that can be used to predict whether a river’s planform is single or multi-threaded. The mobility number within our study reach ranged from 0.6 (multi-thread channel) to 1.7 (transitional channel). We predict that, in the absence of active channel reconstruction, the post-dam channel pattern would evolve to one that alternates between single and multi-threaded. We propose that multiple working hypotheses should be applied to managing uncertainty as part of an adaptive management plan for restoration in our study area and elsewhere. In this approach, restoration planning and implementation would be underpinned by an explicitly identified set of uncertainties and hypotheses about channel processes and post-restoration responses. This framework would allow for and embrace channel processes such as bifurcations and avulsions that are excluded from dominant approaches to channel reconstruction, which emphasize single-thread meandering planforms.
Liu, Xing; Hou, Kun Mean; de Vaulx, Christophe; Xu, Jun; Yang, Jianfeng; Zhou, Haiying; Shi, Hongling; Zhou, Peng
2015-01-01
Memory and energy optimization strategies are essential for the resource-constrained wireless sensor network (WSN) nodes. In this article, a new memory-optimized and energy-optimized multithreaded WSN operating system (OS) LiveOS is designed and implemented. Memory cost of LiveOS is optimized by using the stack-shifting hybrid scheduling approach. Different from the traditional multithreaded OS in which thread stacks are allocated statically by the pre-reservation, thread stacks in LiveOS are allocated dynamically by using the stack-shifting technique. As a result, memory waste problems caused by the static pre-reservation can be avoided. In addition to the stack-shifting dynamic allocation approach, the hybrid scheduling mechanism which can decrease both the thread scheduling overhead and the thread stack number is also implemented in LiveOS. With these mechanisms, the stack memory cost of LiveOS can be reduced more than 50% if compared to that of a traditional multithreaded OS. Not is memory cost optimized, but also the energy cost is optimized in LiveOS, and this is achieved by using the multi-core “context aware” and multi-core “power-off/wakeup” energy conservation approaches. By using these approaches, energy cost of LiveOS can be reduced more than 30% when compared to the single-core WSN system. Memory and energy optimization strategies in LiveOS not only prolong the lifetime of WSN nodes, but also make the multithreaded OS feasible to run on the memory-constrained WSN nodes. PMID:25545264
NASA Astrophysics Data System (ADS)
Abrudean, C.
2017-05-01
Due to multiple reflexions on walls, the electromagnetic field in a multimode microwave oven is difficult to estimate analytically. This paper presents a C++ program that calculates the electromagnetic field in a resonating cavity with an absorbing payload, uses the result to calculate heating in the payload taking its properties into account and then repeats. This results in a simulation of microwave heating, including phenomena like thermal runaway. The program is multithreaded to make use of today’s common multiprocessor/multicore computers.
Parallelization strategies for continuum-generalized method of moments on the multi-thread systems
NASA Astrophysics Data System (ADS)
Bustamam, A.; Handhika, T.; Ernastuti, Kerami, D.
2017-07-01
Continuum-Generalized Method of Moments (C-GMM) covers the Generalized Method of Moments (GMM) shortfall which is not as efficient as Maximum Likelihood estimator by using the continuum set of moment conditions in a GMM framework. However, this computation would take a very long time since optimizing regularization parameter. Unfortunately, these calculations are processed sequentially whereas in fact all modern computers are now supported by hierarchical memory systems and hyperthreading technology, which allowing for parallel computing. This paper aims to speed up the calculation process of C-GMM by designing a parallel algorithm for C-GMM on the multi-thread systems. First, parallel regions are detected for the original C-GMM algorithm. There are two parallel regions in the original C-GMM algorithm, that are contributed significantly to the reduction of computational time: the outer-loop and the inner-loop. Furthermore, this parallel algorithm will be implemented with standard shared-memory application programming interface, i.e. Open Multi-Processing (OpenMP). The experiment shows that the outer-loop parallelization is the best strategy for any number of observations.
Evolution of the ATLAS Software Framework towards Concurrency
NASA Astrophysics Data System (ADS)
Jones, R. W. L.; Stewart, G. A.; Leggett, C.; Wynne, B. M.
2015-05-01
The ATLAS experiment has successfully used its Gaudi/Athena software framework for data taking and analysis during the first LHC run, with billions of events successfully processed. However, the design of Gaudi/Athena dates from early 2000 and the software and the physics code has been written using a single threaded, serial design. This programming model has increasing difficulty in exploiting the potential of current CPUs, which offer their best performance only through taking full advantage of multiple cores and wide vector registers. Future CPU evolution will intensify this trend, with core counts increasing and memory per core falling. Maximising performance per watt will be a key metric, so all of these cores must be used as efficiently as possible. In order to address the deficiencies of the current framework, ATLAS has embarked upon two projects: first, a practical demonstration of the use of multi-threading in our reconstruction software, using the GaudiHive framework; second, an exercise to gather requirements for an updated framework, going back to the first principles of how event processing occurs. In this paper we report on both these aspects of our work. For the hive based demonstrators, we discuss what changes were necessary in order to allow the serially designed ATLAS code to run, both to the framework and to the tools and algorithms used. We report on what general lessons were learned about the code patterns that had been employed in the software and which patterns were identified as particularly problematic for multi-threading. These lessons were fed into our considerations of a new framework and we present preliminary conclusions on this work. In particular we identify areas where the framework can be simplified in order to aid the implementation of a concurrent event processing scheme. Finally, we discuss the practical difficulties involved in migrating a large established code base to a multi-threaded framework and how this can be achieved for LHC Run 3.
AthenaMT: upgrading the ATLAS software framework for the many-core world with multi-threading
NASA Astrophysics Data System (ADS)
Leggett, Charles; Baines, John; Bold, Tomasz; Calafiura, Paolo; Farrell, Steven; van Gemmeren, Peter; Malon, David; Ritsch, Elmar; Stewart, Graeme; Snyder, Scott; Tsulaia, Vakhtang; Wynne, Benjamin; ATLAS Collaboration
2017-10-01
ATLAS’s current software framework, Gaudi/Athena, has been very successful for the experiment in LHC Runs 1 and 2. However, its single threaded design has been recognized for some time to be increasingly problematic as CPUs have increased core counts and decreased available memory per core. Even the multi-process version of Athena, AthenaMP, will not scale to the range of architectures we expect to use beyond Run2. After concluding a rigorous requirements phase, where many design components were examined in detail, ATLAS has begun the migration to a new data-flow driven, multi-threaded framework, which enables the simultaneous processing of singleton, thread unsafe legacy Algorithms, cloned Algorithms that execute concurrently in their own threads with different Event contexts, and fully re-entrant, thread safe Algorithms. In this paper we report on the process of modifying the framework to safely process multiple concurrent events in different threads, which entails significant changes in the underlying handling of features such as event and time dependent data, asynchronous callbacks, metadata, integration with the online High Level Trigger for partial processing in certain regions of interest, concurrent I/O, as well as ensuring thread safety of core services. We also report on upgrading the framework to handle Algorithms that are fully re-entrant.
MEGAnnotator: a user-friendly pipeline for microbial genomes assembly and annotation.
Lugli, Gabriele Andrea; Milani, Christian; Mancabelli, Leonardo; van Sinderen, Douwe; Ventura, Marco
2016-04-01
Genome annotation is one of the key actions that must be undertaken in order to decipher the genetic blueprint of organisms. Thus, a correct and reliable annotation is essential in rendering genomic data valuable. Here, we describe a bioinformatics pipeline based on freely available software programs coordinated by a multithreaded script named MEGAnnotator (Multithreaded Enhanced prokaryotic Genome Annotator). This pipeline allows the generation of multiple annotated formats fulfilling the NCBI guidelines for assembled microbial genome submission, based on DNA shotgun sequencing reads, and minimizes manual intervention, while also reducing waiting times between software program executions and improving final quality of both assembly and annotation outputs. MEGAnnotator provides an efficient way to pre-arrange the assembly and annotation work required to process NGS genome sequence data. The script improves the final quality of microbial genome annotation by reducing ambiguous annotations. Moreover, the MEGAnnotator platform allows the user to perform a partial annotation of pre-assembled genomes and includes an option to accomplish metagenomic data set assemblies. MEGAnnotator platform will be useful for microbiologists interested in genome analyses of bacteria as well as those investigating the complexity of microbial communities that do not possess the necessary skills to prepare their own bioinformatics pipeline. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
A distributed infrastructure for publishing VO services: an implementation
NASA Astrophysics Data System (ADS)
Cepparo, Francesco; Scagnetto, Ivan; Molinaro, Marco; Smareglia, Riccardo
2016-07-01
This contribution describes both the design and the implementation details of a new solution for publishing VO services, enlightening its maintainable, distributed, modular and scalable architecture. Indeed, the new publisher is multithreaded and multiprocess. Multiple instances of the modules can run on different machines to ensure high performance and high availability, and this will be true both for the interface modules of the services and the back end data access ones. The system uses message passing to let its components communicate through an AMQP message broker that can itself be distributed to provide better scalability and availability.
Toward Enhancing OpenMP's Work-Sharing Directives
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chapman, B M; Huang, L; Jin, H
2006-05-17
OpenMP provides a portable programming interface for shared memory parallel computers (SMPs). Although this interface has proven successful for small SMPs, it requires greater flexibility in light of the steadily growing size of individual SMPs and the recent advent of multithreaded chips. In this paper, we describe two application development experiences that exposed these expressivity problems in the current OpenMP specification. We then propose mechanisms to overcome these limitations, including thread subteams and thread topologies. Thus, we identify language features that improve OpenMP application performance on emerging and large-scale platforms while preserving ease of programming.
orthAgogue: an agile tool for the rapid prediction of orthology relations.
Ekseth, Ole Kristian; Kuiper, Martin; Mironov, Vladimir
2014-03-01
The comparison of genes and gene products across species depends on high-quality tools to determine the relationships between gene or protein sequences from various species. Although some excellent applications are available and widely used, their performance leaves room for improvement. We developed orthAgogue: a multithreaded C application for high-speed estimation of homology relations in massive datasets, operated via a flexible and easy command-line interface. The orthAgogue software is distributed under the GNU license. The source code and binaries compiled for Linux are available at https://code.google.com/p/orthagogue/.
Multi-thread parallel algorithm for reconstructing 3D large-scale porous structures
NASA Astrophysics Data System (ADS)
Ju, Yang; Huang, Yaohui; Zheng, Jiangtao; Qian, Xu; Xie, Heping; Zhao, Xi
2017-04-01
Geomaterials inherently contain many discontinuous, multi-scale, geometrically irregular pores, forming a complex porous structure that governs their mechanical and transport properties. The development of an efficient reconstruction method for representing porous structures can significantly contribute toward providing a better understanding of the governing effects of porous structures on the properties of porous materials. In order to improve the efficiency of reconstructing large-scale porous structures, a multi-thread parallel scheme was incorporated into the simulated annealing reconstruction method. In the method, four correlation functions, which include the two-point probability function, the linear-path functions for the pore phase and the solid phase, and the fractal system function for the solid phase, were employed for better reproduction of the complex well-connected porous structures. In addition, a random sphere packing method and a self-developed pre-conditioning method were incorporated to cast the initial reconstructed model and select independent interchanging pairs for parallel multi-thread calculation, respectively. The accuracy of the proposed algorithm was evaluated by examining the similarity between the reconstructed structure and a prototype in terms of their geometrical, topological, and mechanical properties. Comparisons of the reconstruction efficiency of porous models with various scales indicated that the parallel multi-thread scheme significantly shortened the execution time for reconstruction of a large-scale well-connected porous model compared to a sequential single-thread procedure.
SISSY: An example of a multi-threaded, networked, object-oriented databased application
DOE Office of Scientific and Technical Information (OSTI.GOV)
Scipioni, B.; Liu, D.; Song, T.
1993-05-01
The Systems Integration Support SYstem (SISSY) is presented and its capabilities and techniques are discussed. It is fully automated data collection and analysis system supporting the SSCL`s systems analysis activities as they relate to the Physics Detector and Simulation Facility (PDSF). SISSY itself is a paradigm of effective computing on the PDSF. It uses home-grown code (C++), network programming (RPC, SNMP), relational (SYBASE) and object-oriented (ObjectStore) DBMSs, UNIX operating system services (IRIX threads, cron, system utilities, shells scripts, etc.), and third party software applications (NetCentral Station, Wingz, DataLink) all of which act together as a single application to monitor andmore » analyze the PDSF.« less
Parallel Task Management Library for MARTe
NASA Astrophysics Data System (ADS)
Valcarcel, Daniel F.; Alves, Diogo; Neto, Andre; Reux, Cedric; Carvalho, Bernardo B.; Felton, Robert; Lomas, Peter J.; Sousa, Jorge; Zabeo, Luca
2014-06-01
The Multithreaded Application Real-Time executor (MARTe) is a real-time framework with increasing popularity and support in the thermonuclear fusion community. It allows modular code to run in a multi-threaded environment leveraging on the current multi-core processor (CPU) technology. One application that relies on the MARTe framework is the Joint European Torus (JET) tokamak WAll Load Limiter System (WALLS). It calculates and monitors the temperature on metal tiles and plasma facing components (PFCs) that can melt or flake if their temperature gets too high when exposed to power loads. One of the main time consuming tasks in WALLS is the calculation of thermal diffusion models in real-time. These models tend to be described by very large state-space models thus making them perfect candidates for parallelisation. MARTe's traditional approach for task parallelisation is to split the problem into several Real-Time Threads, each responsible for a self-contained sequential execution of an input-to-output chain. This is usually possible, but it might not always be practical for algorithmic or technical reasons. Also, it might not be easily scalable with an increase in the number of available CPU cores. The WorkLibrary introduces a “GPU-like approach” of splitting work among the available cores of modern CPUs that is (i) straightforward to use in an application, (ii) scalable with the availability of cores and all of this (iii) without rewriting or recompiling the source code. The first part of this article explains the motivation behind the library, its architecture and implementation. The second part presents a real application for WALLS, a parallel version of a large state-space model describing the 2D thermal diffusion on a JET tile.
Development of an extensible dual-core wireless sensing node for cyber-physical systems
NASA Astrophysics Data System (ADS)
Kane, Michael; Zhu, Dapeng; Hirose, Mitsuhito; Dong, Xinjun; Winter, Benjamin; Häckell, Mortiz; Lynch, Jerome P.; Wang, Yang; Swartz, A.
2014-04-01
The introduction of wireless telemetry into the design of monitoring and control systems has been shown to reduce system costs while simplifying installations. To date, wireless nodes proposed for sensing and actuation in cyberphysical systems have been designed using microcontrollers with one computational pipeline (i.e., single-core microcontrollers). While concurrent code execution can be implemented on single-core microcontrollers, concurrency is emulated by splitting the pipeline's resources to support multiple threads of code execution. For many applications, this approach to multi-threading is acceptable in terms of speed and function. However, some applications such as feedback controls demand deterministic timing of code execution and maximum computational throughput. For these applications, the adoption of multi-core processor architectures represents one effective solution. Multi-core microcontrollers have multiple computational pipelines that can execute embedded code in parallel and can be interrupted independent of one another. In this study, a new wireless platform named Martlet is introduced with a dual-core microcontroller adopted in its design. The dual-core microcontroller design allows Martlet to dedicate one core to standard wireless sensor operations while the other core is reserved for embedded data processing and real-time feedback control law execution. Another distinct feature of Martlet is a standardized hardware interface that allows specialized daughter boards (termed wing boards) to be interfaced to the Martlet baseboard. This extensibility opens opportunity to encapsulate specialized sensing and actuation functions in a wing board without altering the design of Martlet. In addition to describing the design of Martlet, a few example wings are detailed, along with experiments showing the Martlet's ability to monitor and control physical systems such as wind turbines and buildings.
A fast CT reconstruction scheme for a general multi-core PC.
Zeng, Kai; Bai, Erwei; Wang, Ge
2007-01-01
Expensive computational cost is a severe limitation in CT reconstruction for clinical applications that need real-time feedback. A primary example is bolus-chasing computed tomography (CT) angiography (BCA) that we have been developing for the past several years. To accelerate the reconstruction process using the filtered backprojection (FBP) method, specialized hardware or graphics cards can be used. However, specialized hardware is expensive and not flexible. The graphics processing unit (GPU) in a current graphic card can only reconstruct images in a reduced precision and is not easy to program. In this paper, an acceleration scheme is proposed based on a multi-core PC. In the proposed scheme, several techniques are integrated, including utilization of geometric symmetry, optimization of data structures, single-instruction multiple-data (SIMD) processing, multithreaded computation, and an Intel C++ compilier. Our scheme maintains the original precision and involves no data exchange between the GPU and CPU. The merits of our scheme are demonstrated in numerical experiments against the traditional implementation. Our scheme achieves a speedup of about 40, which can be further improved by several folds using the latest quad-core processors.
A Fast CT Reconstruction Scheme for a General Multi-Core PC
Zeng, Kai; Bai, Erwei; Wang, Ge
2007-01-01
Expensive computational cost is a severe limitation in CT reconstruction for clinical applications that need real-time feedback. A primary example is bolus-chasing computed tomography (CT) angiography (BCA) that we have been developing for the past several years. To accelerate the reconstruction process using the filtered backprojection (FBP) method, specialized hardware or graphics cards can be used. However, specialized hardware is expensive and not flexible. The graphics processing unit (GPU) in a current graphic card can only reconstruct images in a reduced precision and is not easy to program. In this paper, an acceleration scheme is proposed based on a multi-core PC. In the proposed scheme, several techniques are integrated, including utilization of geometric symmetry, optimization of data structures, single-instruction multiple-data (SIMD) processing, multithreaded computation, and an Intel C++ compilier. Our scheme maintains the original precision and involves no data exchange between the GPU and CPU. The merits of our scheme are demonstrated in numerical experiments against the traditional implementation. Our scheme achieves a speedup of about 40, which can be further improved by several folds using the latest quad-core processors. PMID:18256731
NASA Astrophysics Data System (ADS)
Reyes, J. C.; Vernon, F. L.; Newman, R. L.; Steidl, J. H.
2010-12-01
The Waveform Server is an interactive web-based interface to multi-station, multi-sensor and multi-channel high-density time-series data stored in Center for Seismic Studies (CSS) 3.0 schema relational databases (Newman et al., 2009). In the last twelve months, based on expanded specifications and current user feedback, both the server-side infrastructure and client-side interface have been extensively rewritten. The Python Twisted server-side code-base has been fundamentally modified to now present waveform data stored in cluster-based databases using a multi-threaded architecture, in addition to supporting the pre-existing single database model. This allows interactive web-based access to high-density (broadband @ 40Hz to strong motion @ 200Hz) waveform data that can span multiple years; the common lifetime of broadband seismic networks. The client-side interface expands on it's use of simple JSON-based AJAX queries to now incorporate a variety of User Interface (UI) improvements including standardized calendars for defining time ranges, applying on-the-fly data calibration to display SI-unit data, and increased rendering speed. This presentation will outline the various cyber infrastructure challenges we have faced while developing this application, the use-cases currently in existence, and the limitations of web-based application development.
Massively Multithreaded Maxflow for Image Segmentation on the Cray XMT-2
Bokhari, Shahid H.; Çatalyürek, Ümit V.; Gurcan, Metin N.
2014-01-01
SUMMARY Image segmentation is a very important step in the computerized analysis of digital images. The maxflow mincut approach has been successfully used to obtain minimum energy segmentations of images in many fields. Classical algorithms for maxflow in networks do not directly lend themselves to efficient parallel implementations on contemporary parallel processors. We present the results of an implementation of Goldberg-Tarjan preflow-push algorithm on the Cray XMT-2 massively multithreaded supercomputer. This machine has hardware support for 128 threads in each physical processor, a uniformly accessible shared memory of up to 4 TB and hardware synchronization for each 64 bit word. It is thus well-suited to the parallelization of graph theoretic algorithms, such as preflow-push. We describe the implementation of the preflow-push code on the XMT-2 and present the results of timing experiments on a series of synthetically generated as well as real images. Our results indicate very good performance on large images and pave the way for practical applications of this machine architecture for image analysis in a production setting. The largest images we have run are 320002 pixels in size, which are well beyond the largest previously reported in the literature. PMID:25598745
Modernizing the ATLAS simulation infrastructure
NASA Astrophysics Data System (ADS)
Di Simone, A.; CollaborationAlbert-Ludwigs-Universitt Freiburg, ATLAS; Institut, Physikalisches; Br., 79104 Freiburg i.; Germany
2017-10-01
The ATLAS Simulation infrastructure has been used to produce upwards of 50 billion proton-proton collision events for analyses ranging from detailed Standard Model measurements to searches for exotic new phenomena. In the last several years, the infrastructure has been heavily revised to allow intuitive multithreading and significantly improved maintainability. Such a massive update of a legacy code base requires careful choices about what pieces of code to completely rewrite and what to wrap or revise. The initialization of the complex geometry was generalized to allow new tools and geometry description languages, popular in some detector groups. The addition of multithreading requires Geant4-MT and GaudiHive, two frameworks with fundamentally different approaches to multithreading, to work together. It also required enforcing thread safety throughout a large code base, which required the redesign of several aspects of the simulation, including truth, the record of particle interactions with the detector during the simulation. These advances were possible thanks to close interactions with the Geant4 developers.
A Metascalable Computing Framework for Large Spatiotemporal-Scale Atomistic Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nomura, K; Seymour, R; Wang, W
2009-02-17
A metascalable (or 'design once, scale on new architectures') parallel computing framework has been developed for large spatiotemporal-scale atomistic simulations of materials based on spatiotemporal data locality principles, which is expected to scale on emerging multipetaflops architectures. The framework consists of: (1) an embedded divide-and-conquer (EDC) algorithmic framework based on spatial locality to design linear-scaling algorithms for high complexity problems; (2) a space-time-ensemble parallel (STEP) approach based on temporal locality to predict long-time dynamics, while introducing multiple parallelization axes; and (3) a tunable hierarchical cellular decomposition (HCD) parallelization framework to map these O(N) algorithms onto a multicore cluster based onmore » hybrid implementation combining message passing and critical section-free multithreading. The EDC-STEP-HCD framework exposes maximal concurrency and data locality, thereby achieving: (1) inter-node parallel efficiency well over 0.95 for 218 billion-atom molecular-dynamics and 1.68 trillion electronic-degrees-of-freedom quantum-mechanical simulations on 212,992 IBM BlueGene/L processors (superscalability); (2) high intra-node, multithreading parallel efficiency (nanoscalability); and (3) nearly perfect time/ensemble parallel efficiency (eon-scalability). The spatiotemporal scale covered by MD simulation on a sustained petaflops computer per day (i.e. petaflops {center_dot} day of computing) is estimated as NT = 2.14 (e.g. N = 2.14 million atoms for T = 1 microseconds).« less
A multi-threaded version of MCFM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Campbell, John M.; Ellis, R. Keith; Giele, Walter T.
We report on our findings modifying MCFM using OpenMP to implement multi-threading. By using OpenMP, the modified MCFM will execute on any processor, automatically adjusting to the number of available threads. We then modified the integration routine VEGAS to distribute the event evaluation over the threads, while combining all events at the end of every iteration to optimize the numerical integration. Furthermore, we took special care so that the results of the Monte Carlo integration were independent of the number of threads used, to facilitate the validation of the OpenMP version of MCFM.
Parallel Lattice Basis Reduction Using a Multi-threaded Schnorr-Euchner LLL Algorithm
NASA Astrophysics Data System (ADS)
Backes, Werner; Wetzel, Susanne
In this paper, we introduce a new parallel variant of the LLL lattice basis reduction algorithm. Our new, multi-threaded algorithm is the first to provide an efficient, parallel implementation of the Schorr-Euchner algorithm for today’s multi-processor, multi-core computer architectures. Experiments with sparse and dense lattice bases show a speed-up factor of about 1.8 for the 2-thread and about factor 3.2 for the 4-thread version of our new parallel lattice basis reduction algorithm in comparison to the traditional non-parallel algorithm.
Expressing Parallelism with ROOT
NASA Astrophysics Data System (ADS)
Piparo, D.; Tejedor, E.; Guiraud, E.; Ganis, G.; Mato, P.; Moneta, L.; Valls Pla, X.; Canal, P.
2017-10-01
The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module in Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.
Expressing Parallelism with ROOT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Piparo, D.; Tejedor, E.; Guiraud, E.
The need for processing the ever-increasing amount of data generated by the LHC experiments in a more efficient way has motivated ROOT to further develop its support for parallelism. Such support is being tackled both for shared-memory and distributed-memory environments. The incarnations of the aforementioned parallelism are multi-threading, multi-processing and cluster-wide executions. In the area of multi-threading, we discuss the new implicit parallelism and related interfaces, as well as the new building blocks to safely operate with ROOT objects in a multi-threaded environment. Regarding multi-processing, we review the new MultiProc framework, comparing it with similar tools (e.g. multiprocessing module inmore » Python). Finally, as an alternative to PROOF for cluster-wide executions, we introduce the efforts on integrating ROOT with state-of-the-art distributed data processing technologies like Spark, both in terms of programming model and runtime design (with EOS as one of the main components). For all the levels of parallelism, we discuss, based on real-life examples and measurements, how our proposals can increase the productivity of scientists.« less
Image-based 3D reconstruction and virtual environmental walk-through
NASA Astrophysics Data System (ADS)
Sun, Jifeng; Fang, Lixiong; Luo, Ying
2001-09-01
We present a 3D reconstruction method, which combines geometry-based modeling, image-based modeling and rendering techniques. The first component is an interactive geometry modeling method which recovery of the basic geometry of the photographed scene. The second component is model-based stereo algorithm. We discus the image processing problems and algorithms of walking through in virtual space, then designs and implement a high performance multi-thread wandering algorithm. The applications range from architectural planning and archaeological reconstruction to virtual environments and cinematic special effects.
Parallel k-means++ for Multiple Shared-Memory Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mackey, Patrick S.; Lewis, Robert R.
2016-09-22
In recent years k-means++ has become a popular initialization technique for improved k-means clustering. To date, most of the work done to improve its performance has involved parallelizing algorithms that are only approximations of k-means++. In this paper we present a parallelization of the exact k-means++ algorithm, with a proof of its correctness. We develop implementations for three distinct shared-memory architectures: multicore CPU, high performance GPU, and the massively multithreaded Cray XMT platform. We demonstrate the scalability of the algorithm on each platform. In addition we present a visual approach for showing which platform performed k-means++ the fastest for varyingmore » data sizes.« less
MetAlign 3.0: performance enhancement by efficient use of advances in computer hardware.
Lommen, Arjen; Kools, Harrie J
2012-08-01
A new, multi-threaded version of the GC-MS and LC-MS data processing software, metAlign, has been developed which is able to utilize multiple cores on one PC. This new version was tested using three different multi-core PCs with different operating systems. The performance of noise reduction, baseline correction and peak-picking was 8-19 fold faster compared to the previous version on a single core machine from 2008. The alignment was 5-10 fold faster. Factors influencing the performance enhancement are discussed. Our observations show that performance scales with the increase in processor core numbers we currently see in consumer PC hardware development.
NASA Astrophysics Data System (ADS)
Huang, Melin; Huang, Bormin; Huang, Allen H.
2014-10-01
The Weather Research and Forecasting (WRF) model provided operational services worldwide in many areas and has linked to our daily activity, in particular during severe weather events. The scheme of Yonsei University (YSU) is one of planetary boundary layer (PBL) models in WRF. The PBL is responsible for vertical sub-grid-scale fluxes due to eddy transports in the whole atmospheric column, determines the flux profiles within the well-mixed boundary layer and the stable layer, and thus provide atmospheric tendencies of temperature, moisture (including clouds), and horizontal momentum in the entire atmospheric column. The YSU scheme is very suitable for massively parallel computation as there are no interactions among horizontal grid points. To accelerate the computation process of the YSU scheme, we employ Intel Many Integrated Core (MIC) Architecture as it is a multiprocessor computer structure with merits of efficient parallelization and vectorization essentials. Our results show that the MIC-based optimization improved the performance of the first version of multi-threaded code on Xeon Phi 5110P by a factor of 2.4x. Furthermore, the same CPU-based optimizations improved the performance on Intel Xeon E5-2603 by a factor of 1.6x as compared to the first version of multi-threaded code.
Numericware i: Identical by State Matrix Calculator
Kim, Bongsong; Beavis, William D
2017-01-01
We introduce software, Numericware i, to compute identical by state (IBS) matrix based on genotypic data. Calculating an IBS matrix with a large dataset requires large computer memory and takes lengthy processing time. Numericware i addresses these challenges with 2 algorithmic methods: multithreading and forward chopping. The multithreading allows computational routines to concurrently run on multiple central processing unit (CPU) processors. The forward chopping addresses memory limitation by dividing a dataset into appropriately sized subsets. Numericware i allows calculation of the IBS matrix for a large genotypic dataset using a laptop or a desktop computer. For comparison with different software, we calculated genetic relationship matrices using Numericware i, SPAGeDi, and TASSEL with the same genotypic dataset. Numericware i calculates IBS coefficients between 0 and 2, whereas SPAGeDi and TASSEL produce different ranges of values including negative values. The Pearson correlation coefficient between the matrices from Numericware i and TASSEL was high at .9972, whereas SPAGeDi showed low correlation with Numericware i (.0505) and TASSEL (.0587). With a high-dimensional dataset of 500 entities by 10 000 000 SNPs, Numericware i spent 382 minutes using 19 CPU threads and 64 GB memory by dividing the dataset into 3 pieces, whereas SPAGeDi and TASSEL failed with the same dataset. Numericware i is freely available for Windows and Linux under CC-BY 4.0 license at https://figshare.com/s/f100f33a8857131eb2db. PMID:28469375
Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading.
Rahn, René; Budach, Stefan; Costanza, Pascal; Ehrhardt, Marcel; Hancox, Jonny; Reinert, Knut
2018-05-03
Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments applicable for a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting SIMD (Single Instruction Multiple Data) instructions of modern processors. We then extended the module by adding two layers of thread-level parallelization, where we a) distribute many independent alignments on multiple threads and b) inherently parallelize a single alignment computation using a work stealing approach producing a dynamic wavefront progressing along the minor diagonal. We evaluated our alignment vectorization and parallelization on different processors, including the newest Intel® Xeon® (Skylake) and Intel® Xeon Phi™ (KNL) processors, and use cases. The instruction set AVX512-BW (Byte and Word), available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon Phi™ and 1400 times faster on the Xeon® than executing them with our previous sequential alignment module. The module is programmed in C++ using the SeqAn (Reinert et al., 2017) library and distributed with version 2.4. under the BSD license. We support SSE4, AVX2, AVX512 instructions and included UME::SIMD, a SIMD-instruction wrapper library, to extend our module for further instruction sets. We thoroughly test all alignment components with all major C++ compilers on various platforms. rene.rahn@fu-berlin.de.
NASA Astrophysics Data System (ADS)
Lohn, Stefan B.; Dong, Xin; Carminati, Federico
2012-12-01
Chip-Multiprocessors are going to support massive parallelism by many additional physical and logical cores. Improving performance can no longer be obtained by increasing clock-frequency because the technical limits are almost reached. Instead, parallel execution must be used to gain performance. Resources like main memory, the cache hierarchy, bandwidth of the memory bus or links between cores and sockets are not going to be improved as fast. Hence, parallelism can only result into performance gains if the memory usage is optimized and the communication between threads is minimized. Besides concurrent programming has become a domain for experts. Implementing multi-threading is error prone and labor-intensive. A full reimplementation of the whole AliRoot source-code is unaffordable. This paper describes the effort to evaluate the adaption of AliRoot to the needs of multi-threading and to provide the capability of parallel processing by using a semi-automatic source-to-source transformation to address the problems as described before and to provide a straight-forward way of parallelization with almost no interference between threads. This makes the approach simple and reduces the required manual changes in the code. In a first step, unconditional thread-safety will be introduced to bring the original sequential and thread unaware source-code into the position of utilizing multi-threading. Afterwards further investigations have to be performed to point out candidates of classes that are useful to share amongst threads. Then in a second step, the transformation has to change the code to share these classes and finally to verify if there are anymore invalid interferences between threads.
Activity-Centric Approach to Distributed Programming
NASA Technical Reports Server (NTRS)
Levy, Renato; Satapathy, Goutam; Lang, Jun
2004-01-01
The first phase of an effort to develop a NASA version of the Cybele software system has been completed. To give meaning to even a highly abbreviated summary of the modifications to be embodied in the NASA version, it is necessary to present the following background information on Cybele: Cybele is a proprietary software infrastructure for use by programmers in developing agent-based application programs [complex application programs that contain autonomous, interacting components (agents)]. Cybele provides support for event handling from multiple sources, multithreading, concurrency control, migration, and load balancing. A Cybele agent follows a programming paradigm, called activity-centric programming, that enables an abstraction over system-level thread mechanisms. Activity centric programming relieves application programmers of the complex tasks of thread management, concurrency control, and event management. In order to provide such functionality, activity-centric programming demands support of other layers of software. This concludes the background information. In the first phase of the present development, a new architecture for Cybele was defined. In this architecture, Cybele follows a modular service-based approach to coupling of the programming and service layers of software architecture. In a service-based approach, the functionalities supported by activity-centric programming are apportioned, according to their characteristics, among several groups called services. A well-defined interface among all such services serves as a path that facilitates the maintenance and enhancement of such services without adverse effect on the whole software framework. The activity-centric application-program interface (API) is part of a kernel. The kernel API calls the services by use of their published interface. This approach makes it possible for any application code written exclusively under the API to be portable for any configuration of Cybele.
Harmony search optimization for HDR prostate brachytherapy
NASA Astrophysics Data System (ADS)
Panchal, Aditya
In high dose-rate (HDR) prostate brachytherapy, multiple catheters are inserted interstitially into the target volume. The process of treating the prostate involves calculating and determining the best dose distribution to the target and organs-at-risk by means of optimizing the time that the radioactive source dwells at specified positions within the catheters. It is the goal of this work to investigate the use of a new optimization algorithm, known as Harmony Search, in order to optimize dwell times for HDR prostate brachytherapy. The new algorithm was tested on 9 different patients and also compared with the genetic algorithm. Simulations were performed to determine the optimal value of the Harmony Search parameters. Finally, multithreading of the simulation was examined to determine potential benefits. First, a simulation environment was created using the Python programming language and the wxPython graphical interface toolkit, which was necessary to run repeated optimizations. DICOM RT data from Varian BrachyVision was parsed and used to obtain patient anatomy and HDR catheter information. Once the structures were indexed, the volume of each structure was determined and compared to the original volume calculated in BrachyVision for validation. Dose was calculated using the AAPM TG-43 point source model of the GammaMed 192Ir HDR source and was validated against Varian BrachyVision. A DVH-based objective function was created and used for the optimization simulation. Harmony Search and the genetic algorithm were implemented as optimization algorithms for the simulation and were compared against each other. The optimal values for Harmony Search parameters (Harmony Memory Size [HMS], Harmony Memory Considering Rate [HMCR], and Pitch Adjusting Rate [PAR]) were also determined. Lastly, the simulation was modified to use multiple threads of execution in order to achieve faster computational times. Experimental results show that the volume calculation that was implemented in this thesis was within 2% of the values computed by Varian BrachyVision for the prostate, within 3% for the rectum and bladder and 6% for the urethra. The calculation of dose compared to BrachyVision was determined to be different by only 0.38%. Isodose curves were also generated and were found to be similar to BrachyVision. The comparison between Harmony Search and genetic algorithm showed that Harmony Search was over 4 times faster when compared over multiple data sets. The optimal Harmony Memory Size was found to be 5 or lower; the Harmony Memory Considering Rate was determined to be 0.95, and the Pitch Adjusting Rate was found to be 0.9. Ultimately, the effect of multithreading showed that as intensive computations such as optimization and dose calculation are involved, the threads of execution scale with the number of processors, achieving a speed increase proportional to the number of processor cores. In conclusion, this work showed that Harmony Search is a viable alternative to existing algorithms for use in HDR prostate brachytherapy optimization. Coupled with the optimal parameters for the algorithm and a multithreaded simulation, this combination has the capability to significantly decrease the time spent on minimizing optimization problems in the clinic that are time intensive, such as brachytherapy, IMRT and beam angle optimization.
Han, Min Cheol; Yeom, Yeon Soo; Lee, Hyun Su; Shin, Bangho; Kim, Chan Hyeong; Furuta, Takuya
2018-05-04
In this study, the multi-threading performance of the Geant4, MCNP6, and PHITS codes was evaluated as a function of the number of threads (N) and the complexity of the tetrahedral-mesh phantom. For this, three tetrahedral-mesh phantoms of varying complexity (simple, moderately complex, and highly complex) were prepared and implemented in the three different Monte Carlo codes, in photon and neutron transport simulations. Subsequently, for each case, the initialization time, calculation time, and memory usage were measured as a function of the number of threads used in the simulation. It was found that for all codes, the initialization time significantly increased with the complexity of the phantom, but not with the number of threads. Geant4 exhibited much longer initialization time than the other codes, especially for the complex phantom (MRCP). The improvement of computation speed due to the use of a multi-threaded code was calculated as the speed-up factor, the ratio of the computation speed on a multi-threaded code to the computation speed on a single-threaded code. Geant4 showed the best multi-threading performance among the codes considered in this study, with the speed-up factor almost linearly increasing with the number of threads, reaching ~30 when N = 40. PHITS and MCNP6 showed a much smaller increase of the speed-up factor with the number of threads. For PHITS, the speed-up factors were low when N = 40. For MCNP6, the increase of the speed-up factors was better, but they were still less than ~10 when N = 40. As for memory usage, Geant4 was found to use more memory than the other codes. In addition, compared to that of the other codes, the memory usage of Geant4 more rapidly increased with the number of threads, reaching as high as ~74 GB when N = 40 for the complex phantom (MRCP). It is notable that compared to that of the other codes, the memory usage of PHITS was much lower, regardless of both the complexity of the phantom and the number of threads, hardly increasing with the number of threads for the MRCP.
Distributed run of a one-dimensional model in a regional application using SOAP-based web services
NASA Astrophysics Data System (ADS)
Smiatek, Gerhard
This article describes the setup of a distributed computing system in Perl. It facilitates the parallel run of a one-dimensional environmental model on a number of simple network PC hosts. The system uses Simple Object Access Protocol (SOAP) driven web services offering the model run on remote hosts and a multi-thread environment distributing the work and accessing the web services. Its application is demonstrated in a regional run of a process-oriented biogenic emission model for the area of Germany. Within a network consisting of up to seven web services implemented on Linux and MS-Windows hosts, a performance increase of approximately 400% has been reached compared to a model run on the fastest single host.
NASA Astrophysics Data System (ADS)
Vnukov, A. A.; Shershnev, M. B.
2018-01-01
The aim of this work is the software implementation of three image scaling algorithms using parallel computations, as well as the development of an application with a graphical user interface for the Windows operating system to demonstrate the operation of algorithms and to study the relationship between system performance, algorithm execution time and the degree of parallelization of computations. Three methods of interpolation were studied, formalized and adapted to scale images. The result of the work is a program for scaling images by different methods. Comparison of the quality of scaling by different methods is given.
Two schemes for rapid generation of digital video holograms using PC cluster
NASA Astrophysics Data System (ADS)
Park, Hanhoon; Song, Joongseok; Kim, Changseob; Park, Jong-Il
2017-12-01
Computer-generated holography (CGH), which is a process of generating digital holograms, is computationally expensive. Recently, several methods/systems of parallelizing the process using graphic processing units (GPUs) have been proposed. Indeed, use of multiple GPUs or a personal computer (PC) cluster (each PC with GPUs) enabled great improvements in the process speed. However, extant literature has less often explored systems involving rapid generation of multiple digital holograms and specialized systems for rapid generation of a digital video hologram. This study proposes a system that uses a PC cluster and is able to more efficiently generate a video hologram. The proposed system is designed to simultaneously generate multiple frames and accelerate the generation by parallelizing the CGH computations across a number of frames, as opposed to separately generating each individual frame while parallelizing the CGH computations within each frame. The proposed system also enables the subprocesses for generating each frame to execute in parallel through multithreading. With these two schemes, the proposed system significantly reduced the data communication time for generating a digital hologram when compared with that of the state-of-the-art system.
Characterizing and Mitigating Work Time Inflation in Task Parallel Programs
Olivier, Stephen L.; de Supinski, Bronis R.; Schulz, Martin; ...
2013-01-01
Task parallelism raises the level of abstraction in shared memory parallel programming to simplify the development of complex applications. However, task parallel applications can exhibit poor performance due to thread idleness, scheduling overheads, and work time inflation – additional time spent by threads in a multithreaded computation beyond the time required to perform the same work in a sequential computation. We identify the contributions of each factor to lost efficiency in various task parallel OpenMP applications and diagnose the causes of work time inflation in those applications. Increased data access latency can cause significant work time inflation in NUMA systems.more » Our locality framework for task parallel OpenMP programs mitigates this cause of work time inflation. Our extensions to the Qthreads library demonstrate that locality-aware scheduling can improve performance up to 3X compared to the Intel OpenMP task scheduler.« less
Compiler-Directed File Layout Optimization for Hierarchical Storage Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ding, Wei; Zhang, Yuanrui; Kandemir, Mahmut
File layout of array data is a critical factor that effects the behavior of storage caches, and has so far taken not much attention in the context of hierarchical storage systems. The main contribution of this paper is a compiler-driven file layout optimization scheme for hierarchical storage caches. This approach, fully automated within an optimizing compiler, analyzes a multi-threaded application code and determines a file layout for each disk-resident array referenced by the code, such that the performance of the target storage cache hierarchy is maximized. We tested our approach using 16 I/O intensive application programs and compared its performancemore » against two previously proposed approaches under different cache space management schemes. Our experimental results show that the proposed approach improves the execution time of these parallel applications by 23.7% on average.« less
Compiler-Directed File Layout Optimization for Hierarchical Storage Systems
Ding, Wei; Zhang, Yuanrui; Kandemir, Mahmut; ...
2013-01-01
File layout of array data is a critical factor that effects the behavior of storage caches, and has so far taken not much attention in the context of hierarchical storage systems. The main contribution of this paper is a compiler-driven file layout optimization scheme for hierarchical storage caches. This approach, fully automated within an optimizing compiler, analyzes a multi-threaded application code and determines a file layout for each disk-resident array referenced by the code, such that the performance of the target storage cache hierarchy is maximized. We tested our approach using 16 I/O intensive application programs and compared its performancemore » against two previously proposed approaches under different cache space management schemes. Our experimental results show that the proposed approach improves the execution time of these parallel applications by 23.7% on average.« less
Reducing False Positives in Runtime Analysis of Deadlocks
NASA Technical Reports Server (NTRS)
Bensalem, Saddek; Havelund, Klaus; Clancy, Daniel (Technical Monitor)
2002-01-01
This paper presents an improvement of a standard algorithm for detecting dead-lock potentials in multi-threaded programs, in that it reduces the number of false positives. The standard algorithm works as follows. The multi-threaded program under observation is executed, while lock and unlock events are observed. A graph of locks is built, with edges between locks symbolizing locking orders. Any cycle in the graph signifies a potential for a deadlock. The typical standard example is the group of dining philosophers sharing forks. The algorithm is interesting because it can catch deadlock potentials even though no deadlocks occur in the examined trace, and at the same time it scales very well in contrast t o more formal approaches to deadlock detection. The algorithm, however, can yield false positives (as well as false negatives). The extension of the algorithm described in this paper reduces the amount of false positives for three particular cases: when a gate lock protects a cycle, when a single thread introduces a cycle, and when the code segments in different threads that cause the cycle can actually not execute in parallel. The paper formalizes a theory for dynamic deadlock detection and compares it to model checking and static analysis techniques. It furthermore describes an implementation for analyzing Java programs and its application to two case studies: a planetary rover and a space craft altitude control system.
Samant, Sanjiv S; Xia, Junyi; Muyan-Ozcelik, Pinar; Owens, John D
2008-08-01
The advent of readily available temporal imaging or time series volumetric (4D) imaging has become an indispensable component of treatment planning and adaptive radiotherapy (ART) at many radiotherapy centers. Deformable image registration (DIR) is also used in other areas of medical imaging, including motion corrected image reconstruction. Due to long computation time, clinical applications of DIR in radiation therapy and elsewhere have been limited and consequently relegated to offline analysis. With the recent advances in hardware and software, graphics processing unit (GPU) based computing is an emerging technology for general purpose computation, including DIR, and is suitable for highly parallelized computing. However, traditional general purpose computation on the GPU is limited because the constraints of the available programming platforms. As well, compared to CPU programming, the GPU currently has reduced dedicated processor memory, which can limit the useful working data set for parallelized processing. We present an implementation of the demons algorithm using the NVIDIA 8800 GTX GPU and the new CUDA programming language. The GPU performance will be compared with single threading and multithreading CPU implementations on an Intel dual core 2.4 GHz CPU using the C programming language. CUDA provides a C-like language programming interface, and allows for direct access to the highly parallel compute units in the GPU. Comparisons for volumetric clinical lung images acquired using 4DCT were carried out. Computation time for 100 iterations in the range of 1.8-13.5 s was observed for the GPU with image size ranging from 2.0 x 10(6) to 14.2 x 10(6) pixels. The GPU registration was 55-61 times faster than the CPU for the single threading implementation, and 34-39 times faster for the multithreading implementation. For CPU based computing, the computational time generally has a linear dependence on image size for medical imaging data. Computational efficiency is characterized in terms of time per megapixels per iteration (TPMI) with units of seconds per megapixels per iteration (or spmi). For the demons algorithm, our CPU implementation yielded largely invariant values of TPMI. The mean TPMIs were 0.527 spmi and 0.335 spmi for the single threading and multithreading cases, respectively, with <2% variation over the considered image data range. For GPU computing, we achieved TPMI =0.00916 spmi with 3.7% variation, indicating optimized memory handling under CUDA. The paradigm of GPU based real-time DIR opens up a host of clinical applications for medical imaging.
A VxD-based automatic blending system using multithreaded programming.
Wang, L; Jiang, X; Chen, Y; Tan, K C
2004-01-01
This paper discusses the object-oriented software design for an automatic blending system. By combining the advantages of a programmable logic controller (PLC) and an industrial control PC (ICPC), an automatic blending control system is developed for a chemical plant. The system structure and multithread-based communication approach are first presented in this paper. The overall software design issues, such as system requirements and functionalities, are then discussed in detail. Furthermore, by replacing the conventional dynamic link library (DLL) with virtual X device drivers (VxD's), a practical and cost-effective solution is provided to improve the robustness of the Windows platform-based automatic blending system in small- and medium-sized plants.
Real time display Fourier-domain OCT using multi-thread parallel computing with data vectorization
NASA Astrophysics Data System (ADS)
Eom, Tae Joong; Kim, Hoon Seop; Kim, Chul Min; Lee, Yeung Lak; Choi, Eun-Seo
2011-03-01
We demonstrate a real-time display of processed OCT images using multi-thread parallel computing with a quad-core CPU of a personal computer. The data of each A-line are treated as one vector to maximize the data translation rate between the cores of the CPU and RAM stored image data. A display rate of 29.9 frames/sec for processed OCT data (4096 FFT-size x 500 A-scans) is achieved in our system using a wavelength swept source with 52-kHz swept frequency. The data processing times of the OCT image and a Doppler OCT image with a 4-time average are 23.8 msec and 91.4 msec.
Evaluating architecture impact on system energy efficiency
Yu, Shijie; Wang, Rui; Luan, Zhongzhi; Qian, Depei
2017-01-01
As the energy consumption has been surging in an unsustainable way, it is important to understand the impact of existing architecture designs from energy efficiency perspective, which is especially valuable for High Performance Computing (HPC) and datacenter environment hosting tens of thousands of servers. One obstacle hindering the advance of comprehensive evaluation on energy efficiency is the deficient power measuring approach. Most of the energy study relies on either external power meters or power models, both of these two methods contain intrinsic drawbacks in their practical adoption and measuring accuracy. Fortunately, the advent of Intel Running Average Power Limit (RAPL) interfaces has promoted the power measurement ability into next level, with higher accuracy and finer time resolution. Therefore, we argue it is the exact time to conduct an in-depth evaluation of the existing architecture designs to understand their impact on system energy efficiency. In this paper, we leverage representative benchmark suites including serial and parallel workloads from diverse domains to evaluate the architecture features such as Non Uniform Memory Access (NUMA), Simultaneous Multithreading (SMT) and Turbo Boost. The energy is tracked at subcomponent level such as Central Processing Unit (CPU) cores, uncore components and Dynamic Random-Access Memory (DRAM) through exploiting the power measurement ability exposed by RAPL. The experiments reveal non-intuitive results: 1) the mismatch between local compute and remote memory node caused by NUMA effect not only generates dramatic power and energy surge but also deteriorates the energy efficiency significantly; 2) for multithreaded application such as the Princeton Application Repository for Shared-Memory Computers (PARSEC), most of the workloads benefit a notable increase of energy efficiency using SMT, with more than 40% decline in average power consumption; 3) Turbo Boost is effective to accelerate the workload execution and further preserve the energy, however it may not be applicable on system with tight power budget. PMID:29161317
Evaluating architecture impact on system energy efficiency.
Yu, Shijie; Yang, Hailong; Wang, Rui; Luan, Zhongzhi; Qian, Depei
2017-01-01
As the energy consumption has been surging in an unsustainable way, it is important to understand the impact of existing architecture designs from energy efficiency perspective, which is especially valuable for High Performance Computing (HPC) and datacenter environment hosting tens of thousands of servers. One obstacle hindering the advance of comprehensive evaluation on energy efficiency is the deficient power measuring approach. Most of the energy study relies on either external power meters or power models, both of these two methods contain intrinsic drawbacks in their practical adoption and measuring accuracy. Fortunately, the advent of Intel Running Average Power Limit (RAPL) interfaces has promoted the power measurement ability into next level, with higher accuracy and finer time resolution. Therefore, we argue it is the exact time to conduct an in-depth evaluation of the existing architecture designs to understand their impact on system energy efficiency. In this paper, we leverage representative benchmark suites including serial and parallel workloads from diverse domains to evaluate the architecture features such as Non Uniform Memory Access (NUMA), Simultaneous Multithreading (SMT) and Turbo Boost. The energy is tracked at subcomponent level such as Central Processing Unit (CPU) cores, uncore components and Dynamic Random-Access Memory (DRAM) through exploiting the power measurement ability exposed by RAPL. The experiments reveal non-intuitive results: 1) the mismatch between local compute and remote memory node caused by NUMA effect not only generates dramatic power and energy surge but also deteriorates the energy efficiency significantly; 2) for multithreaded application such as the Princeton Application Repository for Shared-Memory Computers (PARSEC), most of the workloads benefit a notable increase of energy efficiency using SMT, with more than 40% decline in average power consumption; 3) Turbo Boost is effective to accelerate the workload execution and further preserve the energy, however it may not be applicable on system with tight power budget.
A Scalable Nonuniform Pointer Analysis for Embedded Program
NASA Technical Reports Server (NTRS)
Venet, Arnaud
2004-01-01
In this paper we present a scalable pointer analysis for embedded applications that is able to distinguish between instances of recursively defined data structures and elements of arrays. The main contribution consists of an efficient yet precise algorithm that can handle multithreaded programs. We first perform an inexpensive flow-sensitive analysis of each function in the program that generates semantic equations describing the effect of the function on the memory graph. These equations bear numerical constraints that describe nonuniform points-to relationships. We then iteratively solve these equations in order to obtain an abstract storage graph that describes the shape of data structures at every point of the program for all possible thread interleavings. We bring experimental evidence that this approach is tractable and precise for real-size embedded applications.
On the Suitability of MPI as a PGAS Runtime
DOE Office of Scientific and Technical Information (OSTI.GOV)
Daily, Jeffrey A.; Vishnu, Abhinav; Palmer, Bruce J.
2014-12-18
Partitioned Global Address Space (PGAS) models are emerging as a popular alternative to MPI models for designing scalable applications. At the same time, MPI remains a ubiquitous communication subsystem due to its standardization, high performance, and availability on leading platforms. In this paper, we explore the suitability of using MPI as a scalable PGAS communication subsystem. We focus on the Remote Memory Access (RMA) communication in PGAS models which typically includes {\\em get, put,} and {\\em atomic memory operations}. We perform an in-depth exploration of design alternatives based on MPI. These alternatives include using a semantically-matching interface such as MPI-RMA,more » as well as not-so-intuitive interfaces such as MPI two-sided with a combination of multi-threading and dynamic process management. With an in-depth exploration of these alternatives and their shortcomings, we propose a novel design which is facilitated by the data-centric view in PGAS models. This design leverages a combination of highly tuned MPI two-sided semantics and an automatic, user-transparent split of MPI communicators to provide asynchronous progress. We implement the asynchronous progress ranks approach and other approaches within the Communication Runtime for Exascale which is a communication subsystem for Global Arrays. Our performance evaluation spans pure communication benchmarks, graph community detection and sparse matrix-vector multiplication kernels, and a computational chemistry application. The utility of our proposed PR-based approach is demonstrated by a 2.17x speed-up on 1008 processors over the other MPI-based designs.« less
Asynchronous Message Service Reference Implementation
NASA Technical Reports Server (NTRS)
Burleigh, Scott C.
2011-01-01
This software provides a library of middleware functions with a simple application programming interface, enabling implementation of distributed applications in conformance with the CCSDS AMS (Consultative Committee for Space Data Systems Asynchronous Message Service) specification. The AMS service, and its protocols, implement an architectural concept under which the modules of mission systems may be designed as if they were to operate in isolation, each one producing and consuming mission information without explicit awareness of which other modules are currently operating. Communication relationships among such modules are self-configuring; this tends to minimize complexity in the development and operations of modular data systems. A system built on this model is a society of generally autonomous, inter-operating modules that may fluctuate freely over time in response to changing mission objectives, modules functional upgrades, and recovery from individual module failure. The purpose of AMS, then, is to reduce mission cost and risk by providing standard, reusable infrastructure for the exchange of information among data system modules in a manner that is simple to use, highly automated, flexible, robust, scalable, and efficient. The implementation is designed to spawn multiple threads of AMS functionality under the control of an AMS application program. These threads enable all members of an AMS-based, distributed application to discover one another in real time, subscribe to messages on specific topics, and to publish messages on specific topics. The query/reply (client/server) communication model is also supported. Message exchange is optionally subject to encryption (to support confidentiality) and authorization. Fault tolerance measures in the discovery protocol minimize the likelihood of overall application failure due to any single operational error anywhere in the system. The multi-threaded design simplifies processing while enabling application nodes to operate at high speeds; linked lists protected by mutex semaphores and condition variables are used for efficient, inter-thread communication. Applications may use a variety of transport protocols underlying AMS itself, including TCP (Transmission Control Protocol), UDP (User Datagram Protocol), and message queues.
pWeb: A High-Performance, Parallel-Computing Framework for Web-Browser-Based Medical Simulation.
Halic, Tansel; Ahn, Woojin; De, Suvranu
2014-01-01
This work presents a pWeb - a new language and compiler for parallelization of client-side compute intensive web applications such as surgical simulations. The recently introduced HTML5 standard has enabled creating unprecedented applications on the web. Low performance of the web browser, however, remains the bottleneck of computationally intensive applications including visualization of complex scenes, real time physical simulations and image processing compared to native ones. The new proposed language is built upon web workers for multithreaded programming in HTML5. The language provides fundamental functionalities of parallel programming languages as well as the fork/join parallel model which is not supported by web workers. The language compiler automatically generates an equivalent parallel script that complies with the HTML5 standard. A case study on realistic rendering for surgical simulations demonstrates enhanced performance with a compact set of instructions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Messer, Bronson; Harris, James A; Parete-Koon, Suzanne T
We describe recent development work on the core-collapse supernova code CHIMERA. CHIMERA has consumed more than 100 million cpu-hours on Oak Ridge Leadership Computing Facility (OLCF) platforms in the past 3 years, ranking it among the most important applications at the OLCF. Most of the work described has been focused on exploiting the multicore nature of the current platform (Jaguar) via, e.g., multithreading using OpenMP. In addition, we have begun a major effort to marshal the computational power of GPUs with CHIMERA. The impending upgrade of Jaguar to Titan a 20+ PF machine with an NVIDIA GPU on many nodesmore » makes this work essential.« less
Accelerating next generation sequencing data analysis with system level optimizations.
Kathiresan, Nagarajan; Temanni, Ramzi; Almabrazi, Hakeem; Syed, Najeeb; Jithesh, Puthen V; Al-Ali, Rashid
2017-08-22
Next generation sequencing (NGS) data analysis is highly compute intensive. In-memory computing, vectorization, bulk data transfer, CPU frequency scaling are some of the hardware features in the modern computing architectures. To get the best execution time and utilize these hardware features, it is necessary to tune the system level parameters before running the application. We studied the GATK-HaplotypeCaller which is part of common NGS workflows, that consume more than 43% of the total execution time. Multiple GATK 3.x versions were benchmarked and the execution time of HaplotypeCaller was optimized by various system level parameters which included: (i) tuning the parallel garbage collection and kernel shared memory to simulate in-memory computing, (ii) architecture-specific tuning in the PairHMM library for vectorization, (iii) including Java 1.8 features through GATK source code compilation and building a runtime environment for parallel sorting and bulk data transfer (iv) the default 'on-demand' mode of CPU frequency is over-clocked by using 'performance-mode' to accelerate the Java multi-threads. As a result, the HaplotypeCaller execution time was reduced by 82.66% in GATK 3.3 and 42.61% in GATK 3.7. Overall, the execution time of NGS pipeline was reduced to 70.60% and 34.14% for GATK 3.3 and GATK 3.7 respectively.
Multicore Architecture-aware Scientific Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Srinivasa, Avinash
Modern high performance systems are becoming increasingly complex and powerful due to advancements in processor and memory architecture. In order to keep up with this increasing complexity, applications have to be augmented with certain capabilities to fully exploit such systems. These may be at the application level, such as static or dynamic adaptations or at the system level, like having strategies in place to override some of the default operating system polices, the main objective being to improve computational performance of the application. The current work proposes two such capabilites with respect to multi-threaded scientific applications, in particular a largemore » scale physics application computing ab-initio nuclear structure. The first involves using a middleware tool to invoke dynamic adaptations in the application, so as to be able to adjust to the changing computational resource availability at run-time. The second involves a strategy for effective placement of data in main memory, to optimize memory access latencies and bandwidth. These capabilties when included were found to have a significant impact on the application performance, resulting in average speedups of as much as two to four times.« less
NASA Technical Reports Server (NTRS)
Schunk, Richard Gregory; Chung, T. J.
2001-01-01
A parallelized version of the Flowfield Dependent Variation (FDV) Method is developed to analyze a problem of current research interest, the flowfield resulting from a triple shock/boundary layer interaction. Such flowfields are often encountered in the inlets of high speed air-breathing vehicles including the NASA Hyper-X research vehicle. In order to resolve the complex shock structure and to provide adequate resolution for boundary layer computations of the convective heat transfer from surfaces inside the inlet, models containing over 500,000 nodes are needed. Efficient parallelization of the computation is essential to achieving results in a timely manner. Results from a parallelization scheme, based upon multi-threading, as implemented on multiple processor supercomputers and workstations is presented.
Clinical image processing engine
NASA Astrophysics Data System (ADS)
Han, Wei; Yao, Jianhua; Chen, Jeremy; Summers, Ronald
2009-02-01
Our group provides clinical image processing services to various institutes at NIH. We develop or adapt image processing programs for a variety of applications. However, each program requires a human operator to select a specific set of images and execute the program, as well as store the results appropriately for later use. To improve efficiency, we design a parallelized clinical image processing engine (CIPE) to streamline and parallelize our service. The engine takes DICOM images from a PACS server, sorts and distributes the images to different applications, multithreads the execution of applications, and collects results from the applications. The engine consists of four modules: a listener, a router, a job manager and a data manager. A template filter in XML format is defined to specify the image specification for each application. A MySQL database is created to store and manage the incoming DICOM images and application results. The engine achieves two important goals: reduce the amount of time and manpower required to process medical images, and reduce the turnaround time for responding. We tested our engine on three different applications with 12 datasets and demonstrated that the engine improved the efficiency dramatically.
Framework for Development of Object-Oriented Software
NASA Technical Reports Server (NTRS)
Perez-Poveda, Gus; Ciavarella, Tony; Nieten, Dan
2004-01-01
The Real-Time Control (RTC) Application Framework is a high-level software framework written in C++ that supports the rapid design and implementation of object-oriented application programs. This framework provides built-in functionality that solves common software development problems within distributed client-server, multi-threaded, and embedded programming environments. When using the RTC Framework to develop software for a specific domain, designers and implementers can focus entirely on the details of the domain-specific software rather than on creating custom solutions, utilities, and frameworks for the complexities of the programming environment. The RTC Framework was originally developed as part of a Space Shuttle Launch Processing System (LPS) replacement project called Checkout and Launch Control System (CLCS). As a result of the framework s development, CLCS software development time was reduced by 66 percent. The framework is generic enough for developing applications outside of the launch-processing system domain. Other applicable high-level domains include command and control systems and simulation/ training systems.
The Tera Multithreaded Architecture and Unstructured Meshes
NASA Technical Reports Server (NTRS)
Bokhari, Shahid H.; Mavriplis, Dimitri J.
1998-01-01
The Tera Multithreaded Architecture (MTA) is a new parallel supercomputer currently being installed at San Diego Supercomputing Center (SDSC). This machine has an architecture quite different from contemporary parallel machines. The computational processor is a custom design and the machine uses hardware to support very fine grained multithreading. The main memory is shared, hardware randomized and flat. These features make the machine highly suited to the execution of unstructured mesh problems, which are difficult to parallelize on other architectures. We report the results of a study carried out during July-August 1998 to evaluate the execution of EUL3D, a code that solves the Euler equations on an unstructured mesh, on the 2 processor Tera MTA at SDSC. Our investigation shows that parallelization of an unstructured code is extremely easy on the Tera. We were able to get an existing parallel code (designed for a shared memory machine), running on the Tera by changing only the compiler directives. Furthermore, a serial version of this code was compiled to run in parallel on the Tera by judicious use of directives to invoke the "full/empty" tag bits of the machine to obtain synchronization. This version achieves 212 and 406 Mflop/s on one and two processors respectively, and requires no attention to partitioning or placement of data issues that would be of paramount importance in other parallel architectures.
Parallel mutual information estimation for inferring gene regulatory networks on GPUs
2011-01-01
Background Mutual information is a measure of similarity between two variables. It has been widely used in various application domains including computational biology, machine learning, statistics, image processing, and financial computing. Previously used simple histogram based mutual information estimators lack the precision in quality compared to kernel based methods. The recently introduced B-spline function based mutual information estimation method is competitive to the kernel based methods in terms of quality but at a lower computational complexity. Results We present a new approach to accelerate the B-spline function based mutual information estimation algorithm with commodity graphics hardware. To derive an efficient mapping onto this type of architecture, we have used the Compute Unified Device Architecture (CUDA) programming model to design and implement a new parallel algorithm. Our implementation, called CUDA-MI, can achieve speedups of up to 82 using double precision on a single GPU compared to a multi-threaded implementation on a quad-core CPU for large microarray datasets. We have used the results obtained by CUDA-MI to infer gene regulatory networks (GRNs) from microarray data. The comparisons to existing methods including ARACNE and TINGe show that CUDA-MI produces GRNs of higher quality in less time. Conclusions CUDA-MI is publicly available open-source software, written in CUDA and C++ programming languages. It obtains significant speedup over sequential multi-threaded implementation by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs. PMID:21672264
RVC-CAL library for endmember and abundance estimation in hyperspectral image analysis
NASA Astrophysics Data System (ADS)
Lazcano López, R.; Madroñal Quintín, D.; Juárez Martínez, E.; Sanz Álvaro, C.
2015-10-01
Hyperspectral imaging (HI) collects information from across the electromagnetic spectrum, covering a wide range of wavelengths. Although this technology was initially developed for remote sensing and earth observation, its multiple advantages - such as high spectral resolution - led to its application in other fields, as cancer detection. However, this new field has shown specific requirements; for instance, it needs to accomplish strong time specifications, since all the potential applications - like surgical guidance or in vivo tumor detection - imply real-time requisites. Achieving this time requirements is a great challenge, as hyperspectral images generate extremely high volumes of data to process. Thus, some new research lines are studying new processing techniques, and the most relevant ones are related to system parallelization. In that line, this paper describes the construction of a new hyperspectral processing library for RVC-CAL language, which is specifically designed for multimedia applications and allows multithreading compilation and system parallelization. This paper presents the development of the required library functions to implement two of the four stages of the hyperspectral imaging processing chain--endmember and abundances estimation. The results obtained show that the library achieves speedups of 30%, approximately, comparing to an existing software of hyperspectral images analysis; concretely, the endmember estimation step reaches an average speedup of 27.6%, which saves almost 8 seconds in the execution time. It also shows the existence of some bottlenecks, as the communication interfaces among the different actors due to the volume of data to transfer. Finally, it is shown that the library considerably simplifies the implementation process. Thus, experimental results show the potential of a RVC-CAL library for analyzing hyperspectral images in real-time, as it provides enough resources to study the system performance.
Servicing a globally broadcast interrupt signal in a multi-threaded computer
Attinella, John E.; Davis, Kristan D.; Musselman, Roy G.; Satterfield, David L.
2015-12-29
Methods, apparatuses, and computer program products for servicing a globally broadcast interrupt signal in a multi-threaded computer comprising a plurality of processor threads. Embodiments include an interrupt controller indicating in a plurality of local interrupt status locations that a globally broadcast interrupt signal has been received by the interrupt controller. Embodiments also include a thread determining that a local interrupt status location corresponding to the thread indicates that the globally broadcast interrupt signal has been received by the interrupt controller. Embodiments also include the thread processing one or more entries in a global interrupt status bit queue based on whether global interrupt status bits associated with the globally broadcast interrupt signal are locked. Each entry in the global interrupt status bit queue corresponds to a queued global interrupt.
Ellingwood, Nathan D; Yin, Youbing; Smith, Matthew; Lin, Ching-Long
2016-04-01
Faster and more accurate methods for registration of images are important for research involved in conducting population-based studies that utilize medical imaging, as well as improvements for use in clinical applications. We present a novel computation- and memory-efficient multi-level method on graphics processing units (GPU) for performing registration of two computed tomography (CT) volumetric lung images. We developed a computation- and memory-efficient Diffeomorphic Multi-level B-Spline Transform Composite (DMTC) method to implement nonrigid mass-preserving registration of two CT lung images on GPU. The framework consists of a hierarchy of B-Spline control grids of increasing resolution. A similarity criterion known as the sum of squared tissue volume difference (SSTVD) was adopted to preserve lung tissue mass. The use of SSTVD consists of the calculation of the tissue volume, the Jacobian, and their derivatives, which makes its implementation on GPU challenging due to memory constraints. The use of the DMTC method enabled reduced computation and memory storage of variables with minimal communication between GPU and Central Processing Unit (CPU) due to ability to pre-compute values. The method was assessed on six healthy human subjects. Resultant GPU-generated displacement fields were compared against the previously validated CPU counterpart fields, showing good agreement with an average normalized root mean square error (nRMS) of 0.044±0.015. Runtime and performance speedup are compared between single-threaded CPU, multi-threaded CPU, and GPU algorithms. Best performance speedup occurs at the highest resolution in the GPU implementation for the SSTVD cost and cost gradient computations, with a speedup of 112 times that of the single-threaded CPU version and 11 times over the twelve-threaded version when considering average time per iteration using a Nvidia Tesla K20X GPU. The proposed GPU-based DMTC method outperforms its multi-threaded CPU version in terms of runtime. Total registration time reduced runtime to 2.9min on the GPU version, compared to 12.8min on twelve-threaded CPU version and 112.5min on a single-threaded CPU. Furthermore, the GPU implementation discussed in this work can be adapted for use of other cost functions that require calculation of the first derivatives. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Specifications and implementation of the RT MHD control system for the EC launcher of FTU
NASA Astrophysics Data System (ADS)
Galperti, C.; Alessi, E.; Boncagni, L.; Bruschi, A.; Granucci, G.; Grosso, A.; Iannone, F.; Marchetto, C.; Nowak, S.; Panella, M.; Sozzi, C.; Tilia, B.
2012-09-01
To perform real time plasma control experiments using EC heating waves by using the new fast launcher installed on FTU a dedicated data acquisition and elaboration system has been designed recently. A prototypical version of the acquisition/control system has been recently developed and will be tested on FTU machine in its next experimental campaign. The open-source framework MARTe (Multi-threaded Application Real-Time executor) on Linux/RTAI real-time operating system has been chosen as software platform to realize the control system. Standard open-architecture industrial PCs, based either on VME bus and CompactPCI bus equipped with standard input/output cards are the chosen hardware platform.
MIEC-SVM: automated pipeline for protein peptide/ligand interaction prediction.
Li, Nan; Ainsworth, Richard I; Wu, Meixin; Ding, Bo; Wang, Wei
2016-03-15
MIEC-SVM is a structure-based method for predicting protein recognition specificity. Here, we present an automated MIEC-SVM pipeline providing an integrated and user-friendly workflow for construction and application of the MIEC-SVM models. This pipeline can handle standard amino acids and those with post-translational modifications (PTMs) or small molecules. Moreover, multi-threading and support to Sun Grid Engine (SGE) are implemented to significantly boost the computational efficiency. The program is available at http://wanglab.ucsd.edu/MIEC-SVM CONTACT: : wei-wang@ucsd.edu Supplementary data available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Accelerated numerical processing of electronically recorded holograms with reduced speckle noise.
Trujillo, Carlos; Garcia-Sucerquia, Jorge
2013-09-01
The numerical reconstruction of digitally recorded holograms suffers from speckle noise. An accelerated method that uses general-purpose computing in graphics processing units to reduce that noise is shown. The proposed methodology utilizes parallelized algorithms to record, reconstruct, and superimpose multiple uncorrelated holograms of a static scene. For the best tradeoff between reduction of the speckle noise and processing time, the method records, reconstructs, and superimposes six holograms of 1024 × 1024 pixels in 68 ms; for this case, the methodology reduces the speckle noise by 58% compared with that exhibited by a single hologram. The fully parallelized method running on a commodity graphics processing unit is one order of magnitude faster than the same technique implemented on a regular CPU using its multithreading capabilities. Experimental results are shown to validate the proposal.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aliaga, José I., E-mail: aliaga@uji.es; Alonso, Pedro; Badía, José M.
We introduce a new iterative Krylov subspace-based eigensolver for the simulation of macromolecular motions on desktop multithreaded platforms equipped with multicore processors and, possibly, a graphics accelerator (GPU). The method consists of two stages, with the original problem first reduced into a simpler band-structured form by means of a high-performance compute-intensive procedure. This is followed by a memory-intensive but low-cost Krylov iteration, which is off-loaded to be computed on the GPU by means of an efficient data-parallel kernel. The experimental results reveal the performance of the new eigensolver. Concretely, when applied to the simulation of macromolecules with a few thousandsmore » degrees of freedom and the number of eigenpairs to be computed is small to moderate, the new solver outperforms other methods implemented as part of high-performance numerical linear algebra packages for multithreaded architectures.« less
NavP: Structured and Multithreaded Distributed Parallel Programming
NASA Technical Reports Server (NTRS)
Pan, Lei
2007-01-01
We present Navigational Programming (NavP) -- a distributed parallel programming methodology based on the principles of migrating computations and multithreading. The four major steps of NavP are: (1) Distribute the data using the data communication pattern in a given algorithm; (2) Insert navigational commands for the computation to migrate and follow large-sized distributed data; (3) Cut the sequential migrating thread and construct a mobile pipeline; and (4) Loop back for refinement. NavP is significantly different from the current prevailing Message Passing (MP) approach. The advantages of NavP include: (1) NavP is structured distributed programming and it does not change the code structure of an original algorithm. This is in sharp contrast to MP as MP implementations in general do not resemble the original sequential code; (2) NavP implementations are always competitive with the best MPI implementations in terms of performance. Approaches such as DSM or HPF have failed to deliver satisfying performance as of today in contrast, even if they are relatively easy to use compared to MP; (3) NavP provides incremental parallelization, which is beyond the reach of MP; and (4) NavP is a unifying approach that allows us to exploit both fine- (multithreading on shared memory) and coarse- (pipelined tasks on distributed memory) grained parallelism. This is in contrast to the currently popular hybrid use of MP+OpenMP, which is known to be complex to use. We present experimental results that demonstrate the effectiveness of NavP.
Airbreathing Propulsion System Analysis Using Multithreaded Parallel Processing
NASA Technical Reports Server (NTRS)
Schunk, Richard Gregory; Chung, T. J.; Rodriguez, Pete (Technical Monitor)
2000-01-01
In this paper, parallel processing is used to analyze the mixing, and combustion behavior of hypersonic flow. Preliminary work for a sonic transverse hydrogen jet injected from a slot into a Mach 4 airstream in a two-dimensional duct combustor has been completed [Moon and Chung, 1996]. Our aim is to extend this work to three-dimensional domain using multithreaded domain decomposition parallel processing based on the flowfield-dependent variation theory. Numerical simulations of chemically reacting flows are difficult because of the strong interactions between the turbulent hydrodynamic and chemical processes. The algorithm must provide an accurate representation of the flowfield, since unphysical flowfield calculations will lead to the faulty loss or creation of species mass fraction, or even premature ignition, which in turn alters the flowfield information. Another difficulty arises from the disparity in time scales between the flowfield and chemical reactions, which may require the use of finite rate chemistry. The situations are more complex when there is a disparity in length scales involved in turbulence. In order to cope with these complicated physical phenomena, it is our plan to utilize the flowfield-dependent variation theory mentioned above, facilitated by large eddy simulation. Undoubtedly, the proposed computation requires the most sophisticated computational strategies. The multithreaded domain decomposition parallel processing will be necessary in order to reduce both computational time and storage. Without special treatments involved in computer engineering, our attempt to analyze the airbreathing combustion appears to be difficult, if not impossible.
Model Checking JAVA Programs Using Java Pathfinder
NASA Technical Reports Server (NTRS)
Havelund, Klaus; Pressburger, Thomas
2000-01-01
This paper describes a translator called JAVA PATHFINDER from JAVA to PROMELA, the "programming language" of the SPIN model checker. The purpose is to establish a framework for verification and debugging of JAVA programs based on model checking. This work should be seen in a broader attempt to make formal methods applicable "in the loop" of programming within NASA's areas such as space, aviation, and robotics. Our main goal is to create automated formal methods such that programmers themselves can apply these in their daily work (in the loop) without the need for specialists to manually reformulate a program into a different notation in order to analyze the program. This work is a continuation of an effort to formally verify, using SPIN, a multi-threaded operating system programmed in Lisp for the Deep-Space 1 spacecraft, and of previous work in applying existing model checkers and theorem provers to real applications.
Study of Thread Level Parallelism in a Video Encoding Application for Chip Multiprocessor Design
NASA Astrophysics Data System (ADS)
Debes, Eric; Kaine, Greg
2002-11-01
In media applications there is a high level of available thread level parallelism (TLP). In this paper we study the intra TLP in a video encoder. We show that a well-distributed highly optimized encoder running on a symmetric multiprocessor (SMP) system can run 3.2 faster on a 4-way SMP machine than on a single processor. The multithreaded encoder running on an SMP system is then used to understand the requirements of a chip multiprocessor (CMP) architecture, which is one possible architectural direction to better exploit TLP. In the framework of this study, we use a software approach to evaluate the dataflow between processors for the video encoder running on an SMP system. An estimation of the dataflow is done with L2 cache miss event counters using Intel® VTuneTM performance analyzer. The experimental measurements are compared to theoretical results.
WISARD: workbench for integrated superfast association studies for related datasets.
Lee, Sungyoung; Choi, Sungkyoung; Qiao, Dandi; Cho, Michael; Silverman, Edwin K; Park, Taesung; Won, Sungho
2018-04-20
A Mendelian transmission produces phenotypic and genetic relatedness between family members, giving family-based analytical methods an important role in genetic epidemiological studies-from heritability estimations to genetic association analyses. With the advance in genotyping technologies, whole-genome sequence data can be utilized for genetic epidemiological studies, and family-based samples may become more useful for detecting de novo mutations. However, genetic analyses employing family-based samples usually suffer from the complexity of the computational/statistical algorithms, and certain types of family designs, such as incorporating data from extended families, have rarely been used. We present a Workbench for Integrated Superfast Association studies for Related Data (WISARD) programmed in C/C++. WISARD enables the fast and a comprehensive analysis of SNP-chip and next-generation sequencing data on extended families, with applications from designing genetic studies to summarizing analysis results. In addition, WISARD can automatically be run in a fully multithreaded manner, and the integration of R software for visualization makes it more accessible to non-experts. Comparison with existing toolsets showed that WISARD is computationally suitable for integrated analysis of related subjects, and demonstrated that WISARD outperforms existing toolsets. WISARD has also been successfully utilized to analyze the large-scale massive sequencing dataset of chronic obstructive pulmonary disease data (COPD), and we identified multiple genes associated with COPD, which demonstrates its practical value.
NASA Astrophysics Data System (ADS)
Stone, Christopher P.; Alferman, Andrew T.; Niemeyer, Kyle E.
2018-05-01
Accurate and efficient methods for solving stiff ordinary differential equations (ODEs) are a critical component of turbulent combustion simulations with finite-rate chemistry. The ODEs governing the chemical kinetics at each mesh point are decoupled by operator-splitting allowing each to be solved concurrently. An efficient ODE solver must then take into account the available thread and instruction-level parallelism of the underlying hardware, especially on many-core coprocessors, as well as the numerical efficiency. A stiff Rosenbrock and a nonstiff Runge-Kutta ODE solver are both implemented using the single instruction, multiple thread (SIMT) and single instruction, multiple data (SIMD) paradigms within OpenCL. Both methods solve multiple ODEs concurrently within the same instruction stream. The performance of these parallel implementations was measured on three chemical kinetic models of increasing size across several multicore and many-core platforms. Two separate benchmarks were conducted to clearly determine any performance advantage offered by either method. The first benchmark measured the run-time of evaluating the right-hand-side source terms in parallel and the second benchmark integrated a series of constant-pressure, homogeneous reactors using the Rosenbrock and Runge-Kutta solvers. The right-hand-side evaluations with SIMD parallelism on the host multicore Xeon CPU and many-core Xeon Phi co-processor performed approximately three times faster than the baseline multithreaded C++ code. The SIMT parallel model on the host and Phi was 13%-35% slower than the baseline while the SIMT model on the NVIDIA Kepler GPU provided approximately the same performance as the SIMD model on the Phi. The runtimes for both ODE solvers decreased significantly with the SIMD implementations on the host CPU (2.5-2.7 ×) and Xeon Phi coprocessor (4.7-4.9 ×) compared to the baseline parallel code. The SIMT implementations on the GPU ran 1.5-1.6 times faster than the baseline multithreaded CPU code; however, this was significantly slower than the SIMD versions on the host CPU or the Xeon Phi. The performance difference between the three platforms was attributed to thread divergence caused by the adaptive step-sizes within the ODE integrators. Analysis showed that the wider vector width of the GPU incurs a higher level of divergence than the narrower Sandy Bridge or Xeon Phi. The significant performance improvement provided by the SIMD parallel strategy motivates further research into more ODE solver methods that are both SIMD-friendly and computationally efficient.
NASA Astrophysics Data System (ADS)
Hauth, T.; Innocente and, V.; Piparo, D.
2012-12-01
The processing of data acquired by the CMS detector at LHC is carried out with an object-oriented C++ software framework: CMSSW. With the increasing luminosity delivered by the LHC, the treatment of recorded data requires extraordinary large computing resources, also in terms of CPU usage. A possible solution to cope with this task is the exploitation of the features offered by the latest microprocessor architectures. Modern CPUs present several vector units, the capacity of which is growing steadily with the introduction of new processor generations. Moreover, an increasing number of cores per die is offered by the main vendors, even on consumer hardware. Most recent C++ compilers provide facilities to take advantage of such innovations, either by explicit statements in the programs sources or automatically adapting the generated machine instructions to the available hardware, without the need of modifying the existing code base. Programming techniques to implement reconstruction algorithms and optimised data structures are presented, that aim to scalable vectorization and parallelization of the calculations. One of their features is the usage of new language features of the C++11 standard. Portions of the CMSSW framework are illustrated which have been found to be especially profitable for the application of vectorization and multi-threading techniques. Specific utility components have been developed to help vectorization and parallelization. They can easily become part of a larger common library. To conclude, careful measurements are described, which show the execution speedups achieved via vectorised and multi-threaded code in the context of CMSSW.
SDR implementation of the receiver of adaptive communication system
NASA Astrophysics Data System (ADS)
Skarzynski, Jacek; Darmetko, Marcin; Kozlowski, Sebastian; Kurek, Krzysztof
2016-04-01
The paper presents software implementation of a receiver forming a part of an adaptive communication system. The system is intended for communication with a satellite placed in a low Earth orbit (LEO). The ability of adaptation is believed to increase the total amount of data transmitted from the satellite to the ground station. Depending on the signal-to-noise ratio (SNR) of the received signal, adaptive transmission is realized using different transmission modes, i.e., different modulation schemes (BPSK, QPSK, 8-PSK, and 16-APSK) and different convolutional code rates (1/2, 2/3, 3/4, 5/6, and 7/8). The receiver consists of a software-defined radio (SDR) module (National Instruments USRP-2920) and a multithread reception software running on Windows operating system. In order to increase the speed of signal processing, the software takes advantage of single instruction multiple data instructions supported by x86 processor architecture.
Kernel optimization for short-range molecular dynamics
NASA Astrophysics Data System (ADS)
Hu, Changjun; Wang, Xianmeng; Li, Jianjiang; He, Xinfu; Li, Shigang; Feng, Yangde; Yang, Shaofeng; Bai, He
2017-02-01
To optimize short-range force computations in Molecular Dynamics (MD) simulations, multi-threading and SIMD optimizations are presented in this paper. With respect to multi-threading optimization, a Partition-and-Separate-Calculation (PSC) method is designed to avoid write conflicts caused by using Newton's third law. Serial bottlenecks are eliminated with no additional memory usage. The method is implemented by using the OpenMP model. Furthermore, the PSC method is employed on Intel Xeon Phi coprocessors in both native and offload models. We also evaluate the performance of the PSC method under different thread affinities on the MIC architecture. In the SIMD execution, we explain the performance influence in the PSC method, considering the "if-clause" of the cutoff radius check. The experiment results show that our PSC method is relatively more efficient compared to some traditional methods. In double precision, our 256-bit SIMD implementation is about 3 times faster than the scalar version.
NASA Technical Reports Server (NTRS)
Havelund, Klaus
1999-01-01
The JAVA PATHFINDER, JPF, is a translator from a subset of JAVA 1.0 to PROMELA, the programming language of the SPIN model checker. The purpose of JPF is to establish a framework for verification and debugging of JAVA programming based on model checking. The main goal is to automate program verification such that a programmer can apply it in the daily work without the need for a specialist to manually reformulate a program into a different notation in order to analyze the program. The system is especially suited for analyzing multi-threaded JAVA applications, where normal testing usually falls short. The system can find deadlocks and violations of boolean assertions stated by the programmer in a special assertion language. This document explains how to Use JPF.
Allison, J.; Amako, K.; Apostolakis, J.; ...
2016-07-01
Geant4 is a software toolkit for the simulation of the passage of particles through matter. It is used by a large number of experiments and projects in a variety of application domains, including high energy physics, astrophysics and space science, medical physics and radiation protection. Over the past several years, major changes have been made to the toolkit in order to accommodate the needs of these user communities, and to efficiently exploit the growth of computing power made available by advances in technology. In conclusion, the adaptation of Geant4 to multithreading, advances in physics, detector modeling and visualization, extensions tomore » the toolkit, including biasing and reverse Monte Carlo, and tools for physics and release validation are discussed here.« less
Contention Modeling for Multithreaded Distributed Shared Memory Machines: The Cray XMT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Secchi, Simone; Tumeo, Antonino; Villa, Oreste
Distributed Shared Memory (DSM) machines are a wide class of multi-processor computing systems where a large virtually-shared address space is mapped on a network of physically distributed memories. High memory latency and network contention are two of the main factors that limit performance scaling of such architectures. Modern high-performance computing DSM systems have evolved toward exploitation of massive hardware multi-threading and fine-grained memory hashing to tolerate irregular latencies, avoid network hot-spots and enable high scaling. In order to model the performance of such large-scale machines, parallel simulation has been proved to be a promising approach to achieve good accuracy inmore » reasonable times. One of the most critical factors in solving the simulation speed-accuracy trade-off is network modeling. The Cray XMT is a massively multi-threaded supercomputing architecture that belongs to the DSM class, since it implements a globally-shared address space abstraction on top of a physically distributed memory substrate. In this paper, we discuss the development of a contention-aware network model intended to be integrated in a full-system XMT simulator. We start by measuring the effects of network contention in a 128-processor XMT machine and then investigate the trade-off that exists between simulation accuracy and speed, by comparing three network models which operate at different levels of accuracy. The comparison and model validation is performed by executing a string-matching algorithm on the full-system simulator and on the XMT, using three datasets that generate noticeably different contention patterns.« less
Hydraulic conditions of flood flows in a Polish Carpathian river subjected to variable human impacts
NASA Astrophysics Data System (ADS)
Radecki-Pawlik, Artur; Czech, Wiktoria; Wyżga, Bartłomiej; Mikuś, Paweł; Zawiejska, Joanna; Ruiz-Villanueva, Virginia
2016-04-01
Channel morphology of the Czarny Dunajec River, Polish Carpathians, has been considerably modified as a result of channelization and gravel-mining induced channel incision, and now it varies from a single-thread, incised or regulated channel to an unmanaged, multi-thread channel. We investigated effects of these distinct channel morphologies on the conditions for flood flows in a study of 25 cross-sections from the middle river course where the Czarny Dunajec receives no significant tributaries and flood discharges increase little in the downstream direction. Cross-sectional morphology, channel slope and roughness of particular cross-section parts were used as input data for the hydraulic modelling performed with the 1D steady-flow HEC-RAS model for discharges with recurrence interval from 1.5 to 50 years. The model for each cross-section was calibrated with the water level of a 20-year flood from May 2014, determined shortly after the flood on the basis of high-water marks. Results indicated that incised and channelized river reaches are typified by similar flow widths and cross-sectional flow areas, which are substantially smaller than those in the multi-thread reach. However, because of steeper channel slope in the incised reach than in the channelized reach, the three river reaches differ in unit stream power and bed shear stress, which attain the highest values in the incised reach, intermediate values in the channelized reach, and the lowest ones in the multi-thread reach. These patterns of flow power and hydraulic forces are reflected in significant differences in river competence between the three river reaches. Since the introduction of the channelization scheme 30 years ago, sedimentation has reduced its initial flow conveyance by more than half and elevated water stages at given flood discharges by about 0.5-0.7 m. This partly reflects a progressive growth of natural levees along artificially stabilized channel banks. By contrast, sediments of natural levees deposited along the multi-thread channel and subsequently eroded in the course of lateral channel migration and floodplain reworking; as a result, they do not reduce the conveyance of floodplain flows in this reach. This study was performed within the scope of the Research Project DEC-2013/09/B/ST10/00056 financed by the National Science Centre of Poland.
History of river regulation of the Noce River (NE Italy) and related bio-morphodynamic responses
NASA Astrophysics Data System (ADS)
Serlet, Alyssa; Scorpio, Vittoria; Mastronunzio, Marco; Proto, Matteo; Zen, Simone; Zolezzi, Guido; Bertoldi, Walter; Comiti, Francesco; Prà, Elena Dai; Surian, Nicola; Gurnell, Angela
2016-04-01
The Noce River is a hydropower-regulated Alpine stream in Northern-East Italy and a major tributary of the Adige River, the second longest Italian river. The objective of the research is to investigate the response of the lower course of the Noce to two main stages of hydromorphological regulation; channelization/ diversion and, one century later, hydropower regulation. This research uses a historical reconstruction to link the geomorphic response with natural and human-induced factors by identifying morphological and vegetation features from historical maps and airborne photogrammetry and implementing a quantitative analysis of the river response to channelization and flow / sediment supply regulation related to hydropower development. A descriptive overview is presented. The concept of evolutionary trajectory is integrated with predictions from morphodynamic theories for river bars that allow increased insight to investigate the river response to a complex sequence of regulatory events such as development of bars, islands and riparian vegetation. Until the mid-19th century the river had a multi-thread channel pattern. Thereafter (1852) the river was straightened and diverted. Upstream of Mezzolombardo village the river was constrained between embankments of approximately 100 m width while downstream they are of approximately 50 m width. Since channelization some interesting geomorphic changes have appeared in the river e.g. the appearance of alternate bars in the channel. In 1926 there was a breach in the right bank of the downstream part that resulted in a multi-thread river reach which can be viewed as a recovery to the earlier multi-thread pattern. After the 1950's the flow and sediment supply became strongly regulated by hydropower development. The analysis of aerial images reveals that the multi-thread reach became progressively stabilized by vegetation development over the bars, though signs of some dynamics can still be recognizable today, despite the strong hydropeaking that dominates the flow regime. The results of the historical analysis will be used in a larger framework that focuses on interdisciplinary research of interactions between flow, sediment and vegetation in regulated rivers and aims to enhance knowledge on the interplay between river bars and vegetation in the perspective of providing enhanced tools for river rehabilitation and restoration.
Transition Flight Control Room Automation
NASA Technical Reports Server (NTRS)
Welborn, Curtis Ray
1990-01-01
The Workstation Prototype Laboratory is currently working on a number of projects which we feel can have a direct impact on ground operations automation. These projects include: The Fuel Cell Monitoring System (FCMS), which will monitor and detect problems with the fuel cells on the Shuttle. FCMS will use a combination of rules (forward/backward) and multi-threaded procedures which run concurrently with the rules, to implement the malfunction algorithms of the EGIL flight controllers. The combination of rule based reasoning and procedural reasoning allows us to more easily map the malfunction algorithms into a real-time system implementation. A graphical computation language (AGCOMPL). AGCOMPL is an experimental prototype to determine the benefits and drawbacks of using a graphical language to design computations (algorithms) to work on Shuttle or Space Station telemetry and trajectory data. The design of a system which will allow a model of an electrical system, including telemetry sensors, to be configured on the screen graphically using previously defined electrical icons. This electrical model would then be used to generate rules and procedures for detecting malfunctions in the electrical components of the model. A generic message management (GMM) system. GMM is being designed as a message management system for real-time applications which send advisory messages to a user. The primary purpose of GMM is to reduce the risk of overloading a user with information when multiple failures occurs and in assisting the developer in devising an explanation facility. The emphasis of our work is to develop practical tools and techniques, while determining the feasibility of a given approach, including identification of appropriate software tools to support research, application and tool building activities.
Transition flight control room automation
NASA Technical Reports Server (NTRS)
Welborn, Curtis Ray
1990-01-01
The Workstation Prototype Laboratory is currently working on a number of projects which can have a direct impact on ground operations automation. These projects include: (1) The fuel cell monitoring system (FCMS), which will monitor and detect problems with the fuel cells on the shuttle. FCMS will use a combination of rules (forward/backward) and multithreaded procedures, which run concurrently with the rules, to implement the malfunction algorithms of the EGIL flight controllers. The combination of rule-based reasoning and procedural reasoning allows us to more easily map the malfunction algorithms into a real-time system implementation. (2) A graphical computation language (AGCOMPL) is an experimental prototype to determine the benefits and drawbacks of using a graphical language to design computations (algorithms) to work on shuttle or space station telemetry and trajectory data. (3) The design of a system will allow a model of an electrical system, including telemetry sensors, to be configured on the screen graphically using previously defined electrical icons. This electrical model would then be used to generate rules and procedures for detecting malfunctions in the electrical components of the model. (4) A generic message management (GMM) system is being designed for real-time applications as a message management system which sends advisory messages to a user. The primary purpose of GMM is to reduce the risk of overloading a user with information when multiple failures occur and to assist the developer in the devising an explanation facility. The emphasis of our work is to develop practical tools and techniques, including identification of appropriate software tools to support research, application, and tool building activities, while determining the feasibility of a given approach.
Unipro UGENE: a unified bioinformatics toolkit.
Okonechnikov, Konstantin; Golosova, Olga; Fursov, Mikhail
2012-04-15
Unipro UGENE is a multiplatform open-source software with the main goal of assisting molecular biologists without much expertise in bioinformatics to manage, analyze and visualize their data. UGENE integrates widely used bioinformatics tools within a common user interface. The toolkit supports multiple biological data formats and allows the retrieval of data from remote data sources. It provides visualization modules for biological objects such as annotated genome sequences, Next Generation Sequencing (NGS) assembly data, multiple sequence alignments, phylogenetic trees and 3D structures. Most of the integrated algorithms are tuned for maximum performance by the usage of multithreading and special processor instructions. UGENE includes a visual environment for creating reusable workflows that can be launched on local resources or in a High Performance Computing (HPC) environment. UGENE is written in C++ using the Qt framework. The built-in plugin system and structured UGENE API make it possible to extend the toolkit with new functionality. UGENE binaries are freely available for MS Windows, Linux and Mac OS X at http://ugene.unipro.ru/download.html. UGENE code is licensed under the GPLv2; the information about the code licensing and copyright of integrated tools can be found in the LICENSE.3rd_party file provided with the source bundle.
MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems.
González-Domínguez, Jorge; Liu, Yongchao; Touriño, Juan; Schmidt, Bertil
2016-12-15
MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-scale input datasets. In this work we present MSAProbs-MPI, a distributed-memory parallel version of the multithreaded MSAProbs tool that is able to reduce runtimes by exploiting the compute capabilities of common multicore CPU clusters. Our performance evaluation on a cluster with 32 nodes (each containing two Intel Haswell processors) shows reductions in execution time of over one order of magnitude for typical input datasets. Furthermore, MSAProbs-MPI using eight nodes is faster than the GPU-accelerated QuickProbs running on a Tesla K20. Another strong point is that MSAProbs-MPI can deal with large datasets for which MSAProbs and QuickProbs might fail due to time and memory constraints, respectively. Source code in C ++ and MPI running on Linux systems as well as a reference manual are available at http://msaprobs.sourceforge.net CONTACT: jgonzalezd@udc.esSupplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
AIBench: a rapid application development framework for translational research in biomedicine.
Glez-Peña, D; Reboiro-Jato, M; Maia, P; Rocha, M; Díaz, F; Fdez-Riverola, F
2010-05-01
Applied research in both biomedical discovery and translational medicine today often requires the rapid development of fully featured applications containing both advanced and specific functionalities, for real use in practice. In this context, new tools are demanded that allow for efficient generation, deployment and reutilization of such biomedical applications as well as their associated functionalities. In this context this paper presents AIBench, an open-source Java desktop application framework for scientific software development with the goal of providing support to both fundamental and applied research in the domain of translational biomedicine. AIBench incorporates a powerful plug-in engine, a flexible scripting platform and takes advantage of Java annotations, reflection and various design principles in order to make it easy to use, lightweight and non-intrusive. By following a basic input-processing-output life cycle, it is possible to fully develop multiplatform applications using only three types of concepts: operations, data-types and views. The framework automatically provides functionalities that are present in a typical scientific application including user parameter definition, logging facilities, multi-threading execution, experiment repeatability and user interface workflow management, among others. The proposed framework architecture defines a reusable component model which also allows assembling new applications by the reuse of libraries from past projects or third-party software. Copyright (c) 2009 Elsevier Ireland Ltd. All rights reserved.
Benchmarking high performance computing architectures with CMS’ skeleton framework
NASA Astrophysics Data System (ADS)
Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.
2017-10-01
In 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many core architectures; machines such as Cori Phase 1&2, Theta, Mira. Because of this we have revived the 2012 benchmark to test it’s performance and conclusions on these new architectures. This talk will discuss the results of this exercise.
iCrowd: agent-based behavior modeling and crowd simulator
NASA Astrophysics Data System (ADS)
Kountouriotis, Vassilios I.; Paterakis, Manolis; Thomopoulos, Stelios C. A.
2016-05-01
Initially designed in the context of the TASS (Total Airport Security System) FP-7 project, the Crowd Simulation platform developed by the Integrated Systems Lab of the Institute of Informatics and Telecommunications at N.C.S.R. Demokritos, has evolved into a complete domain-independent agent-based behavior simulator with an emphasis on crowd behavior and building evacuation simulation. Under continuous development, it reflects an effort to implement a modern, multithreaded, data-oriented simulation engine employing latest state-of-the-art programming technologies and paradigms. It is based on an extensible architecture that separates core services from the individual layers of agent behavior, offering a concrete simulation kernel designed for high-performance and stability. Its primary goal is to deliver an abstract platform to facilitate implementation of several Agent-Based Simulation solutions with applicability in several domains of knowledge, such as: (i) Crowd behavior simulation during [in/out] door evacuation. (ii) Non-Player Character AI for Game-oriented applications and Gamification activities. (iii) Vessel traffic modeling and simulation for Maritime Security and Surveillance applications. (iv) Urban and Highway Traffic and Transportation Simulations. (v) Social Behavior Simulation and Modeling.
Ezra, Elishai; Maor, Idan; Bavli, Danny; Shalom, Itai; Levy, Gahl; Prill, Sebastian; Jaeger, Magnus S; Nahmias, Yaakov
2015-08-01
Microfluidic applications range from combinatorial synthesis to high throughput screening, with platforms integrating analog perfusion components, digitally controlled micro-valves and a range of sensors that demand a variety of communication protocols. Currently, discrete control units are used to regulate and monitor each component, resulting in scattered control interfaces that limit data integration and synchronization. Here, we present a microprocessor-based control unit, utilizing the MS Gadgeteer open framework that integrates all aspects of microfluidics through a high-current electronic circuit that supports and synchronizes digital and analog signals for perfusion components, pressure elements, and arbitrary sensor communication protocols using a plug-and-play interface. The control unit supports an integrated touch screen and TCP/IP interface that provides local and remote control of flow and data acquisition. To establish the ability of our control unit to integrate and synchronize complex microfluidic circuits we developed an equi-pressure combinatorial mixer. We demonstrate the generation of complex perfusion sequences, allowing the automated sampling, washing, and calibrating of an electrochemical lactate sensor continuously monitoring hepatocyte viability following exposure to the pesticide rotenone. Importantly, integration of an optical sensor allowed us to implement automated optimization protocols that require different computational challenges including: prioritized data structures in a genetic algorithm, distributed computational efforts in multiple-hill climbing searches and real-time realization of probabilistic models in simulated annealing. Our system offers a comprehensive solution for establishing optimization protocols and perfusion sequences in complex microfluidic circuits.
RCrawler: An R package for parallel web crawling and scraping
NASA Astrophysics Data System (ADS)
Khalil, Salim; Fakir, Mohamed
RCrawler is a contributed R package for domain-based web crawling and content scraping. As the first implementation of a parallel web crawler in the R environment, RCrawler can crawl, parse, store pages, extract contents, and produce data that can be directly employed for web content mining applications. However, it is also flexible, and could be adapted to other applications. The main features of RCrawler are multi-threaded crawling, content extraction, and duplicate content detection. In addition, it includes functionalities such as URL and content-type filtering, depth level controlling, and a robot.txt parser. Our crawler has a highly optimized system, and can download a large number of pages per second while being robust against certain crashes and spider traps. In this paper, we describe the design and functionality of RCrawler, and report on our experience of implementing it in an R environment, including different optimizations that handle the limitations of R. Finally, we discuss our experimental results.
A Multi-Threaded Cryptographic Pseudorandom Number Generator Test Suite
2016-09-01
bitcoin thieves, Google releases patch. (2013, Aug. 16). SiliconANGLE. [Online]. Available: http://siliconangle.com/blog/2013/ 08/16/android-crypto-prng...flaw-aided- bitcoin -thieves-google-releases-patch/ [5] M. Gondree. (2014, Sep. 28). NPS POSIX thread pool library. [Online]. Available: https
DOE Office of Scientific and Technical Information (OSTI.GOV)
A parallelization of the k-means++ seed selection algorithm on three distinct hardware platforms: GPU, multicore CPU, and multithreaded architecture. K-means++ was developed by David Arthur and Sergei Vassilvitskii in 2007 as an extension of the k-means data clustering technique. These algorithms allow people to cluster multidimensional data, by attempting to minimize the mean distance of data points within a cluster. K-means++ improved upon traditional k-means by using a more intelligent approach to selecting the initial seeds for the clustering process. While k-means++ has become a popular alternative to traditional k-means clustering, little work has been done to parallelize this technique.more » We have developed original C++ code for parallelizing the algorithm on three unique hardware architectures: GPU using NVidia's CUDA/Thrust framework, multicore CPU using OpenMP, and the Cray XMT multithreaded architecture. By parallelizing the process for these platforms, we are able to perform k-means++ clustering much more quickly than it could be done before.« less
Programs for Testing Processor-in-Memory Computing Systems
NASA Technical Reports Server (NTRS)
Katz, Daniel S.
2006-01-01
The Multithreaded Microbenchmarks for Processor-In-Memory (PIM) Compilers, Simulators, and Hardware are computer programs arranged in a series for use in testing the performances of PIM computing systems, including compilers, simulators, and hardware. The programs at the beginning of the series test basic functionality; the programs at subsequent positions in the series test increasingly complex functionality. The programs are intended to be used while designing a PIM system, and can be used to verify that compilers, simulators, and hardware work correctly. The programs can also be used to enable designers of these system components to examine tradeoffs in implementation. Finally, these programs can be run on non-PIM hardware (either single-threaded or multithreaded) using the POSIX pthreads standard to verify that the benchmarks themselves operate correctly. [POSIX (Portable Operating System Interface for UNIX) is a set of standards that define how programs and operating systems interact with each other. pthreads is a library of pre-emptive thread routines that comply with one of the POSIX standards.
Scheduling based on a dynamic resource connection
NASA Astrophysics Data System (ADS)
Nagiyev, A. E.; Botygin, I. A.; Shersntneva, A. I.; Konyaev, P. A.
2017-02-01
The practical using of distributed computing systems associated with many problems, including troubles with the organization of an effective interaction between the agents located at the nodes of the system, with the specific configuration of each node of the system to perform a certain task, with the effective distribution of the available information and computational resources of the system, with the control of multithreading which implements the logic of solving research problems and so on. The article describes the method of computing load balancing in distributed automatic systems, focused on the multi-agency and multi-threaded data processing. The scheme of the control of processing requests from the terminal devices, providing the effective dynamic scaling of computing power under peak load is offered. The results of the model experiments research of the developed load scheduling algorithm are set out. These results show the effectiveness of the algorithm even with a significant expansion in the number of connected nodes and zoom in the architecture distributed computing system.
Multithreading with separate data to improve the performance of Backpropagation method
NASA Astrophysics Data System (ADS)
Dhamma, Mulia; Zarlis, Muhammad; Budhiarti Nababan, Erna
2017-12-01
Backpropagation is one method of artificial neural network that can make a prediction for a new data with learning by supervised of the past data. The learning process of backpropagation method will become slow if we give too much data for backpropagation method to learn the data. Multithreading with a separate data inside of each thread are being used in order to improve the performance of backpropagtion method . Base on the research for 39 data and also 5 times experiment with separate data into 2 thread, the result showed that the average epoch become 6490 when using 2 thread and 453049 epoch when using only 1 thread. The most lowest epoch for 2 thread is 1295 and 1 thread is 356116. The process of improvement is caused by the minimum error from 2 thread that has been compared to take the weight and bias value. This process will be repeat as long as the backpropagation do learning.
A Bandwidth-Optimized Multi-Core Architecture for Irregular Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Secchi, Simone; Tumeo, Antonino; Villa, Oreste
This paper presents an architecture template for next-generation high performance computing systems specifically targeted to irregular applications. We start our work by considering that future generation interconnection and memory bandwidth full-system numbers are expected to grow by a factor of 10. In order to keep up with such a communication capacity, while still resorting to fine-grained multithreading as the main way to tolerate unpredictable memory access latencies of irregular applications, we show how overall performance scaling can benefit from the multi-core paradigm. At the same time, we also show how such an architecture template must be coupled with specific techniquesmore » in order to optimize bandwidth utilization and achieve the maximum scalability. We propose a technique based on memory references aggregation, together with the related hardware implementation, as one of such optimization techniques. We explore the proposed architecture template by focusing on the Cray XMT architecture and, using a dedicated simulation infrastructure, validate the performance of our template with two typical irregular applications. Our experimental results prove the benefits provided by both the multi-core approach and the bandwidth optimization reference aggregation technique.« less
SALT: The Simulator for the Analysis of LWP Timing
NASA Technical Reports Server (NTRS)
Springer, Paul L.; Rodrigues, Arun; Brockman, Jay
2006-01-01
With the emergence of new processor architectures that are highly multithreaded, and support features such as full/empty memory semantics and split-phase memory transactions, the need for a processor simulator to handle these features becomes apparent. This paper describes such a simulator, called SALT.
Developing Multimedia Courseware for the Internet's Java versus Shockwave.
ERIC Educational Resources Information Center
Majchrzak, Tina L.
1996-01-01
Describes and compares two methods for developing multimedia courseware for use on the Internet: an authoring tool called Shockwave, and an object-oriented language called Java. Topics include vector graphics, browsers, interaction with network protocols, data security, multithreading, and computer languages versus development environments. (LRW)
Block-Parallel Data Analysis with DIY2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morozov, Dmitriy; Peterka, Tom
DIY2 is a programming model and runtime for block-parallel analytics on distributed-memory machines. Its main abstraction is block-structured data parallelism: data are decomposed into blocks; blocks are assigned to processing elements (processes or threads); computation is described as iterations over these blocks, and communication between blocks is defined by reusable patterns. By expressing computation in this general form, the DIY2 runtime is free to optimize the movement of blocks between slow and fast memories (disk and flash vs. DRAM) and to concurrently execute blocks residing in memory with multiple threads. This enables the same program to execute in-core, out-of-core, serial,more » parallel, single-threaded, multithreaded, or combinations thereof. This paper describes the implementation of the main features of the DIY2 programming model and optimizations to improve performance. DIY2 is evaluated on benchmark test cases to establish baseline performance for several common patterns and on larger complete analysis codes running on large-scale HPC machines.« less
Inverted Streams in the Aeolis Region
2015-12-10
The sinuous ridges in this image display strong characteristics of ancient meandering riverbeds that are preserved as inverted topography (blue). The ancient river sediments that make up the ridges might have allowed fluids to produce cements (e.g., calcite or iron oxides) to make the channel lithology resistant to weathering and erosion. Later, physical and/or chemical processes removed the weaker surrounding flood plain material and left inverted river channels, or "positive relief." On closer inspection, degradation along sections of some inverted channels display large blocks of cemented sediment that were transported downslope by mass wasting. The sinuous character of the ridges resembles multi-thread river branches, implying that the ancient river flowed down a gentle to nearly horizontal slope (i.e., a moderate to low stream gradient). This ancient river was a mature meandering system, with flow from south to north. Multiple branches that diverted from the main flow later converged back with it. http://photojournal.jpl.nasa.gov/catalog/PIA20210
Optimization of sparse matrix-vector multiplication on emerging multicore platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Samuel; Oliker, Leonid; Vuduc, Richard
2007-01-01
We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD dual-core and Intel quad-core designs, the heterogeneous STI Cell, as well as the first scientificmore » study of the highly multithreaded Sun Niagara2. We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural tradeoffs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.« less
Craniux: A LabVIEW-Based Modular Software Framework for Brain-Machine Interface Research
Degenhart, Alan D.; Kelly, John W.; Ashmore, Robin C.; Collinger, Jennifer L.; Tyler-Kabara, Elizabeth C.; Weber, Douglas J.; Wang, Wei
2011-01-01
This paper presents “Craniux,” an open-access, open-source software framework for brain-machine interface (BMI) research. Developed in LabVIEW, a high-level graphical programming environment, Craniux offers both out-of-the-box functionality and a modular BMI software framework that is easily extendable. Specifically, it allows researchers to take advantage of multiple features inherent to the LabVIEW environment for on-the-fly data visualization, parallel processing, multithreading, and data saving. This paper introduces the basic features and system architecture of Craniux and describes the validation of the system under real-time BMI operation using simulated and real electrocorticographic (ECoG) signals. Our results indicate that Craniux is able to operate consistently in real time, enabling a seamless work flow to achieve brain control of cursor movement. The Craniux software framework is made available to the scientific research community to provide a LabVIEW-based BMI software platform for future BMI research and development. PMID:21687575
Craniux: a LabVIEW-based modular software framework for brain-machine interface research.
Degenhart, Alan D; Kelly, John W; Ashmore, Robin C; Collinger, Jennifer L; Tyler-Kabara, Elizabeth C; Weber, Douglas J; Wang, Wei
2011-01-01
This paper presents "Craniux," an open-access, open-source software framework for brain-machine interface (BMI) research. Developed in LabVIEW, a high-level graphical programming environment, Craniux offers both out-of-the-box functionality and a modular BMI software framework that is easily extendable. Specifically, it allows researchers to take advantage of multiple features inherent to the LabVIEW environment for on-the-fly data visualization, parallel processing, multithreading, and data saving. This paper introduces the basic features and system architecture of Craniux and describes the validation of the system under real-time BMI operation using simulated and real electrocorticographic (ECoG) signals. Our results indicate that Craniux is able to operate consistently in real time, enabling a seamless work flow to achieve brain control of cursor movement. The Craniux software framework is made available to the scientific research community to provide a LabVIEW-based BMI software platform for future BMI research and development.
List-mode PET image reconstruction for motion correction using the Intel XEON PHI co-processor
NASA Astrophysics Data System (ADS)
Ryder, W. J.; Angelis, G. I.; Bashar, R.; Gillam, J. E.; Fulton, R.; Meikle, S.
2014-03-01
List-mode image reconstruction with motion correction is computationally expensive, as it requires projection of hundreds of millions of rays through a 3D array. To decrease reconstruction time it is possible to use symmetric multiprocessing computers or graphics processing units. The former can have high financial costs, while the latter can require refactoring of algorithms. The Xeon Phi is a new co-processor card with a Many Integrated Core architecture that can run 4 multiple-instruction, multiple data threads per core with each thread having a 512-bit single instruction, multiple data vector register. Thus, it is possible to run in the region of 220 threads simultaneously. The aim of this study was to investigate whether the Xeon Phi co-processor card is a viable alternative to an x86 Linux server for accelerating List-mode PET image reconstruction for motion correction. An existing list-mode image reconstruction algorithm with motion correction was ported to run on the Xeon Phi coprocessor with the multi-threading implemented using pthreads. There were no differences between images reconstructed using the Phi co-processor card and images reconstructed using the same algorithm run on a Linux server. However, it was found that the reconstruction runtimes were 3 times greater for the Phi than the server. A new version of the image reconstruction algorithm was developed in C++ using OpenMP for mutli-threading and the Phi runtimes decreased to 1.67 times that of the host Linux server. Data transfer from the host to co-processor card was found to be a rate-limiting step; this needs to be carefully considered in order to maximize runtime speeds. When considering the purchase price of a Linux workstation with Xeon Phi co-processor card and top of the range Linux server, the former is a cost-effective computation resource for list-mode image reconstruction. A multi-Phi workstation could be a viable alternative to cluster computers at a lower cost for medical imaging applications.
Benchmarking high performance computing architectures with CMS’ skeleton framework
Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.
2017-11-23
Here, in 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many core architectures; machines such as Cori Phase 1&2, Theta,more » Mira. Because of this we have revived the 2012 benchmark to test it’s performance and conclusions on these new architectures. This talk will discuss the results of this exercise.« less
Benchmarking high performance computing architectures with CMS’ skeleton framework
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sexton-Kennedy, E.; Gartung, P.; Jones, C. D.
Here, in 2012 CMS evaluated which underlying concurrency technology would be the best to use for its multi-threaded framework. The available technologies were evaluated on the high throughput computing systems dominating the resources in use at that time. A skeleton framework benchmarking suite that emulates the tasks performed within a CMSSW application was used to select Intel’s Thread Building Block library, based on the measured overheads in both memory and CPU on the different technologies benchmarked. In 2016 CMS will get access to high performance computing resources that use new many core architectures; machines such as Cori Phase 1&2, Theta,more » Mira. Because of this we have revived the 2012 benchmark to test it’s performance and conclusions on these new architectures. This talk will discuss the results of this exercise.« less
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Heber, Gerd; Biswas, Rupak
2000-01-01
The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
CUDA-Accelerated Geodesic Ray-Tracing for Fiber Tracking
van Aart, Evert; Sepasian, Neda; Jalba, Andrei; Vilanova, Anna
2011-01-01
Diffusion Tensor Imaging (DTI) allows to noninvasively measure the diffusion of water in fibrous tissue. By reconstructing the fibers from DTI data using a fiber-tracking algorithm, we can deduce the structure of the tissue. In this paper, we outline an approach to accelerating such a fiber-tracking algorithm using a Graphics Processing Unit (GPU). This algorithm, which is based on the calculation of geodesics, has shown promising results for both synthetic and real data, but is limited in its applicability by its high computational requirements. We present a solution which uses the parallelism offered by modern GPUs, in combination with the CUDA platform by NVIDIA, to significantly reduce the execution time of the fiber-tracking algorithm. Compared to a multithreaded CPU implementation of the same algorithm, our GPU mapping achieves a speedup factor of up to 40 times. PMID:21941525
Dynamic Multiple Work Stealing Strategy for Flexible Load Balancing
NASA Astrophysics Data System (ADS)
Adnan; Sato, Mitsuhisa
Lazy-task creation is an efficient method of overcoming the overhead of the grain-size problem in parallel computing. Work stealing is an effective load balancing strategy for parallel computing. In this paper, we present dynamic work stealing strategies in a lazy-task creation technique for efficient fine-grain task scheduling. The basic idea is to control load balancing granularity depending on the number of task parents in a stack. The dynamic-length strategy of work stealing uses run-time information, which is information on the load of the victim, to determine the number of tasks that a thief is allowed to steal. We compare it with the bottommost first work stealing strategy used in StackThread/MP, and the fixed-length strategy of work stealing, where a thief requests to steal a fixed number of tasks, as well as other multithreaded frameworks such as Cilk and OpenMP task implementations. The experiments show that the dynamic-length strategy of work stealing performs well in irregular workloads such as in UTS benchmarks, as well as in regular workloads such as Fibonacci, Strassen's matrix multiplication, FFT, and Sparse-LU factorization. The dynamic-length strategy works better than the fixed-length strategy because it is more flexible than the latter; this strategy can avoid load imbalance due to overstealing.
An efficient, scalable, and adaptable framework for solving generic systems of level-set PDEs
Mosaliganti, Kishore R.; Gelas, Arnaud; Megason, Sean G.
2013-01-01
In the last decade, level-set methods have been actively developed for applications in image registration, segmentation, tracking, and reconstruction. However, the development of a wide variety of level-set PDEs and their numerical discretization schemes, coupled with hybrid combinations of PDE terms, stopping criteria, and reinitialization strategies, has created a software logistics problem. In the absence of an integrative design, current toolkits support only specific types of level-set implementations which restrict future algorithm development since extensions require significant code duplication and effort. In the new NIH/NLM Insight Toolkit (ITK) v4 architecture, we implemented a level-set software design that is flexible to different numerical (continuous, discrete, and sparse) and grid representations (point, mesh, and image-based). Given that a generic PDE is a summation of different terms, we used a set of linked containers to which level-set terms can be added or deleted at any point in the evolution process. This container-based approach allows the user to explore and customize terms in the level-set equation at compile-time in a flexible manner. The framework is optimized so that repeated computations of common intensity functions (e.g., gradient and Hessians) across multiple terms is eliminated. The framework further enables the evolution of multiple level-sets for multi-object segmentation and processing of large datasets. For doing so, we restrict level-set domains to subsets of the image domain and use multithreading strategies to process groups of subdomains or level-set functions. Users can also select from a variety of reinitialization policies and stopping criteria. Finally, we developed a visualization framework that shows the evolution of a level-set in real-time to help guide algorithm development and parameter optimization. We demonstrate the power of our new framework using confocal microscopy images of cells in a developing zebrafish embryo. PMID:24501592
An efficient, scalable, and adaptable framework for solving generic systems of level-set PDEs.
Mosaliganti, Kishore R; Gelas, Arnaud; Megason, Sean G
2013-01-01
In the last decade, level-set methods have been actively developed for applications in image registration, segmentation, tracking, and reconstruction. However, the development of a wide variety of level-set PDEs and their numerical discretization schemes, coupled with hybrid combinations of PDE terms, stopping criteria, and reinitialization strategies, has created a software logistics problem. In the absence of an integrative design, current toolkits support only specific types of level-set implementations which restrict future algorithm development since extensions require significant code duplication and effort. In the new NIH/NLM Insight Toolkit (ITK) v4 architecture, we implemented a level-set software design that is flexible to different numerical (continuous, discrete, and sparse) and grid representations (point, mesh, and image-based). Given that a generic PDE is a summation of different terms, we used a set of linked containers to which level-set terms can be added or deleted at any point in the evolution process. This container-based approach allows the user to explore and customize terms in the level-set equation at compile-time in a flexible manner. The framework is optimized so that repeated computations of common intensity functions (e.g., gradient and Hessians) across multiple terms is eliminated. The framework further enables the evolution of multiple level-sets for multi-object segmentation and processing of large datasets. For doing so, we restrict level-set domains to subsets of the image domain and use multithreading strategies to process groups of subdomains or level-set functions. Users can also select from a variety of reinitialization policies and stopping criteria. Finally, we developed a visualization framework that shows the evolution of a level-set in real-time to help guide algorithm development and parameter optimization. We demonstrate the power of our new framework using confocal microscopy images of cells in a developing zebrafish embryo.
Benchmark and Framework for Encouraging Research on Multi-Threaded Testing Tools
NASA Technical Reports Server (NTRS)
Havelund, Klaus; Stoller, Scott D.; Ur, Shmuel
2003-01-01
A problem that has been getting prominence in testing is that of looking for intermittent bugs. Multi-threaded code is becoming very common, mostly on the server side. As there is no silver bullet solution, research focuses on a variety of partial solutions. In this paper (invited by PADTAD 2003) we outline a proposed project to facilitate research. The project goals are as follows. The first goal is to create a benchmark that can be used to evaluate different solutions. The benchmark, apart from containing programs with documented bugs, will include other artifacts, such as traces, that are useful for evaluating some of the technologies. The second goal is to create a set of tools with open API s that can be used to check ideas without building a large system. For example an instrumentor will be available, that could be used to test temporal noise making heuristics. The third goal is to create a focus for the research in this area around which a community of people who try to solve similar problems with different techniques, could congregate.
Parallel Computer System for 3D Visualization Stereo on GPU
NASA Astrophysics Data System (ADS)
Al-Oraiqat, Anas M.; Zori, Sergii A.
2018-03-01
This paper proposes the organization of a parallel computer system based on Graphic Processors Unit (GPU) for 3D stereo image synthesis. The development is based on the modified ray tracing method developed by the authors for fast search of tracing rays intersections with scene objects. The system allows significant increase in the productivity for the 3D stereo synthesis of photorealistic quality. The generalized procedure of 3D stereo image synthesis on the Graphics Processing Unit/Graphics Processing Clusters (GPU/GPC) is proposed. The efficiency of the proposed solutions by GPU implementation is compared with single-threaded and multithreaded implementations on the CPU. The achieved average acceleration in multi-thread implementation on the test GPU and CPU is about 7.5 and 1.6 times, respectively. Studying the influence of choosing the size and configuration of the computational Compute Unified Device Archi-tecture (CUDA) network on the computational speed shows the importance of their correct selection. The obtained experimental estimations can be significantly improved by new GPUs with a large number of processing cores and multiprocessors, as well as optimized configuration of the computing CUDA network.
AN MHD AVALANCHE IN A MULTI-THREADED CORONAL LOOP
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hood, A. W.; Cargill, P. J.; Tam, K. V.
For the first time, we demonstrate how an MHD avalanche might occur in a multithreaded coronal loop. Considering 23 non-potential magnetic threads within a loop, we use 3D MHD simulations to show that only one thread needs to be unstable in order to start an avalanche even when the others are below marginal stability. This has significant implications for coronal heating in that it provides for energy dissipation with a trigger mechanism. The instability of the unstable thread follows the evolution determined in many earlier investigations. However, once one stable thread is disrupted, it coalesces with a neighboring thread andmore » this process disrupts other nearby threads. Coalescence with these disrupted threads then occurs leading to the disruption of yet more threads as the avalanche develops. Magnetic energy is released in discrete bursts as the surrounding stable threads are disrupted. The volume integrated heating, as a function of time, shows short spikes suggesting that the temporal form of the heating is more like that of nanoflares than of constant heating.« less
NASA Astrophysics Data System (ADS)
Polito, V.; Testa, P.; De Pontieu, B.; Allred, J. C.
2017-12-01
The observation of the high temperature (above 10 MK) Fe XXI 1354.1 A line with the Interface Region Imaging Spectrograph (IRIS) has provided significant insights into the chromospheric evaporation process in flares. In particular, the line is often observed to be completely blueshifted, in contrast to previous observations at lower spatial and spectral resolution, and in agreement with predictions from theoretical models. Interestingly, the line is also observed to be mostly symmetric and with a large excess above the thermal width. One popular interpretation for the excess broadening is given by assuming a superposition of flows from different loop strands. In this work, we perform a statistical analysis of Fe XXI line profiles observed by IRIS during the impulsive phase of flares and compare our results with hydrodynamic simulations of multi-thread flare loops performed with the 1D RADYN code. Our results indicate that the multi-thread models cannot easily reproduce the symmetry of the line and that some other physical process might need to be invoked in order to explain the observed profiles.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vatsavai, Raju; Burk, Thomas E; Lime, Steve
2012-01-01
The components making up an Open Source GIS are explained in this chapter. A map server (Sect. 30.1) can broadly be defined as a software platform for dynamically generating spatially referenced digital map products. The University of Minnesota MapServer (UMN Map Server) is one such system. Its basic features are visualization, overlay, and query. Section 30.2 names and explains many of the geospatial open source libraries, such as GDAL and OGR. The other libraries are FDO, JTS, GEOS, JCS, MetaCRS, and GPSBabel. The application examples include derived GIS-software and data format conversions. Quantum GIS, its origin and its applications explainedmore » in detail in Sect. 30.3. The features include a rich GUI, attribute tables, vector symbols, labeling, editing functions, projections, georeferencing, GPS support, analysis, and Web Map Server functionality. Future developments will address mobile applications, 3-D, and multithreading. The origins of PostgreSQL are outlined and PostGIS discussed in detail in Sect. 30.4. It extends PostgreSQL by implementing the Simple Feature standard. Section 30.5 details the most important open source licenses such as the GPL, the LGPL, the MIT License, and the BSD License, as well as the role of the Creative Commons.« less
High-Performance, Multi-Node File Copies and Checksums for Clustered File Systems
NASA Technical Reports Server (NTRS)
Kolano, Paul Z.; Ciotti, Robert B.
2012-01-01
Modern parallel file systems achieve high performance using a variety of techniques, such as striping files across multiple disks to increase aggregate I/O bandwidth and spreading disks across multiple servers to increase aggregate interconnect bandwidth. To achieve peak performance from such systems, it is typically necessary to utilize multiple concurrent readers/writers from multiple systems to overcome various singlesystem limitations, such as number of processors and network bandwidth. The standard cp and md5sum tools of GNU coreutils found on every modern Unix/Linux system, however, utilize a single execution thread on a single CPU core of a single system, and hence cannot take full advantage of the increased performance of clustered file systems. Mcp and msum are drop-in replacements for the standard cp and md5sum programs that utilize multiple types of parallelism and other optimizations to achieve maximum copy and checksum performance on clustered file systems. Multi-threading is used to ensure that nodes are kept as busy as possible. Read/write parallelism allows individual operations of a single copy to be overlapped using asynchronous I/O. Multinode cooperation allows different nodes to take part in the same copy/checksum. Split-file processing allows multiple threads to operate concurrently on the same file. Finally, hash trees allow inherently serial checksums to be performed in parallel. Mcp and msum provide significant performance improvements over standard cp and md5sum using multiple types of parallelism and other optimizations. The total speed-ups from all improvements are significant. Mcp improves cp performance over 27x, msum improves md5sum performance almost 19x, and the combination of mcp and msum improves verified copies via cp and md5sum by almost 22x. These improvements come in the form of drop-in replacements for cp and md5sum, so are easily used and are available for download as open source software at http://mutil.sourceforge.net.
FPGA-Based Reconfigurable Processor for Ultrafast Interlaced Ultrasound and Photoacoustic Imaging
Alqasemi, Umar; Li, Hai; Aguirre, Andrés; Zhu, Quing
2016-01-01
In this paper, we report, to the best of our knowledge, a unique field-programmable gate array (FPGA)-based reconfigurable processor for real-time interlaced co-registered ultrasound and photoacoustic imaging and its application in imaging tumor dynamic response. The FPGA is used to control, acquire, store, delay-and-sum, and transfer the data for real-time co-registered imaging. The FPGA controls the ultrasound transmission and ultrasound and photoacoustic data acquisition process of a customized 16-channel module that contains all of the necessary analog and digital circuits. The 16-channel module is one of multiple modules plugged into a motherboard; their beamformed outputs are made available for a digital signal processor (DSP) to access using an external memory interface (EMIF). The FPGA performs a key role through ultrafast reconfiguration and adaptation of its structure to allow real-time switching between the two imaging modes, including transmission control, laser synchronization, internal memory structure, beamforming, and EMIF structure and memory size. It performs another role by parallel accessing of internal memories and multi-thread processing to reduce the transfer of data and the processing load on the DSP. Furthermore, because the laser will be pulsing even during ultrasound pulse-echo acquisition, the FPGA ensures that the laser pulses are far enough from the pulse-echo acquisitions by appropriate time-division multiplexing (TDM). A co-registered ultrasound and photoacoustic imaging system consisting of four FPGA modules (64-channels) is constructed, and its performance is demonstrated using phantom targets and in vivo mouse tumor models. PMID:22828830
FPGA-based reconfigurable processor for ultrafast interlaced ultrasound and photoacoustic imaging.
Alqasemi, Umar; Li, Hai; Aguirre, Andrés; Zhu, Quing
2012-07-01
In this paper, we report, to the best of our knowledge, a unique field-programmable gate array (FPGA)-based reconfigurable processor for real-time interlaced co-registered ultrasound and photoacoustic imaging and its application in imaging tumor dynamic response. The FPGA is used to control, acquire, store, delay-and-sum, and transfer the data for real-time co-registered imaging. The FPGA controls the ultrasound transmission and ultrasound and photoacoustic data acquisition process of a customized 16-channel module that contains all of the necessary analog and digital circuits. The 16-channel module is one of multiple modules plugged into a motherboard; their beamformed outputs are made available for a digital signal processor (DSP) to access using an external memory interface (EMIF). The FPGA performs a key role through ultrafast reconfiguration and adaptation of its structure to allow real-time switching between the two imaging modes, including transmission control, laser synchronization, internal memory structure, beamforming, and EMIF structure and memory size. It performs another role by parallel accessing of internal memories and multi-thread processing to reduce the transfer of data and the processing load on the DSP. Furthermore, because the laser will be pulsing even during ultrasound pulse-echo acquisition, the FPGA ensures that the laser pulses are far enough from the pulse-echo acquisitions by appropriate time-division multiplexing (TDM). A co-registered ultrasound and photoacoustic imaging system consisting of four FPGA modules (64-channels) is constructed, and its performance is demonstrated using phantom targets and in vivo mouse tumor models.
From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation
Blazewicz, Marek; Hinder, Ian; Koppelman, David M.; ...
2013-01-01
Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient manner for complex applications, without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization ismore » based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.« less
Software design of a remote real-time ECG monitoring system
NASA Astrophysics Data System (ADS)
Yu, Chengbo; Tao, Hongyan
2005-12-01
Heart disease is one of the main diseases that threaten the health and lives of human beings. At present, the normal remote ECG monitoring system has the disadvantages of a short testing distance and limitation of monitoring lines. Because of accident and paroxysmal disease, ECG monitoring has extended from the hospital to the family. Therefore, remote ECG monitoring through the Internet has the actual value and significance. The principle and design method of software of the remote dynamic ECG monitor was presented and discussed. The monitoring software is programmed with Delphi software based on client-sever interactive mode. The application program of the system, which makes use of multithreading technology, is shown to perform in an excellent manner. The program includes remote link users and ECG processing, i.e. ECG data's receiving, real-time displaying, recording and replaying. The system can connect many clients simultaneously and perform real-time monitoring to patients.
Naval Research Laboratory Fact Book 2012
2012-11-01
Distributed network-based battle management High performance computing supporting uniform and nonuniform memory access with single and multithreaded...hyperspectral systems VNIR, MWIR, and LWIR high-resolution systems Wideband SAR systems RF and laser data links High-speed, high-power...hyperspectral imaging system Long-wave infrared ( LWIR ) quantum well IR photodetector (QWIP) imaging system Research and Development Services Divi- sion
2008-01-01
Distributed network-based battle management High performance computing supporting uniform and nonuniform memory access with single and multithreaded...pallet Airborne EO/IR and radar sensors VNIR through SWIR hyperspectral systems VNIR, MWIR, and LWIR high-resolution sys- tems Wideband SAR systems...meteorological sensors Hyperspectral sensor systems (PHILLS) Mid-wave infrared (MWIR) Indium Antimonide (InSb) imaging system Long-wave infrared ( LWIR
Best bang for your buck: GPU nodes for GROMACS biomolecular simulations
Páll, Szilárd; Fechner, Martin; Esztermann, Ansgar; de Groot, Bert L.; Grubmüller, Helmut
2015-01-01
The molecular dynamics simulation package GROMACS runs efficiently on a wide variety of hardware from commodity workstations to high performance computing clusters. Hardware features are well‐exploited with a combination of single instruction multiple data, multithreading, and message passing interface (MPI)‐based single program multiple data/multiple program multiple data parallelism while graphics processing units (GPUs) can be used as accelerators to compute interactions off‐loaded from the CPU. Here, we evaluate which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most economical way. We have assembled and benchmarked compute nodes with various CPU/GPU combinations to identify optimal compositions in terms of raw trajectory production rate, performance‐to‐price ratio, energy efficiency, and several other criteria. Although hardware prices are naturally subject to trends and fluctuations, general tendencies are clearly visible. Adding any type of GPU significantly boosts a node's simulation performance. For inexpensive consumer‐class GPUs this improvement equally reflects in the performance‐to‐price ratio. Although memory issues in consumer‐class GPUs could pass unnoticed as these cards do not support error checking and correction memory, unreliable GPUs can be sorted out with memory checking tools. Apart from the obvious determinants for cost‐efficiency like hardware expenses and raw performance, the energy consumption of a node is a major cost factor. Over the typical hardware lifetime until replacement of a few years, the costs for electrical power and cooling can become larger than the costs of the hardware itself. Taking that into account, nodes with a well‐balanced ratio of CPU and consumer‐class GPU resources produce the maximum amount of GROMACS trajectory over their lifetime. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc. PMID:26238484
Best bang for your buck: GPU nodes for GROMACS biomolecular simulations.
Kutzner, Carsten; Páll, Szilárd; Fechner, Martin; Esztermann, Ansgar; de Groot, Bert L; Grubmüller, Helmut
2015-10-05
The molecular dynamics simulation package GROMACS runs efficiently on a wide variety of hardware from commodity workstations to high performance computing clusters. Hardware features are well-exploited with a combination of single instruction multiple data, multithreading, and message passing interface (MPI)-based single program multiple data/multiple program multiple data parallelism while graphics processing units (GPUs) can be used as accelerators to compute interactions off-loaded from the CPU. Here, we evaluate which hardware produces trajectories with GROMACS 4.6 or 5.0 in the most economical way. We have assembled and benchmarked compute nodes with various CPU/GPU combinations to identify optimal compositions in terms of raw trajectory production rate, performance-to-price ratio, energy efficiency, and several other criteria. Although hardware prices are naturally subject to trends and fluctuations, general tendencies are clearly visible. Adding any type of GPU significantly boosts a node's simulation performance. For inexpensive consumer-class GPUs this improvement equally reflects in the performance-to-price ratio. Although memory issues in consumer-class GPUs could pass unnoticed as these cards do not support error checking and correction memory, unreliable GPUs can be sorted out with memory checking tools. Apart from the obvious determinants for cost-efficiency like hardware expenses and raw performance, the energy consumption of a node is a major cost factor. Over the typical hardware lifetime until replacement of a few years, the costs for electrical power and cooling can become larger than the costs of the hardware itself. Taking that into account, nodes with a well-balanced ratio of CPU and consumer-class GPU resources produce the maximum amount of GROMACS trajectory over their lifetime. © 2015 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.
Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Samuel; Oliker, Leonid; Vuduc, Richard
2008-10-16
We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore specific-optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD quad-core, AMD dual-core, and Intel quad-core designs, the heterogeneous STI Cell, as well as one ofmore » the first scientific studies of the highly multithreaded Sun Victoria Falls (a Niagara2 SMP). We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural trade-offs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.« less
Parallel approach in RDF query processing
NASA Astrophysics Data System (ADS)
Vajgl, Marek; Parenica, Jan
2017-07-01
Parallel approach is nowadays a very cheap solution to increase computational power due to possibility of usage of multithreaded computational units. This hardware became typical part of nowadays personal computers or notebooks and is widely spread. This contribution deals with experiments how evaluation of computational complex algorithm of the inference over RDF data can be parallelized over graphical cards to decrease computational time.
The effects of wildfire on native tree species in the Middle Rio Grande bosques of New Mexico
Brad Johnson; David Merritt
2009-01-01
The cottonwood bosques along the Middle Fork of the Rio Grande (MRG) form a ribbon of surviving habitat in this once vast ecosystem. Historically, the channel had a multi-threaded and braided configuration that created a rich mosaic of habitats, including mixed-aged cottonwood forests, meadows, and willow-dominated riparian wetlands and backwaters (...
NASA Astrophysics Data System (ADS)
Rit, S.; Vila Oliva, M.; Brousmiche, S.; Labarbe, R.; Sarrut, D.; Sharp, G. C.
2014-03-01
We propose the Reconstruction Toolkit (RTK, http://www.openrtk.org), an open-source toolkit for fast cone-beam CT reconstruction, based on the Insight Toolkit (ITK) and using GPU code extracted from Plastimatch. RTK is developed by an open consortium (see affiliations) under the non-contaminating Apache 2.0 license. The quality of the platform is daily checked with regression tests in partnership with Kitware, the company supporting ITK. Several features are already available: Elekta, Varian and IBA inputs, multi-threaded Feldkamp-David-Kress reconstruction on CPU and GPU, Parker short scan weighting, multi-threaded CPU and GPU forward projectors, etc. Each feature is either accessible through command line tools or C++ classes that can be included in independent software. A MIDAS community has been opened to share CatPhan datasets of several vendors (Elekta, Varian and IBA). RTK will be used in the upcoming cone-beam CT scanner developed by IBA for proton therapy rooms. Many features are under development: new input format support, iterative reconstruction, hybrid Monte Carlo / deterministic CBCT simulation, etc. RTK has been built to freely share tomographic reconstruction developments between researchers and is open for new contributions.
New technique for real-time distortion-invariant multiobject recognition and classification
NASA Astrophysics Data System (ADS)
Hong, Rutong; Li, Xiaoshun; Hong, En; Wang, Zuyi; Wei, Hongan
2001-04-01
A real-time hybrid distortion-invariant OPR system was established to make 3D multiobject distortion-invariant automatic pattern recognition. Wavelet transform technique was used to make digital preprocessing of the input scene, to depress the noisy background and enhance the recognized object. A three-layer backpropagation artificial neural network was used in correlation signal post-processing to perform multiobject distortion-invariant recognition and classification. The C-80 and NOA real-time processing ability and the multithread programming technology were used to perform high speed parallel multitask processing and speed up the post processing rate to ROIs. The reference filter library was constructed for the distortion version of 3D object model images based on the distortion parameter tolerance measuring as rotation, azimuth and scale. The real-time optical correlation recognition testing of this OPR system demonstrates that using the preprocessing, post- processing, the nonlinear algorithm os optimum filtering, RFL construction technique and the multithread programming technology, a high possibility of recognition and recognition rate ere obtained for the real-time multiobject distortion-invariant OPR system. The recognition reliability and rate was improved greatly. These techniques are very useful to automatic target recognition.
Implementation of GPU accelerated SPECT reconstruction with Monte Carlo-based scatter correction.
Bexelius, Tobias; Sohlberg, Antti
2018-06-01
Statistical SPECT reconstruction can be very time-consuming especially when compensations for collimator and detector response, attenuation, and scatter are included in the reconstruction. This work proposes an accelerated SPECT reconstruction algorithm based on graphics processing unit (GPU) processing. Ordered subset expectation maximization (OSEM) algorithm with CT-based attenuation modelling, depth-dependent Gaussian convolution-based collimator-detector response modelling, and Monte Carlo-based scatter compensation was implemented using OpenCL. The OpenCL implementation was compared against the existing multi-threaded OSEM implementation running on a central processing unit (CPU) in terms of scatter-to-primary ratios, standardized uptake values (SUVs), and processing speed using mathematical phantoms and clinical multi-bed bone SPECT/CT studies. The difference in scatter-to-primary ratios, visual appearance, and SUVs between GPU and CPU implementations was minor. On the other hand, at its best, the GPU implementation was noticed to be 24 times faster than the multi-threaded CPU version on a normal 128 × 128 matrix size 3 bed bone SPECT/CT data set when compensations for collimator and detector response, attenuation, and scatter were included. GPU SPECT reconstructions show great promise as an every day clinical reconstruction tool.
Real-time heart rate measurement for multi-people using compressive tracking
NASA Astrophysics Data System (ADS)
Liu, Lingling; Zhao, Yuejin; Liu, Ming; Kong, Lingqin; Dong, Liquan; Ma, Feilong; Pang, Zongguang; Cai, Zhi; Zhang, Yachu; Hua, Peng; Yuan, Ruifeng
2017-09-01
The rise of aging population has created a demand for inexpensive, unobtrusive, automated health care solutions. Image PhotoPlethysmoGraphy(IPPG) aids in the development of these solutions by allowing for the extraction of physiological signals from video data. However, the main deficiencies of the recent IPPG methods are non-automated, non-real-time and susceptible to motion artifacts(MA). In this paper, a real-time heart rate(HR) detection method for multiple subjects simultaneously was proposed and realized using the open computer vision(openCV) library, which consists of getting multiple subjects' facial video automatically through a Webcam, detecting the region of interest (ROI) in the video, reducing the false detection rate by our improved Adaboost algorithm, reducing the MA by our improved compress tracking(CT) algorithm, wavelet noise-suppression algorithm for denoising and multi-threads for higher detection speed. For comparison, HR was measured simultaneously using a medical pulse oximetry device for every subject during all sessions. Experimental results on a data set of 30 subjects show that the max average absolute error of heart rate estimation is less than 8 beats per minute (BPM), and the processing speed of every frame has almost reached real-time: the experiments with video recordings of ten subjects under the condition of the pixel resolution of 600× 800 pixels show that the average HR detection time of 10 subjects was about 17 frames per second (fps).
A Survey of Recent MARTe Based Systems
NASA Astrophysics Data System (ADS)
Neto, André C.; Alves, Diogo; Boncagni, Luca; Carvalho, Pedro J.; Valcarcel, Daniel F.; Barbalace, Antonio; De Tommasi, Gianmaria; Fernandes, Horácio; Sartori, Filippo; Vitale, Enzo; Vitelli, Riccardo; Zabeo, Luca
2011-08-01
The Multithreaded Application Real-Time executor (MARTe) is a data driven framework environment for the development and deployment of real-time control algorithms. The main ideas which led to the present version of the framework were to standardize the development of real-time control systems, while providing a set of strictly bounded standard interfaces to the outside world and also accommodating a collection of facilities which promote the speed and ease of development, commissioning and deployment of such systems. At the core of every MARTe based application, is a set of independent inter-communicating software blocks, named Generic Application Modules (GAM), orchestrated by a real-time scheduler. The platform independence of its core library provides MARTe the necessary robustness and flexibility for conveniently testing applications in different environments including non-real-time operating systems. MARTe is already being used in several machines, each with its own peculiarities regarding hardware interfacing, supervisory control configuration, operating system and target control application. This paper presents and compares the most recent results of systems using MARTe: the JET Vertical Stabilization system, which uses the Real Time Application Interface (RTAI) operating system on Intel multi-core processors; the COMPASS plasma control system, driven by Linux RT also on Intel multi-core processors; ISTTOK real-time tomography equilibrium reconstruction which shares the same support configuration of COMPASS; JET error field correction coils based on VME, PowerPC and VxWorks; FTU LH reflected power system running on VME, Intel with RTAI.
A Survey of New Trends in Symbolic Execution for Software Testing and Analysis
NASA Technical Reports Server (NTRS)
Pasareanu, Corina S.; Visser, Willem
2009-01-01
Symbolic execution is a well-known program analysis technique which represents values of program inputs with symbolic values instead of concrete (initialized) data and executes the program by manipulating program expressions involving the symbolic values. Symbolic execution has been proposed over three decades ago but recently it has found renewed interest in the research community, due in part to the progress in decision procedures, availability of powerful computers and new algorithmic developments. We provide a survey of some of the new research trends in symbolic execution, with particular emphasis on applications to test generation and program analysis. We first describe an approach that handles complex programming constructs such as input data structures, arrays, as well as multi-threading. We follow with a discussion of abstraction techniques that can be used to limit the (possibly infinite) number of symbolic configurations that need to be analyzed for the symbolic execution of looping programs. Furthermore, we describe recent hybrid techniques that combine concrete and symbolic execution to overcome some of the inherent limitations of symbolic execution, such as handling native code or availability of decision procedures for the application domain. Finally, we give a short survey of interesting new applications, such as predictive testing, invariant inference, program repair, analysis of parallel numerical programs and differential symbolic execution.
Propel: Tools and Methods for Practical Source Code Model Checking
NASA Technical Reports Server (NTRS)
Mansouri-Samani, Massoud; Mehlitz, Peter; Markosian, Lawrence; OMalley, Owen; Martin, Dale; Moore, Lantz; Penix, John; Visser, Willem
2003-01-01
The work reported here is an overview and snapshot of a project to develop practical model checking tools for in-the-loop verification of NASA s mission-critical, multithreaded programs in Java and C++. Our strategy is to develop and evaluate both a design concept that enables the application of model checking technology to C++ and Java, and a model checking toolset for C++ and Java. The design concept and the associated model checking toolset is called Propel. It builds upon the Java PathFinder (JPF) tool, an explicit state model checker for Java applications developed by the Automated Software Engineering group at NASA Ames Research Center. The design concept that we are developing is Design for Verification (D4V). This is an adaption of existing best design practices that has the desired side-effect of enhancing verifiability by improving modularity and decreasing accidental complexity. D4V, we believe, enhances the applicability of a variety of V&V approaches; we are developing the concept in the context of model checking. The model checking toolset, Propel, is based on extending JPF to handle C++. Our principal tasks in developing the toolset are to build a translator from C++ to Java, productize JPF, and evaluate the toolset in the context of D4V. Through all these tasks we are testing Propel capabilities on customer applications.
NASA Astrophysics Data System (ADS)
Francés, J.; Bleda, S.; Neipp, C.; Márquez, A.; Pascual, I.; Beléndez, A.
2013-03-01
The finite-difference time-domain method (FDTD) allows electromagnetic field distribution analysis as a function of time and space. The method is applied to analyze holographic volume gratings (HVGs) for the near-field distribution at optical wavelengths. Usually, this application requires the simulation of wide areas, which implies more memory and time processing. In this work, we propose a specific implementation of the FDTD method including several add-ons for a precise simulation of optical diffractive elements. Values in the near-field region are computed considering the illumination of the grating by means of a plane wave for different angles of incidence and including absorbing boundaries as well. We compare the results obtained by FDTD with those obtained using a matrix method (MM) applied to diffraction gratings. In addition, we have developed two optimized versions of the algorithm, for both CPU and GPU, in order to analyze the improvement of using the new NVIDIA Fermi GPU architecture versus highly tuned multi-core CPU as a function of the size simulation. In particular, the optimized CPU implementation takes advantage of the arithmetic and data transfer streaming SIMD (single instruction multiple data) extensions (SSE) included explicitly in the code and also of multi-threading by means of OpenMP directives. A good agreement between the results obtained using both FDTD and MM methods is obtained, thus validating our methodology. Moreover, the performance of the GPU is compared to the SSE+OpenMP CPU implementation, and it is quantitatively determined that a highly optimized CPU program can be competitive for a wider range of simulation sizes, whereas GPU computing becomes more powerful for large-scale simulations.
Open-Source Sequence Clustering Methods Improve the State Of the Art.
Kopylova, Evguenia; Navas-Molina, Jose A; Mercier, Céline; Xu, Zhenjiang Zech; Mahé, Frédéric; He, Yan; Zhou, Hong-Wei; Rognes, Torbjørn; Caporaso, J Gregory; Knight, Rob
2016-01-01
Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent analysis steps. Here, we evaluated the performance of recently released state-of-the-art open-source clustering software products, namely, OTUCLUST, Swarm, SUMACLUST, and SortMeRNA, against current principal options (UCLUST and USEARCH) in QIIME, hierarchical clustering methods in mothur, and USEARCH's most recent clustering algorithm, UPARSE. All the latest open-source tools showed promising results, reporting up to 60% fewer spurious OTUs than UCLUST, indicating that the underlying clustering algorithm can vastly reduce the number of these derived OTUs. Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results. Swarm, SUMACLUST, and SortMeRNA have been included in the QIIME 1.9.0 release. IMPORTANCE Massive collections of next-generation sequencing data call for fast, accurate, and easily accessible bioinformatics algorithms to perform sequence clustering. A comprehensive benchmark is presented, including open-source tools and the popular USEARCH suite. Simulated, mock, and environmental communities were used to analyze sensitivity, selectivity, species diversity (alpha and beta), and taxonomic composition. The results demonstrate that recent clustering algorithms can significantly improve accuracy and preserve estimated diversity without the application of aggressive filtering. Moreover, these tools are all open source, apply multiple levels of multithreading, and scale to the demands of modern next-generation sequencing data, which is essential for the analysis of massive multidisciplinary studies such as the Earth Microbiome Project (EMP) (J. A. Gilbert, J. K. Jansson, and R. Knight, BMC Biol 12:69, 2014, http://dx.doi.org/10.1186/s12915-014-0069-1).
Software architecture and design of the web services facilitating climate model diagnostic analysis
NASA Astrophysics Data System (ADS)
Pan, L.; Lee, S.; Zhang, J.; Tang, B.; Zhai, C.; Jiang, J. H.; Wang, W.; Bao, Q.; Qi, M.; Kubar, T. L.; Teixeira, J.
2015-12-01
Climate model diagnostic analysis is a computationally- and data-intensive task because it involves multiple numerical model outputs and satellite observation data that can both be high resolution. We have built an online tool that facilitates this process. The tool is called Climate Model Diagnostic Analyzer (CMDA). It employs the web service technology and provides a web-based user interface. The benefits of these choices include: (1) No installation of any software other than a browser, hence it is platform compatable; (2) Co-location of computation and big data on the server side, and small results and plots to be downloaded on the client side, hence high data efficiency; (3) multi-threaded implementation to achieve parallel performance on multi-core servers; and (4) cloud deployment so each user has a dedicated virtual machine. In this presentation, we will focus on the computer science aspects of this tool, namely the architectural design, the infrastructure of the web services, the implementation of the web-based user interface, the mechanism of provenance collection, the approach to virtualization, and the Amazon Cloud deployment. As an example, We will describe our methodology to transform an existing science application code into a web service using a Python wrapper interface and Python web service frameworks (i.e., Flask, Gunicorn, and Tornado). Another example is the use of Docker, a light-weight virtualization container, to distribute and deploy CMDA onto an Amazon EC2 instance. Our tool of CMDA has been successfully used in the 2014 Summer School hosted by the JPL Center for Climate Science. Students had positive feedbacks in general and we will report their comments. An enhanced version of CMDA with several new features, some requested by the 2014 students, will be used in the 2015 Summer School soon.
Concurrency Attacks and Defenses
2016-10-04
Enter name(s) of person(s) responsible for writing the report, performing the research, or credited with the content of the report. The form of...Amsterdam Avenue New York, NY 10027-7003 Tel.: 212-939-7012 Fax: 212-666-0140 Email : junfeng@cs.columbia.edu 2. Research Objectives...Multithreaded programs are getting increasingly pervasive and critical. Unfortunately, they remain extremely difficult to write . This difficulty has led to
Distributed Emulation in Support of Large Networks
2016-06-01
Provider LTE Long Term Evolution MB Megabyte MIPS Microprocessor without Interlocked Pipeline Stages MRT Multi-Threaded Routing Toolkit NPS Naval...environment, modifications to a network, protocol, or model can be executed – and the effects measured – without affecting real-world users or services...produce their results when analyzing performance of Long Term Evolution ( LTE ) gateways [3]. Many research scenarios allow problems to be represented
DOE Office of Scientific and Technical Information (OSTI.GOV)
Shin, J; Coss, D; McMurry, J
Purpose: To evaluate the efficiency of multithreaded Geant4 (Geant4-MT, version 10.0) for proton Monte Carlo dose calculations using a high performance computing facility. Methods: Geant4-MT was used to calculate 3D dose distributions in 1×1×1 mm3 voxels in a water phantom and patient's head with a 150 MeV proton beam covering approximately 5×5 cm2 in the water phantom. Three timestamps were measured on the fly to separately analyze the required time for initialization (which cannot be parallelized), processing time of individual threads, and completion time. Scalability of averaged processing time per thread was calculated as a function of thread number (1,more » 100, 150, and 200) for both 1M and 50 M histories. The total memory usage was recorded. Results: Simulations with 50 M histories were fastest with 100 threads, taking approximately 1.3 hours and 6 hours for the water phantom and the CT data, respectively with better than 1.0 % statistical uncertainty. The calculations show 1/N scalability in the event loops for both cases. The gains from parallel calculations started to decrease with 150 threads. The memory usage increases linearly with number of threads. No critical failures were observed during the simulations. Conclusion: Multithreading in Geant4-MT decreased simulation time in proton dose distribution calculations by a factor of 64 and 54 at a near optimal 100 threads for water phantom and patient's data respectively. Further simulations will be done to determine the efficiency at the optimal thread number. Considering the trend of computer architecture development, utilizing Geant4-MT for radiotherapy simulations is an excellent cost-effective alternative for a distributed batch queuing system. However, because the scalability depends highly on simulation details, i.e., the ratio of the processing time of one event versus waiting time to access for the shared event queue, a performance evaluation as described is recommended.« less
A PC-based shutter glasses controller for visual stimulation using multithreading in LabWindows/CVI.
Gramatikov, Ivan; Simons, Kurt; Guyton, David; Gramatikov, Boris
2017-05-01
Amblyopia, commonly known as "lazy eye," is poor vision in an eye from prolonged neurologic suppression. It is a major public health problem, afflicting up to 3.6% of children, and will lead to lifelong visual impairment if not identified and treated in early childhood. Traditional treatment methods, such as occluding or penalizing the good eye with eye patches or blurring eye drops, do not always yield satisfactory results. Newer methods have emerged, based on liquid crystal shutter glasses that intermittently occlude the better eye, or alternately occlude the two eyes, thus stimulating vision in the "lazy" eye. As yet there is no technology that allows easy and efficient optimization of the shuttering characteristics for a given individual. The purpose of this study was to develop an inexpensive, computer-based system to perform liquid crystal shuttering in laboratory and clinical settings to help "wake up" the suppressed eye in amblyopic patients, and to help optimize the individual shuttering parameters such as wave shape, level of transparency/opacity, frequency, and duty cycle of the shuttering. We developed a liquid crystal glasses controller connected by USB cable to a PC computer. It generates the voltage waveforms going to the glasses, and has potentiometer knobs for interactive adjustments by the patient. In order to achieve good timing performance in this bidirectional system, we used multithreading programming techniques with data protection, implemented in LabWindows/CVI. The hardware and software developed were assessed experimentally. We achieved an accuracy of ±1Hz for the frequency, and ±2% for the duty cycle of the occlusion pulses. We consider these values to be satisfactory for the purpose of optimizing the visual stimulation by means of shutter glasses. The system can be used for individual optimization of shuttering attributes by clinicians, for training sessions in clinical settings, or even at home, aimed at stimulating vision in the "lazy" eye. Multithreading offers significant benefits for data acquisition and instrument control, making it possible to implement time-efficient algorithms in inexpensive yet versatile medical instrumentation with only minimum requirements on the hardware. Copyright © 2017 Elsevier B.V. All rights reserved.
Consensus oriented fuzzified decision support for oil spill contingency management.
Liu, Xin; Wirtz, Kai W
2006-06-30
Studies on multi-group multi-criteria decision-making problems for oil spill contingency management are in their infancy. This paper presents a second-order fuzzy comprehensive evaluation (FCE) model to resolve decision-making problems in the area of contingency management after environmental disasters such as oil spills. To assess the performance of different oil combat strategies, second-order FCE allows for the utilization of lexical information, the consideration of ecological and socio-economic criteria and the involvement of a variety of stakeholders. On the other hand, the new approach can be validated by using internal and external checks, which refer to sensitivity tests regarding its internal setups and comparisons with other methods, respectively. Through a case study, the Pallas oil spill in the German Bight in 1998, it is demonstrated that this approach can help decision makers who search for an optimal strategy in multi-thread contingency problems and has a wider application potential in the field of integrated coastal zone management.
A comparison of SuperLU solvers on the intel MIC architecture
NASA Astrophysics Data System (ADS)
Tuncel, Mehmet; Duran, Ahmet; Celebi, M. Serdar; Akaydin, Bora; Topkaya, Figen O.
2016-10-01
In many science and engineering applications, problems may result in solving a sparse linear system AX=B. For example, SuperLU_MCDT, a linear solver, was used for the large penta-diagonal matrices for 2D problems and hepta-diagonal matrices for 3D problems, coming from the incompressible blood flow simulation (see [1]). It is important to test the status and potential improvements of state-of-the-art solvers on new technologies. In this work, sequential, multithreaded and distributed versions of SuperLU solvers (see [2]) are examined on the Intel Xeon Phi coprocessors using offload programming model at the EURORA cluster of CINECA in Italy. We consider a portfolio of test matrices containing patterned matrices from UFMM ([3]) and randomly located matrices. This architecture can benefit from high parallelism and large vectors. We find that the sequential SuperLU benefited up to 45 % performance improvement from the offload programming depending on the sparse matrix type and the size of transferred and processed data.
Fast access to the CMS detector condition data employing HTML5 technologies
NASA Astrophysics Data System (ADS)
Pierro, Giuseppe Antonio; Cavallari, Francesca; Di Guida, Salvatore; Innocente, Vincenzo
2011-12-01
This paper focuses on using HTML version 5 (HTML5) for accessing condition data for the CMS experiment, evaluating the benefits and risks posed by the use of this technology. According to the authors of HTML5, this technology attempts to solve issues found in previous iterations of HTML and addresses the needs of web applications, an area previously not adequately covered by HTML. We demonstrate that employing HTML5 brings important benefits in terms of access performance to the CMS condition data. The combined use of web storage and web sockets allows increasing the performance and reducing the costs in term of computation power, memory usage and network bandwidth for client and server. Above all, the web workers allow creating different scripts that can be executed using multi-thread mode, exploiting multi-core microprocessors. Web workers have been employed in order to substantially decrease the web page rendering time to display the condition data stored in the CMS condition database.
Generalized Symbolic Execution for Model Checking and Testing
NASA Technical Reports Server (NTRS)
Khurshid, Sarfraz; Pasareanu, Corina; Visser, Willem; Kofmeyer, David (Technical Monitor)
2003-01-01
Modern software systems, which often are concurrent and manipulate complex data structures must be extremely reliable. We present a novel framework based on symbolic execution, for automated checking of such systems. We provide a two-fold generalization of traditional symbolic execution based approaches: one, we define a program instrumentation, which enables standard model checkers to perform symbolic execution; two, we give a novel symbolic execution algorithm that handles dynamically allocated structures (e.g., lists and trees), method preconditions (e.g., acyclicity of lists), data (e.g., integers and strings) and concurrency. The program instrumentation enables a model checker to automatically explore program heap configurations (using a systematic treatment of aliasing) and manipulate logical formulae on program data values (using a decision procedure). We illustrate two applications of our framework: checking correctness of multi-threaded programs that take inputs from unbounded domains with complex structure and generation of non-isomorphic test inputs that satisfy a testing criterion. Our implementation for Java uses the Java PathFinder model checker.
Jali - Unstructured Mesh Infrastructure for Multi-Physics Applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Garimella, Rao V; Berndt, Markus; Coon, Ethan
2017-04-13
Jali is a parallel unstructured mesh infrastructure library designed for use by multi-physics simulations. It supports 2D and 3D arbitrary polyhedral meshes distributed over hundreds to thousands of nodes. Jali can read write Exodus II meshes along with fields and sets on the mesh and support for other formats is partially implemented or is (https://github.com/MeshToolkit/MSTK), an open source general purpose unstructured mesh infrastructure library from Los Alamos National Laboratory. While it has been made to work with other mesh frameworks such as MOAB and STKmesh in the past, support for maintaining the interface to these frameworks has been suspended formore » now. Jali supports distributed as well as on-node parallelism. Support of on-node parallelism is through direct use of the the mesh in multi-threaded constructs or through the use of "tiles" which are submeshes or sub-partitions of a partition destined for a compute node.« less
A Massively Parallel Computational Method of Reading Index Files for SOAPsnv.
Zhu, Xiaoqian; Peng, Shaoliang; Liu, Shaojie; Cui, Yingbo; Gu, Xiang; Gao, Ming; Fang, Lin; Fang, Xiaodong
2015-12-01
SOAPsnv is the software used for identifying the single nucleotide variation in cancer genes. However, its performance is yet to match the massive amount of data to be processed. Experiments reveal that the main performance bottleneck of SOAPsnv software is the pileup algorithm. The original pileup algorithm's I/O process is time-consuming and inefficient to read input files. Moreover, the scalability of the pileup algorithm is also poor. Therefore, we designed a new algorithm, named BamPileup, aiming to improve the performance of sequential read, and the new pileup algorithm implemented a parallel read mode based on index. Using this method, each thread can directly read the data start from a specific position. The results of experiments on the Tianhe-2 supercomputer show that, when reading data in a multi-threaded parallel I/O way, the processing time of algorithm is reduced to 3.9 s and the application program can achieve a speedup up to 100×. Moreover, the scalability of the new algorithm is also satisfying.
NASA Astrophysics Data System (ADS)
Huang, Hong-bin; Liu, Wei-ping; Chen, Shun-er; Zheng, Liming
2005-02-01
A new type of CATV network management system developed by universal MCU, which supports SNMP, is proposed in this paper. From the point of view in both hardware and software, the function and method of every modules inside the system, which include communications in the physical layer, protocol process, data process, and etc, are analyzed. In our design, the management system takes IP MAN as data transmission channel and every controlled object in the management structure has a SNMP agent. In the SNMP agent developed, there are four function modules, including physical layer communication module, protocol process module, internal data process module and MIB management module. In the paper, the structure and function of every module are designed and demonstrated while the related hardware circuit, software flow as well as the experimental results are tested. Furthermore, by introducing RTOS into the software programming, the universal MCU procedure can conducts such multi-thread management as fast Ethernet controller driving, TCP/IP process, serial port signal monitoring and so on, which greatly improves efficiency of CPU.
Simulation of DKIST solar adaptive optics system
NASA Astrophysics Data System (ADS)
Marino, Jose; Carlisle, Elizabeth; Schmidt, Dirk
2016-07-01
Solar adaptive optics (AO) simulations are a valuable tool to guide the design and optimization process of current and future solar AO and multi-conjugate AO (MCAO) systems. Solar AO and MCAO systems rely on extended object cross-correlating Shack-Hartmann wavefront sensors to measure the wavefront. Accurate solar AO simulations require computationally intensive operations, which have until recently presented a prohibitive computational cost. We present an update on the status of a solar AO and MCAO simulation tool being developed at the National Solar Observatory. The simulation tool is a multi-threaded application written in the C++ language that takes advantage of current large multi-core CPU computer systems and fast ethernet connections to provide accurate full simulation of solar AO and MCAO systems. It interfaces with KAOS, a state of the art solar AO control software developed by the Kiepenheuer-Institut fuer Sonnenphysik, that provides reliable AO control. We report on the latest results produced by the solar AO simulation tool.
A Multithreaded Missions And Means Framework (MMF) Concept Report
2012-03-01
Vasconcelos , W.; Gibson, C.; Bar-Noy, A.; Borowiecki, K.; La Porta, T.; Pizzocaro, D.; Rowaihy, H.; Pearson, G.; Pham, T. An Ontology Centric...M.; de Mel, G.; Vasconcelos , W.; Sleeman, D.; Colley, S.; La Porta, T. An Ontology-Based Approach to Sensor-Mission Assignment. Proceedings of the...1st Annual Conference of the International Technology Alliance (ACITA 2007), 2007. Preece, A.; Gomez, M.; de Mel, G.; Vasconcelos , W.; Sleeman, D
Systematic and Scalable Testing of Concurrent Programs
2013-12-16
The evaluation of CHESS [107] checked eight different programs ranging from process management libraries to a distributed execution engine to a research...tool (§3.1) targets systematic testing of scheduling nondeterminism in multi- threaded components of the Omega cluster management system [129], while...tool for systematic testing of multithreaded com- ponents of the Omega cluster management system [129]. In particular, §3.1.1 defines a model for
Development of an Autonomous Navigation Technology Test Vehicle
2004-08-01
as an independent thread on processors using the Linux operating system. The computer hardware selected for the nodes that host the MRS threads...communications system design. Linux was chosen as the operating system for all of the single board computers used on the Mule. Linux was specifically...used for system analysis and development. The simple realization of multi-thread processing and inter-process communications in Linux made it a
NASA Astrophysics Data System (ADS)
Sorokin, V. A.; Volkov, Yu V.; Sherstneva, A. I.; Botygin, I. A.
2016-11-01
This paper overviews a method of generating climate regions based on an analytic signal theory. When applied to atmospheric surface layer temperature data sets, the method allows forming climatic structures with the corresponding changes in the temperature to make conclusions on the uniformity of climate in an area and to trace the climate changes in time by analyzing the type group shifts. The algorithm is based on the fact that the frequency spectrum of the thermal oscillation process is narrow-banded and has only one mode for most weather stations. This allows using the analytic signal theory, causality conditions and introducing an oscillation phase. The annual component of the phase, being a linear function, was removed by the least squares method. The remaining phase fluctuations allow consistent studying of their coordinated behavior and timing, using the Pearson correlation coefficient for dependence evaluation. This study includes program experiments to evaluate the calculation efficiency in the phase grouping task. The paper also overviews some single-threaded and multi-threaded computing models. It is shown that the phase grouping algorithm for meteorological data can be parallelized and that a multi-threaded implementation leads to a 25-30% increase in the performance.
In-Memory Graph Databases for Web-Scale Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castellana, Vito G.; Morari, Alessandro; Weaver, Jesse R.
RDF databases have emerged as one of the most relevant way for organizing, integrating, and managing expo- nentially growing, often heterogeneous, and not rigidly structured data for a variety of scientific and commercial fields. In this paper we discuss the solutions integrated in GEMS (Graph database Engine for Multithreaded Systems), a software framework for implementing RDF databases on commodity, distributed-memory high-performance clusters. Unlike the majority of current RDF databases, GEMS has been designed from the ground up to primarily employ graph-based methods. This is reflected in all the layers of its stack. The GEMS framework is composed of: a SPARQL-to-C++more » compiler, a library of data structures and related methods to access and modify them, and a custom runtime providing lightweight software multithreading, network messages aggregation and a partitioned global address space. We provide an overview of the framework, detailing its component and how they have been closely designed and customized to address issues of graph methods applied to large-scale datasets on clusters. We discuss in details the principles that enable automatic translation of the queries (expressed in SPARQL, the query language of choice for RDF databases) to graph methods, and identify differences with respect to other RDF databases.« less
A Registration Method Based on Contour Point Cloud for 3D Whole-Body PET and CT Images
Yang, Qiyao; Wang, Zhiguo; Zhang, Guoxu
2017-01-01
The PET and CT fusion image, combining the anatomical and functional information, has important clinical meaning. An effective registration of PET and CT images is the basis of image fusion. This paper presents a multithread registration method based on contour point cloud for 3D whole-body PET and CT images. Firstly, a geometric feature-based segmentation (GFS) method and a dynamic threshold denoising (DTD) method are creatively proposed to preprocess CT and PET images, respectively. Next, a new automated trunk slices extraction method is presented for extracting feature point clouds. Finally, the multithread Iterative Closet Point is adopted to drive an affine transform. We compare our method with a multiresolution registration method based on Mattes Mutual Information on 13 pairs (246~286 slices per pair) of 3D whole-body PET and CT data. Experimental results demonstrate the registration effectiveness of our method with lower negative normalization correlation (NC = −0.933) on feature images and less Euclidean distance error (ED = 2.826) on landmark points, outperforming the source data (NC = −0.496, ED = 25.847) and the compared method (NC = −0.614, ED = 16.085). Moreover, our method is about ten times faster than the compared one. PMID:28316979
Presentation of a large amount of moving objects in a virtual environment
NASA Astrophysics Data System (ADS)
Ye, Huanzhuo; Gong, Jianya; Ye, Jing
2004-05-01
It needs a lot of consideration to manage the presentation of a large amount of moving objects in virtual environment. Motion state model (MSM) is used to represent the motion of objects and 2n tree is used to index the motion data stored in database or files. To minimize the necessary memory occupation for static models, cache with LRU or FIFO refreshing is introduced. DCT and wavelet work well with different playback speeds of motion presentation because they can filter low frequencies from motion data and adjust the filter according to playback speed. Since large amount of data are continuously retrieved, calculated, used for displaying, and then discarded, multithreading technology is naturally employed though single thread with carefully arranged data retrieval also works well when the number of objects is not very big. With multithreading, the level of concurrence should be placed at data retrieval, where waiting may occur, rather than at calculating or displaying, and synchronization should be carefully arranged to make sure that different threads can collaborate well. Collision detection is not needed when playing with history data and sampled current data; however, it is necessary for spatial state prediction. When the current state is presented, either predicting-adjusting method or late updating method could be used according to the users' preference.
NASA Astrophysics Data System (ADS)
Dean, David J.; Schmidt, John C.
2011-03-01
Over the last century, large-scale water development of the upper Rio Grande in the U.S. and Mexico, and of the Rio Conchos in Mexico, has resulted in progressive channel narrowing of the lower Rio Grande in the Big Bend region. We used methods operating at multiple spatial and temporal scales to analyze the rate, magnitude, and processes responsible for channel narrowing. These methods included: hydrologic analysis of historic stream gage data, analysis of notes of measured discharges, historic oblique and aerial photograph analysis, and stratigraphic and dendrogeomorphic analysis of inset floodplain deposits. Our analyses indicate that frequent large floods between 1900 and the mid-1940s acted as a negative feedback mechanism and maintained a wide, sandy, multi-threaded river. Declines in mean and peak flow in the mid-1940s resulted in progressive channel narrowing. Channel narrowing has been temporarily interrupted by occasional large floods that widened the channel, however, channel narrowing has always resumed. After large floods in 1990 and 1991, the active channel width of the lower Rio Grande has narrowed by 36-52%. Narrowing has occurred by the vertical accretion of fine-grained deposits on top of sand and gravel bars, inset within natural levees. Channel narrowing by vertical accretion occurred simultaneously with a rapid invasion of non-native riparian vegetation ( Tamarix spp., Arundo donax) which created a positive feedback and exacerbated the processes of channel narrowing and vertical accretion. In two floodplain trenches, we measured 2.75 and 3.5 m of vertical accretion between 1993 and 2008. In some localities, nearly 90% of bare, active channel bars were converted to vegetated floodplain during the same period. Upward shifts of stage-discharge relations occurred resulting in over-bank flooding at lower discharges, and continued vertical accretion despite a progressive reduction in stream flow. Thus, although the magnitude of the average annual flood was reduced between 40 and 50%, over-bank flooding continued. These changes reflect a shift in the geomorphic nature of the Rio Grande from a wide, laterally unstable, multi-thread river, to a laterally stable, single-thread channel with cohesive, vertical banks, and few active in-channel bars.
NASA Technical Reports Server (NTRS)
Rede, Leonard J.; Booth, Andrew; Hsieh, Jonathon; Summer, Kellee
2004-01-01
This paper presents a discussion of the evolution of a sequencer from a simple EPICS (Experimental Physics and Industrial Control System) based sequencer into a complex implementation designed utilizing UML (Unified Modeling Language) methodologies and a CASE (Computer Aided Software Engineering) tool approach. The main purpose of the sequencer (called the IF Sequencer) is to provide overall control of the Keck Interferometer to enable science operations be carried out by a single operator (and/or observer). The interferometer links the two 10m telescopes of the W. M. Keck Observatory at Mauna Kea, Hawaii. The IF Sequencer is a high-level, multi-threaded, Hare1 finite state machine, software program designed to orchestrate several lower-level hardware and software hard real time subsystems that must perform their work in a specific and sequential order. The sequencing need not be done in hard real-time. Each state machine thread commands either a high-speed real-time multiple mode embedded controller via CORB A, or slower controllers via EPICS Channel Access interfaces. The overall operation of the system is simplified by the automation. The UML is discussed and our use of it to implement the sequencer is presented. The decision to use the Rhapsody product as our CASE tool is explained and reflected upon. Most importantly, a section on lessons learned is presented and the difficulty of integrating CASE tool automatically generated C++ code into a large control system consisting of multiple infrastructures is presented.
NASA Astrophysics Data System (ADS)
Reder, Leonard J.; Booth, Andrew; Hsieh, Jonathan; Summers, Kellee R.
2004-09-01
This paper presents a discussion of the evolution of a sequencer from a simple Experimental Physics and Industrial Control System (EPICS) based sequencer into a complex implementation designed utilizing UML (Unified Modeling Language) methodologies and a Computer Aided Software Engineering (CASE) tool approach. The main purpose of the Interferometer Sequencer (called the IF Sequencer) is to provide overall control of the Keck Interferometer to enable science operations to be carried out by a single operator (and/or observer). The interferometer links the two 10m telescopes of the W. M. Keck Observatory at Mauna Kea, Hawaii. The IF Sequencer is a high-level, multi-threaded, Harel finite state machine software program designed to orchestrate several lower-level hardware and software hard real-time subsystems that must perform their work in a specific and sequential order. The sequencing need not be done in hard real-time. Each state machine thread commands either a high-speed real-time multiple mode embedded controller via CORBA, or slower controllers via EPICS Channel Access interfaces. The overall operation of the system is simplified by the automation. The UML is discussed and our use of it to implement the sequencer is presented. The decision to use the Rhapsody product as our CASE tool is explained and reflected upon. Most importantly, a section on lessons learned is presented and the difficulty of integrating CASE tool automatically generated C++ code into a large control system consisting of multiple infrastructures is presented.
NASA Astrophysics Data System (ADS)
Handhika, T.; Bustamam, A.; Ernastuti, Kerami, D.
2017-07-01
Multi-thread programming using OpenMP on the shared-memory architecture with hyperthreading technology allows the resource to be accessed by multiple processors simultaneously. Each processor can execute more than one thread for a certain period of time. However, its speedup depends on the ability of the processor to execute threads in limited quantities, especially the sequential algorithm which contains a nested loop. The number of the outer loop iterations is greater than the maximum number of threads that can be executed by a processor. The thread distribution technique that had been found previously only be applied by the high-level programmer. This paper generates a parallelization procedure for low-level programmer in dealing with 2-level nested loop problems with the maximum number of threads that can be executed by a processor is smaller than the number of the outer loop iterations. Data preprocessing which is related to the number of the outer loop and the inner loop iterations, the computational time required to execute each iteration and the maximum number of threads that can be executed by a processor are used as a strategy to determine which parallel region that will produce optimal speedup.
fast_protein_cluster: parallel and optimized clustering of large-scale protein modeling data.
Hung, Ling-Hong; Samudrala, Ram
2014-06-15
fast_protein_cluster is a fast, parallel and memory efficient package used to cluster 60 000 sets of protein models (with up to 550 000 models per set) generated by the Nutritious Rice for the World project. fast_protein_cluster is an optimized and extensible toolkit that supports Root Mean Square Deviation after optimal superposition (RMSD) and Template Modeling score (TM-score) as metrics. RMSD calculations using a laptop CPU are 60× faster than qcprot and 3× faster than current graphics processing unit (GPU) implementations. New GPU code further increases the speed of RMSD and TM-score calculations. fast_protein_cluster provides novel k-means and hierarchical clustering methods that are up to 250× and 2000× faster, respectively, than Clusco, and identify significantly more accurate models than Spicker and Clusco. fast_protein_cluster is written in C++ using OpenMP for multi-threading support. Custom streaming Single Instruction Multiple Data (SIMD) extensions and advanced vector extension intrinsics code accelerate CPU calculations, and OpenCL kernels support AMD and Nvidia GPUs. fast_protein_cluster is available under the M.I.T. license. (http://software.compbio.washington.edu/fast_protein_cluster) © The Author 2014. Published by Oxford University Press.
Agent Based Software for the Autonomous Control of Formation Flying Spacecraft
NASA Technical Reports Server (NTRS)
How, Jonathan P.; Campbell, Mark; Dennehy, Neil (Technical Monitor)
2003-01-01
Distributed satellite systems is an enabling technology for many future NASA/DoD earth and space science missions, such as MMS, MAXIM, Leonardo, and LISA [1, 2, 3]. While formation flying offers significant science benefits, to reduce the operating costs for these missions it will be essential that these multiple vehicles effectively act as a single spacecraft by performing coordinated observations. Autonomous guidance, navigation, and control as part of a coordinated fleet-autonomy is a key technology that will help accomplish this complex goal. This is no small task, as most current space missions require significant input from the ground for even relatively simple decisions such as thruster burns. Work for the NMP DS1 mission focused on the development of the New Millennium Remote Agent (NMRA) architecture for autonomous spacecraft control systems. NMRA integrates traditional real-time monitoring and control with components for constraint-based planning, robust multi-threaded execution, and model-based diagnosis and reconfiguration. The complexity of using an autonomous approach for space flight software was evident when most of its capabilities were stripped off prior to launch (although more capability was uplinked subsequently, and the resulting demonstration was very successful).
Real-Time Data Streaming and Storing Structure for the LHD's Fusion Plasma Experiments
NASA Astrophysics Data System (ADS)
Nakanishi, Hideya; Ohsuna, Masaki; Kojima, Mamoru; Imazu, Setsuo; Nonomura, Miki; Emoto, Masahiko; Yoshida, Masanobu; Iwata, Chie; Ida, Katsumi
2016-02-01
The LHD data acquisition and archiving system, i.e., LABCOM system, has been fully equipped with high-speed real-time acquisition, streaming, and storage capabilities. To deal with more than 100 MB/s continuously generated data at each data acquisition (DAQ) node, DAQ tasks have been implemented as multitasking and multithreaded ones in which the shared memory plays the most important role for inter-process fast and massive data handling. By introducing a 10-second time chunk named “subshot,” endless data streams can be stored into a consecutive series of fixed length data blocks so that they will soon become readable by other processes even while the write process is continuing. Real-time device and environmental monitoring are also implemented in the same way with further sparse resampling. The central data storage has been separated into two layers to be capable of receiving multiple 100 MB/s inflows in parallel. For the frontend layer, high-speed SSD arrays are used as the GlusterFS distributed filesystem which can provide max. 2 GB/s throughput. Those design optimizations would be informative for implementing the next-generation data archiving system in big physics, such as ITER.
Porting and refurbishment of the WSS TNG control software
NASA Astrophysics Data System (ADS)
Caproni, Alessandro; Zacchei, Andrea; Vuerli, Claudio; Pucillo, Mauro
2004-09-01
The Workstation Software Sytem (WSS) is the high level control software of the Italian Galileo Galilei Telescope settled in La Palma Canary Island developed at the beginning of '90 for HP-UX workstations. WSS may be seen as a middle layer software system that manages the communications between the real time systems (VME), different workstations and high level applications providing a uniform distributed environment. The project to port the control software from the HP workstation to Linux environment started at the end of 2001. It is aimed to refurbish the control software introducing some of the new software technologies and languages, available for free in the Linux operating system. The project was realized by gradually substituting each HP workstation with a Linux PC with the goal to avoid main changes in the original software running under HP-UX. Three main phases characterized the project: creation of a simulated control room with several Linux PCs running WSS (to check all the functionality); insertion in the simulated control room of some HPs (to check the mixed environment); substitution of HP workstation in the real control room. From a software point of view, the project introduces some new technologies, like multi-threading, and the possibility to develop high level WSS applications with almost every programming language that implements the Berkley sockets. A library to develop java applications has also been created and tested.
The development of data acquisition and processing application system for RF ion source
NASA Astrophysics Data System (ADS)
Zhang, Xiaodan; Wang, Xiaoying; Hu, Chundong; Jiang, Caichao; Xie, Yahong; Zhao, Yuanzhe
2017-07-01
As the key ion source component of nuclear fusion auxiliary heating devices, the radio frequency (RF) ion source is developed and applied gradually to offer a source plasma with the advantages of ease of control and high reliability. In addition, it easily achieves long-pulse steady-state operation. During the process of the development and testing of the RF ion source, a lot of original experimental data will be generated. Therefore, it is necessary to develop a stable and reliable computer data acquisition and processing application system for realizing the functions of data acquisition, storage, access, and real-time monitoring. In this paper, the development of a data acquisition and processing application system for the RF ion source is presented. The hardware platform is based on the PXI system and the software is programmed on the LabVIEW development environment. The key technologies that are used for the implementation of this software programming mainly include the long-pulse data acquisition technology, multi-threading processing technology, transmission control communication protocol, and the Lempel-Ziv-Oberhumer data compression algorithm. Now, this design has been tested and applied on the RF ion source. The test results show that it can work reliably and steadily. With the help of this design, the stable plasma discharge data of the RF ion source are collected, stored, accessed, and monitored in real-time. It is shown that it has a very practical application significance for the RF experiments.
A New Generation of Real-Time Systems in the JET Tokamak
NASA Astrophysics Data System (ADS)
Alves, Diogo; Neto, Andre C.; Valcarcel, Daniel F.; Felton, Robert; Lopez, Juan M.; Barbalace, Antonio; Boncagni, Luca; Card, Peter; De Tommasi, Gianmaria; Goodyear, Alex; Jachmich, Stefan; Lomas, Peter J.; Maviglia, Francesco; McCullen, Paul; Murari, Andrea; Rainford, Mark; Reux, Cedric; Rimini, Fernanda; Sartori, Filippo; Stephen, Adam V.; Vega, Jesus; Vitelli, Riccardo; Zabeo, Luca; Zastrow, Klaus-Dieter
2014-04-01
Recently, a new recipe for developing and deploying real-time systems has become increasingly adopted in the JET tokamak. Powered by the advent of x86 multi-core technology and the reliability of JET's well established Real-Time Data Network (RTDN) to handle all real-time I/O, an official Linux vanilla kernel has been demonstrated to be able to provide real-time performance to user-space applications that are required to meet stringent timing constraints. In particular, a careful rearrangement of the Interrupt ReQuests' (IRQs) affinities together with the kernel's CPU isolation mechanism allows one to obtain either soft or hard real-time behavior depending on the synchronization mechanism adopted. Finally, the Multithreaded Application Real-Time executor (MARTe) framework is used for building applications particularly optimised for exploring multi-core architectures. In the past year, four new systems based on this philosophy have been installed and are now part of JET's routine operation. The focus of the present work is on the configuration aspects that enable these new systems' real-time capability. Details are given about the common real-time configuration of these systems, followed by a brief description of each system together with results regarding their real-time performance. A cycle time jitter analysis of a user-space MARTe based application synchronizing over a network is also presented. The goal is to compare its deterministic performance while running on a vanilla and on a Messaging Real time Grid (MRG) Linux kernel.
Distributed-Memory Breadth-First Search on Massive Graphs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Buluc, Aydin; Beamer, Scott; Madduri, Kamesh
This chapter studies the problem of traversing large graphs using the breadth-first search order on distributed-memory supercomputers. We consider both the traditional level-synchronous top-down algorithm as well as the recently discovered direction optimizing algorithm. We analyze the performance and scalability trade-offs in using different local data structures such as CSR and DCSC, enabling in-node multithreading, and graph decompositions such as 1D and 2D decomposition.
Cost Computations for Cyber Fighter Associate
2015-05-01
associate. Aberdeen Proving Ground (MD): Army Research Laboratory (US); in press. 2 Harman D, Brown S, Henz B, Marvel LM. A communication protocol... Harman , et al.2 A specific class called ListenThread was created for multithreaded listeners. When ListenThread is instantiated, it is passed a given...2. Harman D, Brown S, Henz B, Marvel LM. A communication protocol for CyAMS and the cyber associate interface. Aberdeen Proving Ground (MD): US Army
Thread mapping using system-level model for shared memory multicores
NASA Astrophysics Data System (ADS)
Mitra, Reshmi
Exploring thread-to-core mapping options for a parallel application on a multicore architecture is computationally very expensive. For the same algorithm, the mapping strategy (MS) with the best response time may change with data size and thread counts. The primary challenge is to design a fast, accurate and automatic framework for exploring these MSs for large data-intensive applications. This is to ensure that the users can explore the design space within reasonable machine hours, without thorough understanding on how the code interacts with the platform. Response time is related to the cycles per instructions retired (CPI), taking into account both active and sleep states of the pipeline. This work establishes a hybrid approach, based on Markov Chain Model (MCM) and Model Tree (MT) for system-level steady state CPI prediction. It is designed for shared memory multicore processors with coarse-grained multithreading. The thread status is represented by the MCM states. The program characteristics are modeled as the transition probabilities, representing the system moving between active and suspended thread states. The MT model extrapolates these probabilities for the actual application size (AS) from the smaller AS performance. This aspect of the framework, along with, the use of mathematical expressions for the actual AS performance information, results in a tremendous reduction in the CPI prediction time. The framework is validated using an electromagnetics application. The average performance prediction error for steady state CPI results with 12 different MSs is less than 1%. The total run time of model is of the order of minutes, whereas the actual application execution time is in terms of days.
Implementation of the ATLAS trigger within the multi-threaded software framework AthenaMT
NASA Astrophysics Data System (ADS)
Wynne, Ben; ATLAS Collaboration
2017-10-01
We present an implementation of the ATLAS High Level Trigger, HLT, that provides parallel execution of trigger algorithms within the ATLAS multithreaded software framework, AthenaMT. This development will enable the ATLAS HLT to meet future challenges due to the evolution of computing hardware and upgrades of the Large Hadron Collider, LHC, and ATLAS Detector. During the LHC data-taking period starting in 2021, luminosity will reach up to three times the original design value. Luminosity will increase further, to up to 7.5 times the design value, in 2026 following LHC and ATLAS upgrades. This includes an upgrade of the ATLAS trigger architecture that will result in an increase in the HLT input rate by a factor of 4 to 10 compared to the current maximum rate of 100 kHz. The current ATLAS multiprocess framework, AthenaMP, manages a number of processes that each execute algorithms sequentially for different events. AthenaMT will provide a fully multi-threaded environment that will additionally enable concurrent execution of algorithms within an event. This has the potential to significantly reduce the memory footprint on future manycore devices. An additional benefit of the HLT implementation within AthenaMT is that it facilitates the integration of offline code into the HLT. The trigger must retain high rejection in the face of increasing numbers of pileup collisions. This will be achieved by greater use of offline algorithms that are designed to maximize the discrimination of signal from background. Therefore a unification of the HLT and offline reconstruction software environment is required. This has been achieved while at the same time retaining important HLT-specific optimisations that minimize the computation performed to reach a trigger decision. Such optimizations include early event rejection and reconstruction within restricted geometrical regions. We report on an HLT prototype in which the need for HLT-specific components has been reduced to a minimum. Promising results have been obtained with a prototype that includes the key elements of trigger functionality including regional reconstruction and early event rejection. We report on the first experience of migrating trigger selections to this new framework and present the next steps towards a full implementation of the ATLAS trigger.
Accelerating semantic graph databases on commodity clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Morari, Alessandro; Castellana, Vito G.; Haglin, David J.
We are developing a full software system for accelerating semantic graph databases on commodity cluster that scales to hundreds of nodes while maintaining constant query throughput. Our framework comprises a SPARQL to C++ compiler, a library of parallel graph methods and a custom multithreaded runtime layer, which provides a Partitioned Global Address Space (PGAS) programming model with fork/join parallelism and automatic load balancing over a commodity clusters. We present preliminary results for the compiler and for the runtime.
Hardware based redundant multi-threading inside a GPU for improved reliability
Sridharan, Vilas; Gurumurthi, Sudhanva
2015-05-05
A system and method for verifying computation output using computer hardware are provided. Instances of computation are generated and processed on hardware-based processors. As instances of computation are processed, each instance of computation receives a load accessible to other instances of computation. Instances of output are generated by processing the instances of computation. The instances of output are verified against each other in a hardware based processor to ensure accuracy of the output.
ImageJ: Image processing and analysis in Java
NASA Astrophysics Data System (ADS)
Rasband, W. S.
2012-06-01
ImageJ is a public domain Java image processing program inspired by NIH Image. It can display, edit, analyze, process, save and print 8-bit, 16-bit and 32-bit images. It can read many image formats including TIFF, GIF, JPEG, BMP, DICOM, FITS and "raw". It supports "stacks", a series of images that share a single window. It is multithreaded, so time-consuming operations such as image file reading can be performed in parallel with other operations.
Qin, J; Choi, K S; Ho, Simon S M; Heng, P A
2008-01-01
A force prediction algorithm is proposed to facilitate virtual-reality (VR) based collaborative surgical simulation by reducing the effect of network latencies. State regeneration is used to correct the estimated prediction. This algorithm is incorporated into an adaptive transmission protocol in which auxiliary features such as view synchronization and coupling control are equipped to ensure the system consistency. We implemented this protocol using multi-threaded technique on a cluster-based network architecture.
Model Checking with Multi-Threaded IC3 Portfolios
2015-01-15
different runs varies randomly depending on the thread interleaving. The use of a portfolio of solvers to maximize the likelihood of a quick solution is...empirically show (cf. Sec. 5.2) that the predictions based on this formula have high accuracy. Note that each solver in the portfolio potentially searches...speedup of over 300. We also show that widening the proof search of ic3 by randomizing its SAT solver is not as effective as paral- lelization
Maia, Julio Daniel Carvalho; Urquiza Carvalho, Gabriel Aires; Mangueira, Carlos Peixoto; Santana, Sidney Ramos; Cabral, Lucidio Anjos Formiga; Rocha, Gerd B
2012-09-11
In this study, we present some modifications in the semiempirical quantum chemistry MOPAC2009 code that accelerate single-point energy calculations (1SCF) of medium-size (up to 2500 atoms) molecular systems using GPU coprocessors and multithreaded shared-memory CPUs. Our modifications consisted of using a combination of highly optimized linear algebra libraries for both CPU (LAPACK and BLAS from Intel MKL) and GPU (MAGMA and CUBLAS) to hasten time-consuming parts of MOPAC such as the pseudodiagonalization, full diagonalization, and density matrix assembling. We have shown that it is possible to obtain large speedups just by using CPU serial linear algebra libraries in the MOPAC code. As a special case, we show a speedup of up to 14 times for a methanol simulation box containing 2400 atoms and 4800 basis functions, with even greater gains in performance when using multithreaded CPUs (2.1 times in relation to the single-threaded CPU code using linear algebra libraries) and GPUs (3.8 times). This degree of acceleration opens new perspectives for modeling larger structures which appear in inorganic chemistry (such as zeolites and MOFs), biochemistry (such as polysaccharides, small proteins, and DNA fragments), and materials science (such as nanotubes and fullerenes). In addition, we believe that this parallel (GPU-GPU) MOPAC code will make it feasible to use semiempirical methods in lengthy molecular simulations using both hybrid QM/MM and QM/QM potentials.
NASA Astrophysics Data System (ADS)
Kazantsev, Daniil; Pickalov, Valery; Nagella, Srikanth; Pasca, Edoardo; Withers, Philip J.
2018-01-01
In the field of computerized tomographic imaging, many novel reconstruction techniques are routinely tested using simplistic numerical phantoms, e.g. the well-known Shepp-Logan phantom. These phantoms cannot sufficiently cover the broad spectrum of applications in CT imaging where, for instance, smooth or piecewise-smooth 3D objects are common. TomoPhantom provides quick access to an external library of modular analytical 2D/3D phantoms with temporal extensions. In TomoPhantom, quite complex phantoms can be built using additive combinations of geometrical objects, such as, Gaussians, parabolas, cones, ellipses, rectangles and volumetric extensions of them. Newly designed phantoms are better suited for benchmarking and testing of different image processing techniques. Specifically, tomographic reconstruction algorithms which employ 2D and 3D scanning geometries, can be rigorously analyzed using the software. TomoPhantom also provides a capability of obtaining analytical tomographic projections which further extends the applicability of software towards more realistic, free from the "inverse crime" testing. All core modules of the package are written in the C-OpenMP language and wrappers for Python and MATLAB are provided to enable easy access. Due to C-based multi-threaded implementation, volumetric phantoms of high spatial resolution can be obtained with computational efficiency.
NASA Technical Reports Server (NTRS)
Dongarra, Jack
1998-01-01
This exploratory study initiated our inquiry into algorithms and applications that would benefit by latency tolerant approach to algorithm building, including the construction of new algorithms where appropriate. In a multithreaded execution, when a processor reaches a point where remote memory access is necessary, the request is sent out on the network and a context--switch occurs to a new thread of computation. This effectively masks a long and unpredictable latency due to remote loads, thereby providing tolerance to remote access latency. We began to develop standards to profile various algorithm and application parameters, such as the degree of parallelism, granularity, precision, instruction set mix, interprocessor communication, latency etc. These tools will continue to develop and evolve as the Information Power Grid environment matures. To provide a richer context for this research, the project also focused on issues of fault-tolerance and computation migration of numerical algorithms and software. During the initial phase we tried to increase our understanding of the bottlenecks in single processor performance. Our work began by developing an approach for the automatic generation and optimization of numerical software for processors with deep memory hierarchies and pipelined functional units. Based on the results we achieved in this study we are planning to study other architectures of interest, including development of cost models, and developing code generators appropriate to these architectures.
Towards an agent-oriented programming language based on Scala
NASA Astrophysics Data System (ADS)
Mitrović, Dejan; Ivanović, Mirjana; Budimac, Zoran
2012-09-01
Scala and its multi-threaded model based on actors represent an excellent framework for developing purely reactive agents. This paper presents an early research on extending Scala with declarative programming constructs, which would result in a new agent-oriented programming language suitable for developing more advanced, BDI agent architectures. The main advantage the new language over many other existing solutions for programming BDI agents is a natural and straightforward integration of imperative and declarative programming constructs, fitted under a single development framework.
Analysis of Effects of Sensor Multithreading to Generate Local System Event Timelines
2014-03-27
works on logs highlights the importance of logs [17, 18]. The two aforementioned works both reference the same 2009 Data Breach Investigations Report...the data breaches report on, the logs contained evidence of events leading up to 82% of those data breaches . This means that preventing 82% of the data ...report states that of the data breaches reported on, the logs contained evidence of events leading up to 66% of those data breaches . • The 2010 DBIR
ParDRe: faster parallel duplicated reads removal tool for sequencing studies.
González-Domínguez, Jorge; Schmidt, Bertil
2016-05-15
Current next generation sequencing technologies often generate duplicated or near-duplicated reads that (depending on the application scenario) do not provide any interesting biological information but can increase memory requirements and computational time of downstream analysis. In this work we present ParDRe, a de novo parallel tool to remove duplicated and near-duplicated reads through the clustering of Single-End or Paired-End sequences from fasta or fastq files. It uses a novel bitwise approach to compare the suffixes of DNA strings and employs hybrid MPI/multithreading to reduce runtime on multicore systems. We show that ParDRe is up to 27.29 times faster than Fulcrum (a representative state-of-the-art tool) on a platform with two 8-core Sandy-Bridge processors. Source code in C ++ and MPI running on Linux systems as well as a reference manual are available at https://sourceforge.net/projects/pardre/ jgonzalezd@udc.es. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Chen, Xiaojun; Cheng, Jun; Gu, Xin; Sun, Yi; Politis, Constantinus
2016-04-01
Preoperative planning is of great importance for transforaminal endoscopic techniques applied in percutaneous endoscopic lumbar discectomy. In this study, a modular preoperative planning software for transforaminal endoscopic surgery was developed and demonstrated. The path searching method is based on collision detection, and the oriented bounding box was constructed for the anatomical models. Then, image reformatting algorithms were developed for multiplanar reconstruction which provides detailed anatomical information surrounding the virtual planned path. Finally, multithread technique was implemented to realize the steady-state condition of the software. A preoperative planning software for transforaminal endoscopic surgery (TE-Guider) was developed; seven cases of patients with symptomatic lumbar disc herniations were planned preoperatively using TE-Guider. The distances to the midlines and the direction of the optimal paths were exported, and each result was in line with the empirical value. TE-Guider provides an efficient and cost-effective way to search the ideal path and entry point for the puncture. However, more clinical cases will be conducted to demonstrate its feasibility and reliability.
Situation exploration in a persistent surveillance system with multidimensional data
NASA Astrophysics Data System (ADS)
Habibi, Mohammad S.
2013-03-01
There is an emerging need for fusing hard and soft sensor data in an efficient surveillance system to provide accurate estimation of situation awareness. These mostly abstract, multi-dimensional and multi-sensor data pose a great challenge to the user in performing analysis of multi-threaded events efficiently and cohesively. To address this concern an interactive Visual Analytics (VA) application is developed for rapid assessment and evaluation of different hypotheses based on context-sensitive ontology spawn from taxonomies describing human/human and human/vehicle/object interactions. A methodology is described here for generating relevant ontology in a Persistent Surveillance System (PSS) and demonstrates how they can be utilized in the context of PSS to track and identify group activities pertaining to potential threats. The proposed VA system allows for visual analysis of raw data as well as metadata that have spatiotemporal representation and content-based implications. Additionally in this paper, a technique for rapid search of tagged information contingent to ranking and confidence is explained for analysis of multi-dimensional data. Lastly the issue of uncertainty associated with processing and interpretation of heterogeneous data is also addressed.
Integrated optical 3D digital imaging based on DSP scheme
NASA Astrophysics Data System (ADS)
Wang, Xiaodong; Peng, Xiang; Gao, Bruce Z.
2008-03-01
We present a scheme of integrated optical 3-D digital imaging (IO3DI) based on digital signal processor (DSP), which can acquire range images independently without PC support. This scheme is based on a parallel hardware structure with aid of DSP and field programmable gate array (FPGA) to realize 3-D imaging. In this integrated scheme of 3-D imaging, the phase measurement profilometry is adopted. To realize the pipeline processing of the fringe projection, image acquisition and fringe pattern analysis, we present a multi-threads application program that is developed under the environment of DSP/BIOS RTOS (real-time operating system). Since RTOS provides a preemptive kernel and powerful configuration tool, with which we are able to achieve a real-time scheduling and synchronization. To accelerate automatic fringe analysis and phase unwrapping, we make use of the technique of software optimization. The proposed scheme can reach a performance of 39.5 f/s (frames per second), so it may well fit into real-time fringe-pattern analysis and can implement fast 3-D imaging. Experiment results are also presented to show the validity of proposed scheme.
Vessel thermal map real-time system for the JET tokamak
NASA Astrophysics Data System (ADS)
Alves, D.; Felton, R.; Jachmich, S.; Lomas, P.; McCullen, P.; Neto, A.; Valcárcel, D. F.; Arnoux, G.; Card, P.; Devaux, S.; Goodyear, A.; Kinna, D.; Stephen, A.; Zastrow, K.-D.
2012-05-01
The installation of international thermonuclear experimental reactor-relevant materials for the plasma facing components (PFCs) in the Joint European Torus (JET) is expected to have a strong impact on the operation and protection of the experiment. In particular, the use of all-beryllium tiles, which deteriorate at a substantially lower temperature than the formerly installed carbon fiber composite tiles, imposes strict thermal restrictions on the PFCs during operation. Prompt and precise responses are therefore required whenever anomalous temperatures are detected. The new vessel thermal map real-time application collects the temperature measurements provided by dedicated pyrometers and infrared cameras, groups them according to spatial location and probable offending heat source, and raises alarms that will trigger appropriate protective responses. In the context of the JET global scheme for the protection of the new wall, the system is required to run on a 10 ms cycle communicating with other systems through the real-time data network. In order to meet these requirements a commercial off-the-shelf solution has been adopted based on standard x86 multicore technology. Linux and the multithreaded application real-time executor (MARTe) software framework were respectively the operating system of choice and the real-time framework used to build the application. This paper presents an overview of the system with particular technical focus on the configuration of its real-time capability and the benefits of the modular development approach and advanced tools provided by the MARTe framework.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Marquez, Andres; Manzano Franco, Joseph B.; Song, Shuaiwen
With Exascale performance and its challenges in mind, one ubiquitous concern among architects is energy efficiency. Petascale systems projected to Exascale systems are unsustainable at current power consumption rates. One major contributor to system-wide power consumption is the number of memory operations leading to data movement and management techniques applied by the runtime system. To address this problem, we present the concept of the Architected Composite Data Types (ACDT) framework. The framework is made aware of data composites, assigning them a specific layout, transformations and operators. Data manipulation overhead is amortized over a larger number of elements and program performancemore » and power efficiency can be significantly improved. We developed the fundamentals of an ACDT framework on a massively multithreaded adaptive runtime system geared towards Exascale clusters. Showcasing the capability of ACDT, we exercised the framework with two representative processing kernels - Matrix Vector Multiply and the Cholesky Decomposition – applied to sparse matrices. As transformation modules, we applied optimized compress/decompress engines and configured invariant operators for maximum energy/performance efficiency. Additionally, we explored two different approaches based on transformation opaqueness in relation to the application. Under the first approach, the application is agnostic to compression and decompression activity. Such approach entails minimal changes to the original application code, but leaves out potential applicationspecific optimizations. The second approach exposes the decompression process to the application, hereby exposing optimization opportunities that can only be exploited with application knowledge. The experimental results show that the two approaches have their strengths in HW and SW respectively, where the SW approach can yield performance and power improvements that are an order of magnitude better than ACDT-oblivious, hand-optimized implementations.We consider the ACDT runtime framework an important component of compute nodes that will lead towards power efficient Exascale clusters.« less
Lommen, Arjen; van der Kamp, Henk J; Kools, Harrie J; van der Lee, Martijn K; van der Weg, Guido; Mol, Hans G J
2012-11-09
A new alternative data processing tool set, metAlignID, is developed for automated pre-processing and library-based identification and concentration estimation of target compounds after analysis by comprehensive two-dimensional gas chromatography with mass spectrometric detection. The tool set has been developed for and tested on LECO data. The software is developed to run multi-threaded (one thread per processor core) on a standard PC (personal computer) under different operating systems and is as such capable of processing multiple data sets simultaneously. Raw data files are converted into netCDF (network Common Data Form) format using a fast conversion tool. They are then preprocessed using previously developed algorithms originating from metAlign software. Next, the resulting reduced data files are searched against a user-composed library (derived from user or commercial NIST-compatible libraries) (NIST=National Institute of Standards and Technology) and the identified compounds, including an indicative concentration, are reported in Excel format. Data can be processed batch wise. The overall time needed for conversion together with processing and searching of 30 raw data sets for 560 compounds is routinely within an hour. The screening performance is evaluated for detection of pesticides and contaminants in raw data obtained after analysis of soil and plant samples. Results are compared to the existing data-handling routine based on proprietary software (LECO, ChromaTOF). The developed software tool set, which is freely downloadable at www.metalign.nl, greatly accelerates data-analysis and offers more options for fine-tuning automated identification toward specific application needs. The quality of the results obtained is slightly better than the standard processing and also adds a quantitative estimate. The software tool set in combination with two-dimensional gas chromatography coupled to time-of-flight mass spectrometry shows great potential as a highly-automated and fast multi-residue instrumental screening method. Copyright © 2012 Elsevier B.V. All rights reserved.
Unattended real-time re-establishment of visibility in high dynamic range video and stills
NASA Astrophysics Data System (ADS)
Abidi, B.
2014-05-01
We describe a portable unattended persistent surveillance system that corrects for harsh illumination conditions, where bright sun light creates mixed contrast effects, i.e., heavy shadows and washouts. These effects result in high dynamic range scenes, where illuminance can vary from few luxes to a 6 figure value. When using regular monitors and cameras, such wide span of illuminations can only be visualized if the actual range of values is compressed, leading to the creation of saturated and/or dark noisy areas and a loss of information in these areas. Images containing extreme mixed contrast cannot be fully enhanced from a single exposure, simply because all information is not present in the original data. The active intervention in the acquisition process is required. A software package, capable of integrating multiple types of COTS and custom cameras, ranging from Unmanned Aerial Systems (UAS) data links to digital single-lens reflex cameras (DSLR), is described. Hardware and software are integrated via a novel smart data acquisition algorithm, which communicates to the camera the parameters that would maximize information content in the final processed scene. A fusion mechanism is then applied to the smartly acquired data, resulting in an enhanced scene where information in both dark and bright areas is revealed. Multi-threading and parallel processing are exploited to produce automatic real time full motion corrected video. A novel enhancement algorithm was also devised to process data from legacy and non-controllable cameras. The software accepts and processes pre-recorded sequences and stills, enhances visible, night vision, and Infrared data, and successfully applies to night time and dark scenes. Various user options are available, integrating custom functionalities of the application into intuitive and easy to use graphical interfaces. The ensuing increase in visibility in surveillance video and intelligence imagery will expand the performance and timely decision making of the human analyst, as well as that of unmanned systems performing automatic data exploitation, such as target detection and identification.
NASA Astrophysics Data System (ADS)
Huang, Melin; Huang, Bormin; Huang, Allen H.
2014-10-01
For weather forecasting and research, the Weather Research and Forecasting (WRF) model has been developed, consisting of several components such as dynamic solvers and physical simulation modules. WRF includes several Land- Surface Models (LSMs). The LSMs use atmospheric information, the radiative and precipitation forcing from the surface layer scheme, the radiation scheme, and the microphysics/convective scheme all together with the land's state variables and land-surface properties, to provide heat and moisture fluxes over land and sea-ice points. The WRF 5-layer thermal diffusion simulation is an LSM based on the MM5 5-layer soil temperature model with an energy budget that includes radiation, sensible, and latent heat flux. The WRF LSMs are very suitable for massively parallel computation as there are no interactions among horizontal grid points. The features, efficient parallelization and vectorization essentials, of Intel Many Integrated Core (MIC) architecture allow us to optimize this WRF 5-layer thermal diffusion scheme. In this work, we present the results of the computing performance on this scheme with Intel MIC architecture. Our results show that the MIC-based optimization improved the performance of the first version of multi-threaded code on Xeon Phi 5110P by a factor of 2.1x. Accordingly, the same CPU-based optimizations improved the performance on Intel Xeon E5- 2603 by a factor of 1.6x as compared to the first version of multi-threaded code.
muBLASTP: database-indexed protein sequence search on multicore CPUs.
Zhang, Jing; Misra, Sanchit; Wang, Hao; Feng, Wu-Chun
2016-11-04
The Basic Local Alignment Search Tool (BLAST) is a fundamental program in the life sciences that searches databases for sequences that are most similar to a query sequence. Currently, the BLAST algorithm utilizes a query-indexed approach. Although many approaches suggest that sequence search with a database index can achieve much higher throughput (e.g., BLAT, SSAHA, and CAFE), they cannot deliver the same level of sensitivity as the query-indexed BLAST, i.e., NCBI BLAST, or they can only support nucleotide sequence search, e.g., MegaBLAST. Due to different challenges and characteristics between query indexing and database indexing, the existing techniques for query-indexed search cannot be used into database indexed search. muBLASTP, a novel database-indexed BLAST for protein sequence search, delivers identical hits returned to NCBI BLAST. On Intel Haswell multicore CPUs, for a single query, the single-threaded muBLASTP achieves up to a 4.41-fold speedup for alignment stages, and up to a 1.75-fold end-to-end speedup over single-threaded NCBI BLAST. For a batch of queries, the multithreaded muBLASTP achieves up to a 5.7-fold speedups for alignment stages, and up to a 4.56-fold end-to-end speedup over multithreaded NCBI BLAST. With a newly designed index structure for protein database and associated optimizations in BLASTP algorithm, we re-factored BLASTP algorithm for modern multicore processors that achieves much higher throughput with acceptable memory footprint for the database index.
Hydrogen Balmer Line Broadening in Solar and Stellar Flares
NASA Technical Reports Server (NTRS)
Kowalski, Adam F.; Allred, Joel C.; Uitenbroek, Han; Tremblay, Pier-Emmanuel; Brown, Stephen; Carlsson, Mats; Osten, Rachel A.; Wisniewski, John P.; Hawley, Suzanne L.
2017-01-01
The broadening of the hydrogen lines during flares is thought to result from increased charge (electron, proton) density in the flare chromosphere. However, disagreements between theory and modeling prescriptions have precluded an accurate diagnostic of the degree of ionization and compression resulting from flare heating in the chromosphere. To resolve this issue, we have incorporated the unified theory of electric pressure broadening of the hydrogen lines into the non-LTE radiative-transfer code RH. This broadening prescription produces a much more realistic spectrum of the quiescent, A0 star Vega compared to the analytic approximations used as a damping parameter in the Voigt profiles. We test recent radiative-hydrodynamic (RHD) simulations of the atmospheric response to high nonthermal electron beam fluxes with the new broadening prescription and find that the Balmer lines are overbroadened at the densest times in the simulations. Adding many simultaneously heated and cooling model loops as a 'multithread' model improves the agreement with the observations. We revisit the three component phenomenological flare model of the YZ CMi Megaflare using recent and new RHD models. The evolution of the broadening, line flux ratios, and continuum flux ratios are well-reproduced by a multithread model with high-flux nonthermal electron beam heating, an extended decay phase model, and a 'hot spot' atmosphere heated by an ultra relativistic electron beam with reasonable filling factors: approximately 0.1%, 1%, and 0.1% of the visible stellar hemisphere, respectively. The new modeling motivates future work to understand the origin of the extended gradual phase emission.
NASA Astrophysics Data System (ADS)
Sardina, V.
2017-12-01
The Pacific Tsunami Warning Center's round the clock operations rely on the rapid determination of the source parameters of earthquakes occurring around the world. To rapidly estimate source parameters such as earthquake location and magnitude the PTWC analyzes data streams ingested in near-real time from a global network of more than 700 seismic stations. Both the density of this network and the data latency of its member stations at any given time have a direct impact on the speed at which the PTWC scientists on duty can locate an earthquake and estimate its magnitude. In this context, it turns operationally advantageous to have the ability of assessing how quickly the PTWC operational system can reasonably detect and locate and earthquake, estimate its magnitude, and send the corresponding tsunami message whenever appropriate. For this purpose, we designed and implemented a multithreaded C++ software package to generate detection time grids for both P- and S-waves after taking into consideration the seismic network topology and the data latency of its member stations. We first encapsulate all the parameters of interest at a given geographic point, such as geographic coordinates, P- and S-waves detection time in at least a minimum number of stations, and maximum allowed azimuth gap into a DetectionTimePoint class. Then we apply composition and inheritance to define a DetectionTimeLine class that handles a vector of DetectionTimePoint objects along a given latitude. A DetectionTimesGrid class in turn handles the dynamic allocation of new TravelTimeLine objects and assigning the calculation of the corresponding P- and S-waves' detection times to new threads. Finally, we added a GUI that allows the user to interactively set all initial calculation parameters and output options. Initial testing in an eight core system shows that generation of a global 2D grid at 1 degree resolution setting detection on at least 5 stations and no azimuth gap restriction takes under 25 seconds. Under the same initial conditions, generation of a 2D grid at 0.1 degree resolution (2.6 million grid points) takes no more than 22 minutes. This preliminary results show a significant gain in grid generation speed when compared to other implementation via either scripts, or previous versions of the C++ code that did not implement multithreading.
NASA Astrophysics Data System (ADS)
Mikuś, Paweł; Wyżga, Bartłomiej; Radecki-Pawlik, Artur; Zawiejska, Joanna; Amirowicz, Antoni; Oglęcki, Paweł
2016-11-01
Migration of a mountain river channel may cause erosional risk to infrastructure or settlements on the valley floor. Following a flood of 2010, a cutbank in one of the bends of the main channel of the Czarny Dunajec, Polish Carpathians, approached a local road by 50 m. To arrest the erosion of the laterally migrating channel, water authorities planned construction of a ditch cutting the forested neck of the bend, reinforcement of the ditch banks, and damming the main channel with a boulder groyne. In order to avoid channelization of the highly valued, multithread river reach that would deteriorate its ecological status and cause increased flood risk to downstream reaches, an alternative approach to prevent bank erosion was proposed. The new scheme, applied in 2011, included opening of the inlets to inactive side braids located by the neck of the bend of the main channel. This solution reestablished the flow in the steeper low-flow channels, allowing us to expect a cutoff and abandonment of the main channel during subsequent floods. Gravelly deflectors were constructed directly below the inlets to the reactivated side channels to divert the flow into the channels and prevent the water from entering the main channel. Hydraulic measurements performed before and after the implementation of the scheme confirmed that it enabled shifting the main water current, with the highest average velocity and bed shear stress, from the braid closest to the road to the most distant braid. Similar surveys of fish and benthic macroinvertebrate communities indicated that flow reactivation in the side channels was beneficial for these groups of river biota, increasing their abundance and taxonomic richness in the reach. Not only was the implemented solution significantly less expensive, but it also enhanced ecological functions of the multithread channel and the variability of physical habitat conditions and maintained the role of the reach as a wood debris trap. However, avulsion of the main channel in the river bend immediately upstream during the flood in May 2014 again caused erosional risk to the road, although at another location. This indicates that with the highly unstable, multithread channel pattern of the Czarny Dunajec, the best practice of river maintenance in a relatively unmanaged valley reach would be allowing free channel migration within the floodplain area and reinforcing, where necessary, the boundary between the erodible river corridor and the managed terrace.
Aho-Corasick String Matching on Shared and Distributed Memory Parallel Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tumeo, Antonino; Villa, Oreste; Chavarría-Miranda, Daniel
String matching is at the core of many critical applications, including network intrusion detection systems, search engines, virus scanners, spam filters, DNA and protein sequencing, and data mining. For all of these applications string matching requires a combination of (sometimes all) the following characteristics: high and/or predictable performance, support for large data sets and flexibility of integration and customization. Many software based implementations targeting conventional cache-based microprocessors fail to achieve high and predictable performance requirements, while Field-Programmable Gate Array (FPGA) implementations and dedicated hardware solutions fail to support large data sets (dictionary sizes) and are difficult to integrate and customize.more » The advent of multicore, multithreaded, and GPU-based systems is opening the possibility for software based solutions to reach very high performance at a sustained rate. This paper compares several software-based implementations of the Aho-Corasick string searching algorithm for high performance systems. We discuss the implementation of the algorithm on several types of shared-memory high-performance architectures (Niagara 2, large x86 SMPs and Cray XMT), distributed memory with homogeneous processing elements (InfiniBand cluster of x86 multicores) and heterogeneous processing elements (InfiniBand cluster of x86 multicores with NVIDIA Tesla C10 GPUs). We describe in detail how each solution achieves the objectives of supporting large dictionaries, sustaining high performance, and enabling customization and flexibility using various data sets.« less
D GIS for Flood Modelling in River Valleys
NASA Astrophysics Data System (ADS)
Tymkow, P.; Karpina, M.; Borkowski, A.
2016-06-01
The objective of this study is implementation of system architecture for collecting and analysing data as well as visualizing results for hydrodynamic modelling of flood flows in river valleys using remote sensing methods, tree-dimensional geometry of spatial objects and GPU multithread processing. The proposed solution includes: spatial data acquisition segment, data processing and transformation, mathematical modelling of flow phenomena and results visualization. Data acquisition segment was based on aerial laser scanning supplemented by images in visible range. Vector data creation was based on automatic and semiautomatic algorithms of DTM and 3D spatial features modelling. Algorithms for buildings and vegetation geometry modelling were proposed or adopted from literature. The implementation of the framework was designed as modular software using open specifications and partially reusing open source projects. The database structure for gathering and sharing vector data, including flood modelling results, was created using PostgreSQL. For the internal structure of feature classes of spatial objects in a database, the CityGML standard was used. For the hydrodynamic modelling the solutions of Navier-Stokes equations in two-dimensional version was implemented. Visualization of geospatial data and flow model results was transferred to the client side application. This gave the independence from server hardware platform. A real-world case in Poland, which is a part of Widawa River valley near Wroclaw city, was selected to demonstrate the applicability of proposed system.
Customizing FP-growth algorithm to parallel mining with Charm++ library
NASA Astrophysics Data System (ADS)
Puścian, Marek
2017-08-01
This paper presents a frequent item mining algorithm that was customized to handle growing data repositories. The proposed solution applies Master Slave scheme to frequent pattern growth technique. Efficient utilization of available computation units is achieved by dynamic reallocation of tasks. Conditional frequent trees are assigned to parallel workers basing on their workload. Proposed enhancements have been successfully implemented using Charm++ library. This paper discusses results of the performance of parallelized FP-growth algorithm against different datasets. The approach has been illustrated with many experiments and measurements performed using multiprocessor and multithreaded computer.
NASA Astrophysics Data System (ADS)
Esparza, Javier
In many areas of computer science entities can “reproduce”, “replicate”, or “create new instances”. Paramount examples are threads in multithreaded programs, processes in operating systems, and computer viruses, but many others exist: procedure calls create new incarnations of the callees, web crawlers discover new pages to be explored (and so “create” new tasks), divide-and-conquer procedures split a problem into subproblems, and leaves of tree-based data structures become internal nodes with children. For lack of a better name, I use the generic term systems with process creation to refer to all these entities.
pycola: N-body COLA method code
NASA Astrophysics Data System (ADS)
Tassev, Svetlin; Eisenstein, Daniel J.; Wandelt, Benjamin D.; Zaldarriagag, Matias
2015-09-01
pycola is a multithreaded Python/Cython N-body code, implementing the Comoving Lagrangian Acceleration (COLA) method in the temporal and spatial domains, which trades accuracy at small-scales to gain computational speed without sacrificing accuracy at large scales. This is especially useful for cheaply generating large ensembles of accurate mock halo catalogs required to study galaxy clustering and weak lensing. The COLA method achieves its speed by calculating the large-scale dynamics exactly using LPT while letting the N-body code solve for the small scales, without requiring it to capture exactly the internal dynamics of halos.
Effective Vectorization with OpenMP 4.5
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huber, Joseph N.; Hernandez, Oscar R.; Lopez, Matthew Graham
This paper describes how the Single Instruction Multiple Data (SIMD) model and its extensions in OpenMP work, and how these are implemented in different compilers. Modern processors are highly parallel computational machines which often include multiple processors capable of executing several instructions in parallel. Understanding SIMD and executing instructions in parallel allows the processor to achieve higher performance without increasing the power required to run it. SIMD instructions can significantly reduce the runtime of code by executing a single operation on large groups of data. The SIMD model is so integral to the processor s potential performance that, if SIMDmore » is not utilized, less than half of the processor is ever actually used. Unfortunately, using SIMD instructions is a challenge in higher level languages because most programming languages do not have a way to describe them. Most compilers are capable of vectorizing code by using the SIMD instructions, but there are many code features important for SIMD vectorization that the compiler cannot determine at compile time. OpenMP attempts to solve this by extending the C++/C and Fortran programming languages with compiler directives that express SIMD parallelism. OpenMP is used to pass hints to the compiler about the code to be executed in SIMD. This is a key resource for making optimized code, but it does not change whether or not the code can use SIMD operations. However, in many cases critical functions are limited by a poor understanding of how SIMD instructions are actually implemented, as SIMD can be implemented through vector instructions or simultaneous multi-threading (SMT). We have found that it is often the case that code cannot be vectorized, or is vectorized poorly, because the programmer does not have sufficient knowledge of how SIMD instructions work.« less
MARTe: A Multiplatform Real-Time Framework
NASA Astrophysics Data System (ADS)
Neto, André C.; Sartori, Filippo; Piccolo, Fabio; Vitelli, Riccardo; De Tommasi, Gianmaria; Zabeo, Luca; Barbalace, Antonio; Fernandes, Horacio; Valcarcel, Daniel F.; Batista, Antonio J. N.
2010-04-01
Development of real-time applications is usually associated with nonportable code targeted at specific real-time operating systems. The boundary between hardware drivers, system services, and user code is commonly not well defined, making the development in the target host significantly difficult. The Multithreaded Application Real-Time executor (MARTe) is a framework built over a multiplatform library that allows the execution of the same code in different operating systems. The framework provides the high-level interfaces with hardware, external configuration programs, and user interfaces, assuring at the same time hard real-time performances. End-users of the framework are required to define and implement algorithms inside a well-defined block of software, named Generic Application Module (GAM), that is executed by the real-time scheduler. Each GAM is reconfigurable with a set of predefined configuration meta-parameters and interchanges information using a set of data pipes that are provided as inputs and required as output. Using these connections, different GAMs can be chained either in series or parallel. GAMs can be developed and debugged in a non-real-time system and, only once the robustness of the code and correctness of the algorithm are verified, deployed to the real-time system. The software also supplies a large set of utilities that greatly ease the interaction and debugging of a running system. Among the most useful are a highly efficient real-time logger, HTTP introspection of real-time objects, and HTTP remote configuration. MARTe is currently being used to successfully drive the plasma vertical stabilization controller on the largest magnetic confinement fusion device in the world, with a control loop cycle of 50 ?s and a jitter under 1 ?s. In this particular project, MARTe is used with the Real-Time Application Interface (RTAI)/Linux operating system exploiting the new ?86 multicore processors technology.
Multithreaded Stochastic PDES for Reactions and Diffusions in Neurons.
Lin, Zhongwei; Tropper, Carl; Mcdougal, Robert A; Patoary, Mohammand Nazrul Ishlam; Lytton, William W; Yao, Yiping; Hines, Michael L
2017-07-01
Cells exhibit stochastic behavior when the number of molecules is small. Hence a stochastic reaction-diffusion simulator capable of working at scale can provide a more accurate view of molecular dynamics within the cell. This paper describes a parallel discrete event simulator, Neuron Time Warp-Multi Thread (NTW-MT), developed for the simulation of reaction diffusion models of neurons. To the best of our knowledge, this is the first parallel discrete event simulator oriented towards stochastic simulation of chemical reactions in a neuron. The simulator was developed as part of the NEURON project. NTW-MT is optimistic and thread-based, which attempts to capitalize on multi-core architectures used in high performance machines. It makes use of a multi-level queue for the pending event set and a single roll-back message in place of individual anti-messages to disperse contention and decrease the overhead of processing rollbacks. Global Virtual Time is computed asynchronously both within and among processes to get rid of the overhead for synchronizing threads. Memory usage is managed in order to avoid locking and unlocking when allocating and de-allocating memory and to maximize cache locality. We verified our simulator on a calcium buffer model. We examined its performance on a calcium wave model, comparing it to the performance of a process based optimistic simulator and a threaded simulator which uses a single priority queue for each thread. Our multi-threaded simulator is shown to achieve superior performance to these simulators. Finally, we demonstrated the scalability of our simulator on a larger CICR model and a more detailed CICR model.
van Albada, Sacha J.; Rowley, Andrew G.; Senk, Johanna; Hopkins, Michael; Schmidt, Maximilian; Stokes, Alan B.; Lester, David R.; Diesmann, Markus; Furber, Steve B.
2018-01-01
The digital neuromorphic hardware SpiNNaker has been developed with the aim of enabling large-scale neural network simulations in real time and with low power consumption. Real-time performance is achieved with 1 ms integration time steps, and thus applies to neural networks for which faster time scales of the dynamics can be neglected. By slowing down the simulation, shorter integration time steps and hence faster time scales, which are often biologically relevant, can be incorporated. We here describe the first full-scale simulations of a cortical microcircuit with biological time scales on SpiNNaker. Since about half the synapses onto the neurons arise within the microcircuit, larger cortical circuits have only moderately more synapses per neuron. Therefore, the full-scale microcircuit paves the way for simulating cortical circuits of arbitrary size. With approximately 80, 000 neurons and 0.3 billion synapses, this model is the largest simulated on SpiNNaker to date. The scale-up is enabled by recent developments in the SpiNNaker software stack that allow simulations to be spread across multiple boards. Comparison with simulations using the NEST software on a high-performance cluster shows that both simulators can reach a similar accuracy, despite the fixed-point arithmetic of SpiNNaker, demonstrating the usability of SpiNNaker for computational neuroscience applications with biological time scales and large network size. The runtime and power consumption are also assessed for both simulators on the example of the cortical microcircuit model. To obtain an accuracy similar to that of NEST with 0.1 ms time steps, SpiNNaker requires a slowdown factor of around 20 compared to real time. The runtime for NEST saturates around 3 times real time using hybrid parallelization with MPI and multi-threading. However, achieving this runtime comes at the cost of increased power and energy consumption. The lowest total energy consumption for NEST is reached at around 144 parallel threads and 4.6 times slowdown. At this setting, NEST and SpiNNaker have a comparable energy consumption per synaptic event. Our results widen the application domain of SpiNNaker and help guide its development, showing that further optimizations such as synapse-centric network representation are necessary to enable real-time simulation of large biological neural networks. PMID:29875620
Function-based design process for an intelligent ground vehicle vision system
NASA Astrophysics Data System (ADS)
Nagel, Robert L.; Perry, Kenneth L.; Stone, Robert B.; McAdams, Daniel A.
2010-10-01
An engineering design framework for an autonomous ground vehicle vision system is discussed. We present both the conceptual and physical design by following the design process, development and testing of an intelligent ground vehicle vision system constructed for the 2008 Intelligent Ground Vehicle Competition. During conceptual design, the requirements for the vision system are explored via functional and process analysis considering the flows into the vehicle and the transformations of those flows. The conceptual design phase concludes with a vision system design that is modular in both hardware and software and is based on a laser range finder and camera for visual perception. During physical design, prototypes are developed and tested independently, following the modular interfaces identified during conceptual design. Prototype models, once functional, are implemented into the final design. The final vision system design uses a ray-casting algorithm to process camera and laser range finder data and identify potential paths. The ray-casting algorithm is a single thread of the robot's multithreaded application. Other threads control motion, provide feedback, and process sensory data. Once integrated, both hardware and software testing are performed on the robot. We discuss the robot's performance and the lessons learned.
Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation
Bender, Michael A.; Berry, Jonathan W.; Hammond, Simon D.; ...
2017-01-03
A challenge in computer architecture is that processors often cannot be fed data from DRAM as fast as CPUs can consume it. Therefore, many applications are memory-bandwidth bound. With this motivation and the realization that traditional architectures (with all DRAM reachable only via bus) are insufficient to feed groups of modern processing units, vendors have introduced a variety of non-DDR 3D memory technologies (Hybrid Memory Cube (HMC),Wide I/O 2, High Bandwidth Memory (HBM)). These offer higher bandwidth and lower power by stacking DRAM chips on the processor or nearby on a silicon interposer. We will call these solutions “near-memory,” andmore » if user-addressable, “scratchpad.” High-performance systems on the market now offer two levels of main memory: near-memory on package and traditional DRAM further away. In the near term we expect the latencies near-memory and DRAM to be similar. Here, it is natural to think of near-memory as another module on the DRAM level of the memory hierarchy. Vendors are expected to offer modes in which the near memory is used as cache, but we believe that this will be inefficient.« less
SimExTargId: A comprehensive package for real-time LC-MS data acquisition and analysis.
Edmands, William M B; Hayes, Josie; Rappaport, Stephen M
2018-05-22
Liquid chromatography mass spectrometry (LC-MS) is the favored method for untargeted metabolomic analysis of small molecules in biofluids. Here we present SimExTargId, an open-source R package for autonomous analysis of metabolomic data and real-time observation of experimental runs. This simultaneous, fully automated and multi-threaded (optional) package is a wrapper for vendor-independent format conversion (ProteoWizard), xcms- and CAMERA- based peak-picking, MetMSLine-based pre-processing and covariate-based statistical analysis. Users are notified of detrimental instrument drift or errors by email. Also included are two shiny applications, targetId for real-time MS2 target identification, and peakMonitor to monitor targeted metabolites. SimExTargId is publicly available under GNU LGPL v3.0 license at https://github.com/JosieLHayes/simExTargId, which includes a vignette with example data. SimExTargId should be installed on a dedicated data-processing workstation or server that is networked to the LC-MS platform to facilitate MS1 profiling of metabolomic data. josie.hayes@berkeley.edu. Supplementary data are available at Bioinformatics online.
Use of Annotations for Component and Framework Interoperability
NASA Astrophysics Data System (ADS)
David, O.; Lloyd, W.; Carlson, J.; Leavesley, G. H.; Geter, F.
2009-12-01
The popular programming languages Java and C# provide annotations, a form of meta-data construct. Software frameworks for web integration, web services, database access, and unit testing now take advantage of annotations to reduce the complexity of APIs and the quantity of integration code between the application and framework infrastructure. Adopting annotation features in frameworks has been observed to lead to cleaner and leaner application code. The USDA Object Modeling System (OMS) version 3.0 fully embraces the annotation approach and additionally defines a meta-data standard for components and models. In version 3.0 framework/model integration previously accomplished using API calls is now achieved using descriptive annotations. This enables the framework to provide additional functionality non-invasively such as implicit multithreading, and auto-documenting capabilities while achieving a significant reduction in the size of the model source code. Using a non-invasive methodology leads to models and modeling components with only minimal dependencies on the modeling framework. Since models and modeling components are not directly bound to framework by the use of specific APIs and/or data types they can more easily be reused both within the framework as well as outside of it. To study the effectiveness of an annotation based framework approach with other modeling frameworks, a framework-invasiveness study was conducted to evaluate the effects of framework design on model code quality. A monthly water balance model was implemented across several modeling frameworks and several software metrics were collected. The metrics selected were measures of non-invasive design methods for modeling frameworks from a software engineering perspective. It appears that the use of annotations positively impacts several software quality measures. In a next step, the PRMS model was implemented in OMS 3.0 and is currently being implemented for water supply forecasting in the western United States at the USDA NRCS National Water and Climate Center. PRMS is a component based modular precipitation-runoff model developed to evaluate the impacts of various combinations of precipitation, climate, and land use on streamflow and general basin hydrology. The new OMS 3.0 PRMS model source code is more concise and flexible as a result of using the new framework’s annotation based approach. The fully annotated components are now providing information directly for (i) model assembly and building, (ii) dataflow analysis for implicit multithreading, (iii) automated and comprehensive model documentation of component dependencies, physical data properties, (iv) automated model and component testing, and (v) automated audit-traceability to account for all model resources leading to a particular simulation result. Experience to date has demonstrated the multi-purpose value of using annotations. Annotations are also a feasible and practical method to enable interoperability among models and modeling frameworks. As a prototype example, model code annotations were used to generate binding and mediation code to allow the use of OMS 3.0 model components within the OpenMI context.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mellor-Crummey, John
The PIPER project set out to develop methodologies and software for measurement, analysis, attribution, and presentation of performance data for extreme-scale systems. Goals of the project were to support analysis of massive multi-scale parallelism, heterogeneous architectures, multi-faceted performance concerns, and to support both post-mortem performance analysis to identify program features that contribute to problematic performance and on-line performance analysis to drive adaptation. This final report summarizes the research and development activity at Rice University as part of the PIPER project. Producing a complete suite of performance tools for exascale platforms during the course of this project was impossible since bothmore » hardware and software for exascale systems is still a moving target. For that reason, the project focused broadly on the development of new techniques for measurement and analysis of performance on modern parallel architectures, enhancements to HPCToolkit’s software infrastructure to support our research goals or use on sophisticated applications, engaging developers of multithreaded runtimes to explore how support for tools should be integrated into their designs, engaging operating system developers with feature requests for enhanced monitoring support, engaging vendors with requests that they add hardware measure- ment capabilities and software interfaces needed by tools as they design new components of HPC platforms including processors, accelerators and networks, and finally collaborations with partners interested in using HPCToolkit to analyze and tune scalable parallel applications.« less
A Cross-Platform Infrastructure for Scalable Runtime Application Performance Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jack Dongarra; Shirley Moore; Bart Miller, Jeffrey Hollingsworth
2005-03-15
The purpose of this project was to build an extensible cross-platform infrastructure to facilitate the development of accurate and portable performance analysis tools for current and future high performance computing (HPC) architectures. Major accomplishments include tools and techniques for multidimensional performance analysis, as well as improved support for dynamic performance monitoring of multithreaded and multiprocess applications. Previous performance tool development has been limited by the burden of having to re-write a platform-dependent low-level substrate for each architecture/operating system pair in order to obtain the necessary performance data from the system. Manual interpretation of performance data is not scalable for large-scalemore » long-running applications. The infrastructure developed by this project provides a foundation for building portable and scalable performance analysis tools, with the end goal being to provide application developers with the information they need to analyze, understand, and tune the performance of terascale applications on HPC architectures. The backend portion of the infrastructure provides runtime instrumentation capability and access to hardware performance counters, with thread-safety for shared memory environments and a communication substrate to support instrumentation of multiprocess and distributed programs. Front end interfaces provides tool developers with a well-defined, platform-independent set of calls for requesting performance data. End-user tools have been developed that demonstrate runtime data collection, on-line and off-line analysis of performance data, and multidimensional performance analysis. The infrastructure is based on two underlying performance instrumentation technologies. These technologies are the PAPI cross-platform library interface to hardware performance counters and the cross-platform Dyninst library interface for runtime modification of executable images. The Paradyn and KOJAK projects have made use of this infrastructure to build performance measurement and analysis tools that scale to long-running programs on large parallel and distributed systems and that automate much of the search for performance bottlenecks.« less
2D MHD AND 1D HD MODELS OF A SOLAR FLARE—A COMPREHENSIVE COMPARISON OF THE RESULTS
DOE Office of Scientific and Technical Information (OSTI.GOV)
Falewicz, R.; Rudawy, P.; Murawski, K.
Without any doubt, solar flaring loops possess a multithread internal structure that is poorly resolved, and there are no means to observe heating episodes and thermodynamic evolution of the individual threads. These limitations cause fundamental problems in numerical modeling of flaring loops, such as selection of a structure and a number of threads, and an implementation of a proper model of the energy deposition process. A set of one-dimensional (1D) hydrodynamic and two-dimensional (2D) magnetohydrodynamic models of a flaring loop are developed to compare energy redistribution and plasma dynamics in the course of a prototypical solar flare. Basic parameters ofmore » the modeled loop are set according to the progenitor M1.8 flare recorded in AR 10126 on 2002 September 20 between 09:21 UT and 09:50 UT. The nonideal 1D models include thermal conduction and radiative losses of the optically thin plasma as energy-loss mechanisms, while the nonideal 2D models take into account viscosity and thermal conduction as energy-loss mechanisms only. The 2D models have a continuous distribution of the parameters of the plasma across the loop and are powered by varying in time and space along and across the loop heating flux. We show that such 2D models are an extreme borderline case of a multithread internal structure of the flaring loop, with a filling factor equal to 1. Nevertheless, these simple models ensure the general correctness of the obtained results and can be adopted as a correct approximation of the real flaring structures.« less
Improvement and speed optimization of numerical tsunami modelling program using OpenMP technology
NASA Astrophysics Data System (ADS)
Chernov, A.; Zaytsev, A.; Yalciner, A.; Kurkin, A.
2009-04-01
Currently, the basic problem of tsunami modeling is low speed of calculations which is unacceptable for services of the operative notification. Existing algorithms of numerical modeling of hydrodynamic processes of tsunami waves are developed without taking the opportunities of modern computer facilities. There is an opportunity to have considerable acceleration of process of calculations by using parallel algorithms. We discuss here new approach to parallelization tsunami modeling code using OpenMP Technology (for multiprocessing systems with the general memory). Nowadays, multiprocessing systems are easily accessible for everyone. The cost of the use of such systems becomes much lower comparing to the costs of clusters. This opportunity also benefits all programmers to apply multithreading algorithms on desktop computers of researchers. Other important advantage of the given approach is the mechanism of the general memory - there is no necessity to send data on slow networks (for example Ethernet). All memory is the common for all computing processes; it causes almost linear scalability of the program and processes. In the new version of NAMI DANCE using OpenMP technology and multi-threading algorithm provide 80% gain in speed in comparison with the one-thread version for dual-processor unit. The speed increased and 320% gain was attained for four core processor unit of PCs. Thus, it was possible to reduce considerably time of performance of calculations on the scientific workstations (desktops) without complete change of the program and user interfaces. The further modernization of algorithms of preparation of initial data and processing of results using OpenMP looks reasonable. The final version of NAMI DANCE with the increased computational speed can be used not only for research purposes but also in real time Tsunami Warning Systems.
Enabling Graph Appliance for Genome Assembly
DOE Office of Scientific and Technical Information (OSTI.GOV)
Singh, Rina; Graves, Jeffrey A; Lee, Sangkeun
2015-01-01
In recent years, there has been a huge growth in the amount of genomic data available as reads generated from various genome sequencers. The number of reads generated can be huge, ranging from hundreds to billions of nucleotide, each varying in size. Assembling such large amounts of data is one of the challenging computational problems for both biomedical and data scientists. Most of the genome assemblers developed have used de Bruijn graph techniques. A de Bruijn graph represents a collection of read sequences by billions of vertices and edges, which require large amounts of memory and computational power to storemore » and process. This is the major drawback to de Bruijn graph assembly. Massively parallel, multi-threaded, shared memory systems can be leveraged to overcome some of these issues. The objective of our research is to investigate the feasibility and scalability issues of de Bruijn graph assembly on Cray s Urika-GD system; Urika-GD is a high performance graph appliance with a large shared memory and massively multithreaded custom processor designed for executing SPARQL queries over large-scale RDF data sets. However, to the best of our knowledge, there is no research on representing a de Bruijn graph as an RDF graph or finding Eulerian paths in RDF graphs using SPARQL for potential genome discovery. In this paper, we address the issues involved in representing a de Bruin graphs as RDF graphs and propose an iterative querying approach for finding Eulerian paths in large RDF graphs. We evaluate the performance of our implementation on real world ebola genome datasets and illustrate how genome assembly can be accomplished with Urika-GD using iterative SPARQL queries.« less
Memory Benchmarks for SMP-Based High Performance Parallel Computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoo, A B; de Supinski, B; Mueller, F
2001-11-20
As the speed gap between CPU and main memory continues to grow, memory accesses increasingly dominates the performance of many applications. The problem is particularly acute for symmetric multiprocessor (SMP) systems, where the shared memory may be accessed concurrently by a group of threads running on separate CPUs. Unfortunately, several key issues governing memory system performance in current systems are not well understood. Complex interactions between the levels of the memory hierarchy, buses or switches, DRAM back-ends, system software, and application access patterns can make it difficult to pinpoint bottlenecks and determine appropriate optimizations, and the situation is even moremore » complex for SMP systems. To partially address this problem, we formulated a set of multi-threaded microbenchmarks for characterizing and measuring the performance of the underlying memory system in SMP-based high-performance computers. We report our use of these microbenchmarks on two important SMP-based machines. This paper has four primary contributions. First, we introduce a microbenchmark suite to systematically assess and compare the performance of different levels in SMP memory hierarchies. Second, we present a new tool based on hardware performance monitors to determine a wide array of memory system characteristics, such as cache sizes, quickly and easily; by using this tool, memory performance studies can be targeted to the full spectrum of performance regimes with many fewer data points than is otherwise required. Third, we present experimental results indicating that the performance of applications with large memory footprints remains largely constrained by memory. Fourth, we demonstrate that thread-level parallelism further degrades memory performance, even for the latest SMPs with hardware prefetching and switch-based memory interconnects.« less
Nonlinear dynamic macromodeling techniques for audio systems
NASA Astrophysics Data System (ADS)
Ogrodzki, Jan; Bieńkowski, Piotr
2015-09-01
This paper develops a modelling method and a models identification technique for the nonlinear dynamic audio systems. Identification is performed by means of a behavioral approach based on a polynomial approximation. This approach makes use of Discrete Fourier Transform and Harmonic Balance Method. A model of an audio system is first created and identified and then it is simulated in real time using an algorithm of low computational complexity. The algorithm consists in real time emulation of the system response rather than in simulation of the system itself. The proposed software is written in Python language using object oriented programming techniques. The code is optimized for a multithreads environment.
Kalman filter tracking on parallel architectures
NASA Astrophysics Data System (ADS)
Cerati, G.; Elmer, P.; Krutelyov, S.; Lantz, S.; Lefebvre, M.; McDermott, K.; Riley, D.; Tadel, M.; Wittich, P.; Wurthwein, F.; Yagil, A.
2017-10-01
We report on the progress of our studies towards a Kalman filter track reconstruction algorithm with optimal performance on manycore architectures. The combinatorial structure of these algorithms is not immediately compatible with an efficient SIMD (or SIMT) implementation; the challenge for us is to recast the existing software so it can readily generate hundreds of shared-memory threads that exploit the underlying instruction set of modern processors. We show how the data and associated tasks can be organized in a way that is conducive to both multithreading and vectorization. We demonstrate very good performance on Intel Xeon and Xeon Phi architectures, as well as promising first results on Nvidia GPUs.
A wavelet approach to binary blackholes with asynchronous multitasking
NASA Astrophysics Data System (ADS)
Lim, Hyun; Hirschmann, Eric; Neilsen, David; Anderson, Matthew; Debuhr, Jackson; Zhang, Bo
2016-03-01
Highly accurate simulations of binary black holes and neutron stars are needed to address a variety of interesting problems in relativistic astrophysics. We present a new method for the solving the Einstein equations (BSSN formulation) using iterated interpolating wavelets. Wavelet coefficients provide a direct measure of the local approximation error for the solution and place collocation points that naturally adapt to features of the solution. Further, they exhibit exponential convergence on unevenly spaced collection points. The parallel implementation of the wavelet simulation framework presented here deviates from conventional practice in combining multi-threading with a form of message-driven computation sometimes referred to as asynchronous multitasking.
ROOT 6 and beyond: TObject, C++14 and many cores
Bellenot, B.; Canal, Ph; Couet, O.; ...
2015-12-23
Following the release of version 6, ROOT has entered a new area of development. It will leverage the industrial strength compiler library shipping in ROOT 6 and its support of the C++11/14 standard, to significantly simplify and harden ROOT's interfaces and to clarify and substantially improve ROOT's support for multi-threaded environments. Furthermore, this talk will also recap the most important new features and enhancements in ROOT in general, focusing on those allowed by the improved interpreter and better compiler support, including I/O for smart pointers, easier type safe access to the content of TTrees and enhanced multi processor support.
Muon g-2 Reconstruction and Analysis Framework for the Muon Anomalous Precession Frequency
DOE Office of Scientific and Technical Information (OSTI.GOV)
Khaw, Kim Siang
The Muon g-2 experiment at Fermilab, with the aim to measure the muon anomalous magnetic moment to an unprecedented level of 140~ppb, has started beam and detector commissioning in Summer 2017. To deal with incoming data projected to be around tens of petabytes, a robust data reconstruction and analysis chain based on Fermilab's \\textit{art} event-processing framework is developed. Herein, I report the current status of the framework, together with its novel features such as multi-threaded algorithms for online data quality monitor (DQM) and fast-turnaround operation (nearline). Performance of the framework during the commissioning run is also discussed.
M3T: Morphable Multithreaded Memory Tiles
2004-01-01
4 5 6 7 8 9 10 A B C D E F G H I J K L x[0] x[1] x[2] x[3] x[4] x[5] x[6] x[7] y[0] y[4] y[2] y[6] y[1] y[5] y[3] y[7] Figure 4: 8-point radix-2...CONPAR 94, September. 33. [NAR95] Narayan, S. and Gajski , D. (1995) “Interfacing Incompatible Protocols Using Interface Process Generation,” 32nd...Parallel and Distributed Tools, August. 42. [SOH95] Sohi, G ., Breach, S. and Vajapeyam, S. (1995) “Multiscalar Processors,” Proceedings of the 22nd
Visual gene developer: a fully programmable bioinformatics software for synthetic gene optimization.
Jung, Sang-Kyu; McDonald, Karen
2011-08-16
Direct gene synthesis is becoming more popular owing to decreases in gene synthesis pricing. Compared with using natural genes, gene synthesis provides a good opportunity to optimize gene sequence for specific applications. In order to facilitate gene optimization, we have developed a stand-alone software called Visual Gene Developer. The software not only provides general functions for gene analysis and optimization along with an interactive user-friendly interface, but also includes unique features such as programming capability, dedicated mRNA secondary structure prediction, artificial neural network modeling, network & multi-threaded computing, and user-accessible programming modules. The software allows a user to analyze and optimize a sequence using main menu functions or specialized module windows. Alternatively, gene optimization can be initiated by designing a gene construct and configuring an optimization strategy. A user can choose several predefined or user-defined algorithms to design a complicated strategy. The software provides expandable functionality as platform software supporting module development using popular script languages such as VBScript and JScript in the software programming environment. Visual Gene Developer is useful for both researchers who want to quickly analyze and optimize genes, and those who are interested in developing and testing new algorithms in bioinformatics. The software is available for free download at http://www.visualgenedeveloper.net.
A simple modern correctness condition for a space-based high-performance multiprocessor
NASA Technical Reports Server (NTRS)
Probst, David K.; Li, Hon F.
1992-01-01
A number of U.S. national programs, including space-based detection of ballistic missile launches, envisage putting significant computing power into space. Given sufficient progress in low-power VLSI, multichip-module packaging and liquid-cooling technologies, we will see design of high-performance multiprocessors for individual satellites. In very high speed implementations, performance depends critically on tolerating large latencies in interprocessor communication; without latency tolerance, performance is limited by the vastly differing time scales in processor and data-memory modules, including interconnect times. The modern approach to tolerating remote-communication cost in scalable, shared-memory multiprocessors is to use a multithreaded architecture, and alter the semantics of shared memory slightly, at the price of forcing the programmer either to reason about program correctness in a relaxed consistency model or to agree to program in a constrained style. The literature on multiprocessor correctness conditions has become increasingly complex, and sometimes confusing, which may hinder its practical application. We propose a simple modern correctness condition for a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and a high-performance, shared-memory multiprocessor; the correctness condition is based on a simple interface between the multiprocessor architecture and the parallel programming system.
Flexbar 3.0 - SIMD and multicore parallelization.
Roehr, Johannes T; Dieterich, Christoph; Reinert, Knut
2017-09-15
High-throughput sequencing machines can process many samples in a single run. For Illumina systems, sequencing reads are barcoded with an additional DNA tag that is contained in the respective sequencing adapters. The recognition of barcode and adapter sequences is hence commonly needed for the analysis of next-generation sequencing data. Flexbar performs demultiplexing based on barcodes and adapter trimming for such data. The massive amounts of data generated on modern sequencing machines demand that this preprocessing is done as efficiently as possible. We present Flexbar 3.0, the successor of the popular program Flexbar. It employs now twofold parallelism: multi-threading and additionally SIMD vectorization. Both types of parallelism are used to speed-up the computation of pair-wise sequence alignments, which are used for the detection of barcodes and adapters. Furthermore, new features were included to cover a wide range of applications. We evaluated the performance of Flexbar based on a simulated sequencing dataset. Our program outcompetes other tools in terms of speed and is among the best tools in the presented quality benchmark. https://github.com/seqan/flexbar. johannes.roehr@fu-berlin.de or knut.reinert@fu-berlin.de. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Visual gene developer: a fully programmable bioinformatics software for synthetic gene optimization
2011-01-01
Background Direct gene synthesis is becoming more popular owing to decreases in gene synthesis pricing. Compared with using natural genes, gene synthesis provides a good opportunity to optimize gene sequence for specific applications. In order to facilitate gene optimization, we have developed a stand-alone software called Visual Gene Developer. Results The software not only provides general functions for gene analysis and optimization along with an interactive user-friendly interface, but also includes unique features such as programming capability, dedicated mRNA secondary structure prediction, artificial neural network modeling, network & multi-threaded computing, and user-accessible programming modules. The software allows a user to analyze and optimize a sequence using main menu functions or specialized module windows. Alternatively, gene optimization can be initiated by designing a gene construct and configuring an optimization strategy. A user can choose several predefined or user-defined algorithms to design a complicated strategy. The software provides expandable functionality as platform software supporting module development using popular script languages such as VBScript and JScript in the software programming environment. Conclusion Visual Gene Developer is useful for both researchers who want to quickly analyze and optimize genes, and those who are interested in developing and testing new algorithms in bioinformatics. The software is available for free download at http://www.visualgenedeveloper.net. PMID:21846353
Evolution of channel morphology in a large river subject to rectification
NASA Astrophysics Data System (ADS)
Scorpio, Vittoria; Mastronunzio, Marco; Proto, Matteo; Zen, Simone; Bertoldi, Walter; Prà, Elena Dai; Comiti, Francesco; Surian, Nicola; Zolezzi, Guido
2016-04-01
Many large rivers in Europe have been subject to heavy modifications for land reclamation and flood mitigation through centuries. As a consequence, the study of the pre-alteration morphological patterns and of the related channel evolution following the anthropic modifications is rather challenging. The Adige River is the second longest river in Italy and drains 12,100 km2 of the Eastern Italian Alps. Currently, it features a straight to sinuous pattern and an average channel width of 40-60 m. A massive rectification scheme aiming at land reclamation of the Adige valley bottom was planned in the late 18th century, and implemented starting in the first decades of 19th century. Nowadays, it can be considered one of the most altered rivers in Italy, not only due to channelization but also to the presence of many hydropower reservoirs and check-dams along its tributaries. This study aims to the reconstruction of the Adige River's evolutionary trajectory over the last 250 years, and comprehension of key control factors driving channel evolution. A multi-temporal analysis of historical maps and orthophotos from 1776, to 2006 was performed in order to assess channel modifications. In addition, land use changes at the basin scale, years of occurrence of most relevant flood events, and climate variability over the investigated period were analyzed. The detailed topographical map surveyed in 1803 was taken as a reference, and the study sector (115 km long) was divided into 39 reaches. Active channel, bars, riparian vegetation and channel control works were geo-processed. Results show that the Adige River suffered the most intense alteration from 1803 to 1855, and especially from 1847 to 1855. During this period channel narrowing ranged from 14% to 70%, coupled with pattern changes and decreases in the braiding, sinuosity and anabrancing indices. Most important alterations occurred in the reaches presenting a multi-thread morphology in 1803, as their average width declined from 220 m to 110 m. On the contrary, reaches originally sinuous remained quite stable, decreasing from 100 m to 95 m. Overall, relevant channel morphology modifications took place by 1855, when channel configuration had shifted from alternating longitudinal sequences of multi-thread and single-thread, at the beginning of the 19th century, to mainly single-thread. Total length of multi-thread reaches shifted from 31% in 1805, to 22% in 1847, to 8% in the 1855. On the contrary, sinuous and straight patterns increased from 26% (in 1803) to 62% (in 1847), up to 77% of the whole studied river length in 1855. Nevertheless, overall increases in channel braiding and mean channel width was observed downstream of the confluences with the main tributaries. Analysis of the evolutionary trajectory of channel morphology and of controlling factors, shows that human disturbances have largely prevailed over climatic influences in constraining the Adige's dynamics and morphology, mainly because of channelization causing sharp changes in channel pattern and width that occurred during the 19th century.
Halvade-RNA: Parallel variant calling from transcriptomic data using MapReduce.
Decap, Dries; Reumers, Joke; Herzeel, Charlotte; Costanza, Pascal; Fostier, Jan
2017-01-01
Given the current cost-effectiveness of next-generation sequencing, the amount of DNA-seq and RNA-seq data generated is ever increasing. One of the primary objectives of NGS experiments is calling genetic variants. While highly accurate, most variant calling pipelines are not optimized to run efficiently on large data sets. However, as variant calling in genomic data has become common practice, several methods have been proposed to reduce runtime for DNA-seq analysis through the use of parallel computing. Determining the effectively expressed variants from transcriptomics (RNA-seq) data has only recently become possible, and as such does not yet benefit from efficiently parallelized workflows. We introduce Halvade-RNA, a parallel, multi-node RNA-seq variant calling pipeline based on the GATK Best Practices recommendations. Halvade-RNA makes use of the MapReduce programming model to create and manage parallel data streams on which multiple instances of existing tools such as STAR and GATK operate concurrently. Whereas the single-threaded processing of a typical RNA-seq sample requires ∼28h, Halvade-RNA reduces this runtime to ∼2h using a small cluster with two 20-core machines. Even on a single, multi-core workstation, Halvade-RNA can significantly reduce runtime compared to using multi-threading, thus providing for a more cost-effective processing of RNA-seq data. Halvade-RNA is written in Java and uses the Hadoop MapReduce 2.0 API. It supports a wide range of distributions of Hadoop, including Cloudera and Amazon EMR.
RACER: Effective Race Detection Using AspectJ
NASA Technical Reports Server (NTRS)
Bodden, Eric; Havelund, Klaus
2008-01-01
The limits of coding with joint constraints on detected and undetected error rates Programming errors occur frequently in large software systems, and even more so if these systems are concurrent. In the past, researchers have developed specialized programs to aid programmers detecting concurrent programming errors such as deadlocks, livelocks, starvation and data races. In this work we propose a language extension to the aspect-oriented programming language AspectJ, in the form of three new built-in pointcuts, lock(), unlock() and may be Shared(), which allow programmers to monitor program events where locks are granted or handed back, and where values are accessed that may be shared amongst multiple Java threads. We decide thread-locality using a static thread-local objects analysis developed by others. Using the three new primitive pointcuts, researchers can directly implement efficient monitoring algorithms to detect concurrent programming errors online. As an example, we expose a new algorithm which we call RACER, an adoption of the well-known ERASER algorithm to the memory model of Java. We implemented the new pointcuts as an extension to the Aspect Bench Compiler, implemented the RACER algorithm using this language extension and then applied the algorithm to the NASA K9 Rover Executive. Our experiments proved our implementation very effective. In the Rover Executive RACER finds 70 data races. Only one of these races was previously known.We further applied the algorithm to two other multi-threaded programs written by Computer Science researchers, in which we found races as well.
Data Fusion and Visualization with the OpenEarth Framework (OEF)
NASA Astrophysics Data System (ADS)
Nadeau, D. R.; Baru, C.; Fouch, M. J.; Crosby, C. J.
2010-12-01
Data fusion is an increasingly important problem to solve as we strive to integrate data from multiple sources and build better models of the complex processes operating at the Earth’s surface and its interior. These data are often large, multi-dimensional, and subject to differing conventions for file formats, data structures, coordinate spaces, units of measure, and metadata organization. When visualized, these data require differing, and often conflicting, conventions for visual representations, dimensionality, icons, color schemes, labeling, and interaction. These issues make the visualization of fused Earth science data particularly difficult. The OpenEarth Framework (OEF) is an open-source data fusion and visualization suite of software being developed at the Supercomputer Center at the University of California, San Diego. Funded by the NSF, the project is leveraging virtual globe technology from NASA’s WorldWind to create interactive 3D visualization tools that combine layered data from a variety of sources to create a holistic view of features at, above, and beneath the Earth’s surface. The OEF architecture is cross-platform, multi-threaded, modular, and based upon Java. The OEF’s modular approach yields a collection of compatible mix-and-match components for assembling custom applications. Available modules support file format handling, web service communications, data management, data filtering, user interaction, and 3D visualization. File parsers handle a variety of formal and de facto standard file formats. Each one imports data into a general-purpose data representation that supports multidimensional grids, topography, points, lines, polygons, images, and more. From there these data then may be manipulated, merged, filtered, reprojected, and visualized. Visualization features support conventional and new visualization techniques for looking at topography, tomography, maps, and feature geometry. 3D grid data such as seismic tomography may be sliced by multiple oriented cutting planes and isosurfaced to create 3D skins that trace feature boundaries within the data. Topography may be overlaid with satellite imagery along with data such as gravity and magnetics measurements. Multiple data sets may be visualized simultaneously using overlapping layers and a common 3D+time coordinate space. Data management within the OEF handles and hides the quirks of differing file formats, web protocols, storage structures, coordinate spaces, and metadata representations. Derived data are computed automatically to support interaction and visualization while the original data is left unchanged in its original form. Data is cached for better memory and network efficiency, and all visualization is accelerated by 3D graphics hardware found on today’s computers. The OpenEarth Framework project is currently prototyping the software for use in the visualization, and integration of continental scale geophysical data being produced by EarthScope-related research in the Western US. The OEF is providing researchers with new ways to display and interrogate their data and is anticipated to be a valuable tool for future EarthScope-related research.
Rapid evaluation and quality control of next generation sequencing data with FaQCs.
Lo, Chien-Chi; Chain, Patrick S G
2014-11-19
Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform's sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects. Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly process large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics. FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lo, Chien -Chi; Chain, Patrick S. G.
Background: Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform's sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects. Results: Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly processmore » large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics. Conclusion: FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.« less
OpenGeoSys-GEMS: Hybrid parallelization of a reactive transport code with MPI and threads
NASA Astrophysics Data System (ADS)
Kosakowski, G.; Kulik, D. A.; Shao, H.
2012-04-01
OpenGeoSys-GEMS is a generic purpose reactive transport code based on the operator splitting approach. The code couples the Finite-Element groundwater flow and multi-species transport modules of the OpenGeoSys (OGS) project (http://www.ufz.de/index.php?en=18345) with the GEM-Selektor research package to model thermodynamic equilibrium of aquatic (geo)chemical systems utilizing the Gibbs Energy Minimization approach (http://gems.web.psi.ch/). The combination of OGS and the GEM-Selektor kernel (GEMS3K) is highly flexible due to the object-oriented modular code structures and the well defined (memory based) data exchange modules. Like other reactive transport codes, the practical applicability of OGS-GEMS is often hampered by the long calculation time and large memory requirements. • For realistic geochemical systems which might include dozens of mineral phases and several (non-ideal) solid solutions the time needed to solve the chemical system with GEMS3K may increase exceptionally. • The codes are coupled in a sequential non-iterative loop. In order to keep the accuracy, the time step size is restricted. In combination with a fine spatial discretization the time step size may become very small which increases calculation times drastically even for small 1D problems. • The current version of OGS is not optimized for memory use and the MPI version of OGS does not distribute data between nodes. Even for moderately small 2D problems the number of MPI processes that fit into memory of up-to-date workstations or HPC hardware is limited. One strategy to overcome the above mentioned restrictions of OGS-GEMS is to parallelize the coupled code. For OGS a parallelized version already exists. It is based on a domain decomposition method implemented with MPI and provides a parallel solver for fluid and mass transport processes. In the coupled code, after solving fluid flow and solute transport, geochemical calculations are done in form of a central loop over all finite element nodes with calls to GEMS3K and consecutive calculations of changed material parameters. In a first step the existing MPI implementation was utilized to parallelize this loop. Calculations were split between the MPI processes and afterwards data was synchronized by using MPI communication routines. Furthermore, multi-threaded calculation of the loop was implemented with help of the boost thread library (http://www.boost.org). This implementation provides a flexible environment to distribute calculations between several threads. For each MPI process at least one and up to several dozens of worker threads are spawned. These threads do not replicate the complete OGS-GEM data structure and use only a limited amount of memory. Calculation of the central geochemical loop is shared between all threads. Synchronization between the threads is done by barrier commands. The overall number of local threads times MPI processes should match the number of available computing nodes. The combination of multi-threading and MPI provides an effective and flexible environment to speed up OGS-GEMS calculations while limiting the required memory use. Test calculations on different hardware show that for certain types of applications tremendous speedups are possible.
Monte Carlo study of four dimensional binary hard hypersphere mixtures
NASA Astrophysics Data System (ADS)
Bishop, Marvin; Whitlock, Paula A.
2012-01-01
A multithreaded Monte Carlo code was used to study the properties of binary mixtures of hard hyperspheres in four dimensions. The ratios of the diameters of the hyperspheres examined were 0.4, 0.5, 0.6, and 0.8. Many total densities of the binary mixtures were investigated. The pair correlation functions and the equations of state were determined and compared with other simulation results and theoretical predictions. At lower diameter ratios the pair correlation functions of the mixture agree with the pair correlation function of a one component fluid at an appropriately scaled density. The theoretical results for the equation of state compare well to the Monte Carlo calculations for all but the highest densities studied.
Abraham, Mark James; Murtola, Teemu; Schulz, Roland; ...
2015-07-15
GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights, through several new and enhanced parallelization algorithms. This work on every level; SIMD registers inside cores, multithreading, heterogeneous CPU–GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. Finally, the latest best-in-class compressed trajectory storage format is supported.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Abraham, Mark James; Murtola, Teemu; Schulz, Roland
GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights, through several new and enhanced parallelization algorithms. This work on every level; SIMD registers inside cores, multithreading, heterogeneous CPU–GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. Finally, the latest best-in-class compressed trajectory storage format is supported.
Electromagnetic Physics Models for Parallel Computing Architectures
NASA Astrophysics Data System (ADS)
Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.
2016-10-01
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.
MULTI-CORE AND OPTICAL PROCESSOR RELATED APPLICATIONS RESEARCH AT OAK RIDGE NATIONAL LABORATORY
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barhen, Jacob; Kerekes, Ryan A; ST Charles, Jesse Lee
2008-01-01
High-speed parallelization of common tasks holds great promise as a low-risk approach to achieving the significant increases in signal processing and computational performance required for next generation innovations in reconfigurable radio systems. Researchers at the Oak Ridge National Laboratory have been working on exploiting the parallelization offered by this emerging technology and applying it to a variety of problems. This paper will highlight recent experience with four different parallel processors applied to signal processing tasks that are directly relevant to signal processing required for SDR/CR waveforms. The first is the EnLight Optical Core Processor applied to matched filter (MF) correlationmore » processing via fast Fourier transform (FFT) of broadband Dopplersensitive waveforms (DSW) using active sonar arrays for target tracking. The second is the IBM CELL Broadband Engine applied to 2-D discrete Fourier transform (DFT) kernel for image processing and frequency domain processing. And the third is the NVIDIA graphical processor applied to document feature clustering. EnLight Optical Core Processor. Optical processing is inherently capable of high-parallelism that can be translated to very high performance, low power dissipation computing. The EnLight 256 is a small form factor signal processing chip (5x5 cm2) with a digital optical core that is being developed by an Israeli startup company. As part of its evaluation of foreign technology, ORNL's Center for Engineering Science Advanced Research (CESAR) had access to a precursor EnLight 64 Alpha hardware for a preliminary assessment of capabilities in terms of large Fourier transforms for matched filter banks and on applications related to Doppler-sensitive waveforms. This processor is optimized for array operations, which it performs in fixed-point arithmetic at the rate of 16 TeraOPS at 8-bit precision. This is approximately 1000 times faster than the fastest DSP available today. The optical core performs the matrix-vector multiplications, where the nominal matrix size is 256x256. The system clock is 125MHz. At each clock cycle, 128K multiply-and-add operations per second (OPS) are carried out, which yields a peak performance of 16 TeraOPS. IBM Cell Broadband Engine. The Cell processor is the extraordinary resulting product of 5 years of sustained, intensive R&D collaboration (involving over $400M investment) between IBM, Sony, and Toshiba. Its architecture comprises one multithreaded 64-bit PowerPC processor element (PPE) with VMX capabilities and two levels of globally coherent cache, and 8 synergistic processor elements (SPEs). Each SPE consists of a processor (SPU) designed for streaming workloads, local memory, and a globally coherent direct memory access (DMA) engine. Computations are performed in 128-bit wide single instruction multiple data streams (SIMD). An integrated high-bandwidth element interconnect bus (EIB) connects the nine processors and their ports to external memory and to system I/O. The Applied Software Engineering Research (ASER) Group at the ORNL is applying the Cell to a variety of text and image analysis applications. Research on Cell-equipped PlayStation3 (PS3) consoles has led to the development of a correlation-based image recognition engine that enables a single PS3 to process images at more than 10X the speed of state-of-the-art single-core processors. NVIDIA Graphics Processing Units. The ASER group is also employing the latest NVIDIA graphical processing units (GPUs) to accelerate clustering of thousands of text documents using recently developed clustering algorithms such as document flocking and affinity propagation.« less
NASA Astrophysics Data System (ADS)
Kozhikkottu, Vivek J.
The scaling of integrated circuits into the nanometer regime has led to variations emerging as a primary concern for designers of integrated circuits. Variations are an inevitable consequence of the semiconductor manufacturing process, and also arise due to the side-effects of operation of integrated circuits (voltage, temperature, and aging). Conventional design approaches, which are based on design corners or worst-case scenarios, leave designers with an undesirable choice between the considerable overheads associated with over-design and significantly reduced manufacturing yield. Techniques for variation-tolerant design at the logic, circuit and layout levels of the design process have been developed and are in commercial use. However, with the incessant increase in variations due to technology scaling and design trends such as near-threshold computing, these techniques are no longer sufficient to contain the effects of variations, and there is a need to address variations at all stages of design. This thesis addresses the problem of variation-tolerant design at the earliest stages of the design process, where the system-level design decisions that are made can have a very significant impact. There are two key aspects to making system-level design variation-aware. First, analysis techniques must be developed to project the impact of variations on system-level metrics such as application performance and energy. Second, variation-tolerant design techniques need to be developed to absorb the residual impact of variations (that cannot be contained through lower-level techniques). In this thesis, we address both these facets by developing robust and scalable variation-aware analysis and variation mitigation techniques at the system level. The first contribution of this thesis is a variation-aware system-level performance analysis framework. We address the key challenge of translating the per-component clock frequency distributions into a system-level application performance distribution. This task is particularly complex and challenging due to the inter-dependencies between components' execution, indirect effects of shared resources, and interactions between multiple system-level "execution paths". We argue that accurate variation-aware performance analysis requires Monte-Carlo based repeated system execution. Our proposed analysis framework leverages emulation to significantly speedup performance analysis without sacrificing the generality and accuracy achieved by Monte-Carlo based simulations. Our experiments show performance improvements of around 60x compared to state-of-the-art hardware-software co-simulation tools and also underscore the framework's potential to enable variation-aware design and exploration at the system level. Our second contribution addresses the problem of designing variation-tolerant SoCs using recovery based design, a popular circuit design paradigm that addresses variations by eliminating guard-bands and operating circuits at close to "zero margins" while detecting and recovering from timing errors. While previous efforts have demonstrated the potential benefits of recovery based design, we identify several challenges that need to be addressed in order to apply this technique to SoCs. We present a systematic design framework to apply recovery based design at the system level. We propose to partition SoCs into "recovery islands", wherein each recovery island consists of one or more SoC components that can recover independent of the rest of the SoC. We present a variation-aware design methodology that partitions a given SoC into recovery islands and computes the optimal operating points for each island, taking into account the various trade-offs involved. Our experiments demonstrate that the proposed design framework achieves an average of 32% energy savings over conventional worst-case designs, with negligible losses in performance. The third contribution of this thesis introduces disproportionate allocation of shared system resources as a means to combat the adverse impact of within-die variations on multi-core platforms. For multi-threaded programs executing on variation-impacted multi-cores platforms, we make the key observation that thread performance is not only a function of the frequency of the core on which it is executing on, but also depends upon the amount of shared system resources allocated to it. We utilize this insight to design a variation-aware runtime scheme which allocates the ways of a last-level shared L2 cache amongst the different cores/threads of a multi-core platform taking into account both application characteristics as well as chip specific variation profiles. Our experiments on 100 quad-core chips, each with a distinct variation profile, shows on an average 15% performance improvements for a suite of multi-threaded benchmarks. Our final contribution investigates the variation-tolerant design of domain-specific accelerators and demonstrates how the unique architectural properties of these accelerators can be leveraged to create highly effective variation tolerance mechanisms. We explore this concept through the variation-tolerant design of a vector processor that efficiently executes applications from the domains of recognition, mining and synthesis (RMS). We develop a novel design approach for variation tolerance, which leverages the unique nature of the vector reduction operations performed by this processor to effectively predict and preempt the occurrence of timing errors under variations and subsequently restore the correct output at the end of each vector reduction operation. We implement the above predict, preempt and restore operations by suitably enhancing the processor hardware and the application software and demonstrate considerable energy benefits (on an average 32%) across six applications from the domains of RMS. In conclusion, our work provides system designers with powerful tools and mechanisms in their efforts to combat variations, resulting in improved designer productivity and variation-tolerant systems.
Evidence for Nonuniform Heating of Coronal Loops Inferred from Multithread Modeling of TRACE Data
NASA Astrophysics Data System (ADS)
Aschwanden, Markus J.; Nightingale, Richard W.; Alexander, David
2000-10-01
The temperature Te(s) and density structure ne(s) of active region loops in EUV observed with TRACE is modeled with a multithread model, synthesized from the summed emission of many loop threads that have a distribution of maximum temperatures and that satisfy the steady state Rosner-Tucker-Vaiana (RTV) scaling law, modified by Serio et al. for gravitational stratification (called RTVSp in the following). In a recent Letter, Reale & Peres demonstrated that this method can explain the almost isothermal appearance of TRACE loops (observed by Lenz et al.) as derived from the filter-ratio method. From model-fitting of the 171 and 195 Å fluxes of 41 loops, which have loop half-lengths in the range of L=4-320 Mm, we find that (1) the EUV loops consist of near-isothermal loop threads with substantially smaller temperature gradients than are predicted by the RTVSp model; (2) the loop base pressure, p0~0.3+/-0.1 dynes cm-2, is independent of the loop length L, and it agrees with the RTVSp model for the shortest loops but exceeds the RTVSp model up to a factor of 35 for the largest loops; and (3) the pressure scale height is consistent with hydrostatic equilibrium for the shortest loops but exceeds the temperature scale height up to a factor of ~3 for the largest loops. The data indicate that cool EUV loops in the temperature range of Te~0.8-1.6 MK cannot be explained with the static steady state RTVSp model in terms of uniform heating but are fully consistent with Serio's model in the case of nonuniform heating (RTVSph), with heating scale heights in the range of sH=17+/-6 Mm. This heating function provides almost uniform heating for small loops (L<~20 Mm), but restricts heating to the footpoints of large loops (L~50-300 Mm).
NASA Astrophysics Data System (ADS)
Brzuszek, Marcin; Daniluk, Andrzej
2006-11-01
Writing a concurrent program can be more difficult than writing a sequential program. Programmer needs to think about synchronisation, race conditions and shared variables. Transactions help reduce the inconvenience of using threads. A transaction is an abstraction, which allows programmers to group a sequence of actions on the program into a logical, higher-level computation unit. This paper presents multithreaded versions of the GROWTH program, which allow to calculate the layer coverages during the growth of thin epitaxial films and the corresponding RHEED intensities according to the kinematical approximation. The presented programs also contain graphical user interfaces, which enable displaying program data at run-time. New version program summaryTitles of programs:GROWTHGr, GROWTH06 Catalogue identifier:ADVL_v2_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADVL_v2_0 Program obtainable from:CPC Program Library, Queen's University of Belfast, N. Ireland Catalogue identifier of previous version:ADVL Does the new version supersede the original program:No Computer for which the new version is designed and others on which it has been tested: Pentium-based PC Operating systems or monitors under which the new version has been tested: Windows 9x, XP, NT Programming language used:Object Pascal Memory required to execute with typical data:More than 1 MB Number of bits in a word:64 bits Number of processors used:1 No. of lines in distributed program, including test data, etc.:20 931 Number of bytes in distributed program, including test data, etc.: 1 311 268 Distribution format:tar.gz Nature of physical problem: The programs compute the RHEED intensities during the growth of thin epitaxial structures prepared using the molecular beam epitaxy (MBE). The computations are based on the use of kinematical diffraction theory [P.I. Cohen, G.S. Petrich, P.R. Pukite, G.J. Whaley, A.S. Arrott, Surf. Sci. 216 (1989) 222. [1
NASA Astrophysics Data System (ADS)
Nguyen, Khoa Dang; Ha, Cheolkeun
2018-04-01
Hardware-in-the-loop simulation (HILS) is well known as an effective approach in the design of unmanned aerial vehicles (UAV) systems, enabling engineers to test the control algorithm on a hardware board with a UAV model on the software. Performance of HILS is determined by performances of the control algorithm, the developed model, and the signal transfer between the hardware and software. The result of HILS is degraded if any signal could not be transferred to the correct destination. Therefore, this paper aims to develop a middleware software to secure communications in HILS system for testing the operation of a quad-rotor UAV. In our HILS, the Gazebo software is used to generate a nonlinear six-degrees-of-freedom (6DOF) model, sensor model, and 3D visualization for the quad-rotor UAV. Meanwhile, the flight control algorithm is designed and implemented on the Pixhawk hardware. New middleware software, referred to as the control application software (CAS), is proposed to ensure the connection and data transfer between Gazebo and Pixhawk using the multithread structure in Qt Creator. The CAS provides a graphical user interface (GUI), allowing the user to monitor the status of packet transfer, and perform the flight control commands and the real-time tuning parameters for the quad-rotor UAV. Numerical implementations have been performed to prove the effectiveness of the middleware software CAS suggested in this paper.
EPICS as a MARTe Configuration Environment
NASA Astrophysics Data System (ADS)
Valcarcel, Daniel F.; Barbalace, Antonio; Neto, André; Duarte, André S.; Alves, Diogo; Carvalho, Bernardo B.; Carvalho, Pedro J.; Sousa, Jorge; Fernandes, Horácio; Goncalves, Bruno; Sartori, Filippo; Manduchi, Gabriele
2011-08-01
The Multithreaded Application Real-Time executor (MARTe) software provides an environment for the hard real-time execution of codes while leveraging a standardized algorithm development process. The Experimental Physics and Industrial Control System (EPICS) software allows the deployment and remote monitoring of networked control systems. Channel Access (CA) is the protocol that enables the communication between EPICS distributed components. It allows to set and monitor process variables across the network belonging to different systems. The COntrol and Data Acquisition and Communication (CODAC) system for the ITER Tokamak will be EPICS based and will be used to monitor and live configure the plant controllers. The reconfiguration capability in a hard real-time system requires strict latencies from the request to the actuation and it is a key element in the design of the distributed control algorithm. Presently, MARTe and its objects are configured using a well-defined structured language. After each configuration, all objects are destroyed and the system rebuilt, following the strong hard real-time rule that a real-time system in online mode must behave in a strictly deterministic fashion. This paper presents the design and considerations to use MARTe as a plant controller and enable it to be EPICS monitorable and configurable without disturbing the execution at any time, in particular during a plasma discharge. The solutions designed for this will be presented and discussed.
A software module for implementing auditory and visual feedback on a video-based eye tracking system
NASA Astrophysics Data System (ADS)
Rosanlall, Bharat; Gertner, Izidor; Geri, George A.; Arrington, Karl F.
2016-05-01
We describe here the design and implementation of a software module that provides both auditory and visual feedback of the eye position measured by a commercially available eye tracking system. The present audio-visual feedback module (AVFM) serves as an extension to the Arrington Research ViewPoint EyeTracker, but it can be easily modified for use with other similar systems. Two modes of audio feedback and one mode of visual feedback are provided in reference to a circular area-of-interest (AOI). Auditory feedback can be either a click tone emitted when the user's gaze point enters or leaves the AOI, or a sinusoidal waveform with frequency inversely proportional to the distance from the gaze point to the center of the AOI. Visual feedback is in the form of a small circular light patch that is presented whenever the gaze-point is within the AOI. The AVFM processes data that are sent to a dynamic-link library by the EyeTracker. The AVFM's multithreaded implementation also allows real-time data collection (1 kHz sampling rate) and graphics processing that allow display of the current/past gaze-points as well as the AOI. The feedback provided by the AVFM described here has applications in military target acquisition and personnel training, as well as in visual experimentation, clinical research, marketing research, and sports training.
Introducing concurrency in the Gaudi data processing framework
NASA Astrophysics Data System (ADS)
Clemencic, Marco; Hegner, Benedikt; Mato, Pere; Piparo, Danilo
2014-06-01
In the past, the increasing demands for HEP processing resources could be fulfilled by the ever increasing clock-frequencies and by distributing the work to more and more physical machines. Limitations in power consumption of both CPUs and entire data centres are bringing an end to this era of easy scalability. To get the most CPU performance per watt, future hardware will be characterised by less and less memory per processor, as well as thinner, more specialized and more numerous cores per die, and rather heterogeneous resources. To fully exploit the potential of the many cores, HEP data processing frameworks need to allow for parallel execution of reconstruction or simulation algorithms on several events simultaneously. We describe our experience in introducing concurrency related capabilities into Gaudi, a generic data processing software framework, which is currently being used by several HEP experiments, including the ATLAS and LHCb experiments at the LHC. After a description of the concurrent framework and the most relevant design choices driving its development, we describe the behaviour of the framework in a more realistic environment, using a subset of the real LHCb reconstruction workflow, and present our strategy and the used tools to validate the physics outcome of the parallel framework against the results of the present, purely sequential LHCb software. We then summarize the measurement of the code performance of the multithreaded application in terms of memory and CPU usage.
Li, Chuan; Petukh, Marharyta; Li, Lin; Alexov, Emil
2013-08-15
Due to the enormous importance of electrostatics in molecular biology, calculating the electrostatic potential and corresponding energies has become a standard computational approach for the study of biomolecules and nano-objects immersed in water and salt phase or other media. However, the electrostatics of large macromolecules and macromolecular complexes, including nano-objects, may not be obtainable via explicit methods and even the standard continuum electrostatics methods may not be applicable due to high computational time and memory requirements. Here, we report further development of the parallelization scheme reported in our previous work (Li, et al., J. Comput. Chem. 2012, 33, 1960) to include parallelization of the molecular surface and energy calculations components of the algorithm. The parallelization scheme utilizes different approaches such as space domain parallelization, algorithmic parallelization, multithreading, and task scheduling, depending on the quantity being calculated. This allows for efficient use of the computing resources of the corresponding computer cluster. The parallelization scheme is implemented in the popular software DelPhi and results in speedup of several folds. As a demonstration of the efficiency and capability of this methodology, the electrostatic potential, and electric field distributions are calculated for the bovine mitochondrial supercomplex illustrating their complex topology, which cannot be obtained by modeling the supercomplex components alone. Copyright © 2013 Wiley Periodicals, Inc.
elPrep: High-Performance Preparation of Sequence Alignment/Map Files for Variant Calling
Decap, Dries; Fostier, Jan; Reumers, Joke
2015-01-01
elPrep is a high-performance tool for preparing sequence alignment/map files for variant calling in sequencing pipelines. It can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, reordering contigs, and so on, while producing identical results. What sets elPrep apart is its software architecture that allows executing preparation pipelines by making only a single pass through the data, no matter how many preparation steps are used in the pipeline. elPrep is designed as a multithreaded application that runs entirely in memory, avoids repeated file I/O, and merges the computation of several preparation steps to significantly speed up the execution time. For example, for a preparation pipeline of five steps on a whole-exome BAM file (NA12878), we reduce the execution time from about 1:40 hours, when using a combination of SAMtools and Picard, to about 15 minutes when using elPrep, while utilising the same server resources, here 48 threads and 23GB of RAM. For the same pipeline on whole-genome data (NA12878), elPrep reduces the runtime from 24 hours to less than 5 hours. As a typical clinical study may contain sequencing data for hundreds of patients, elPrep can remove several hundreds of hours of computing time, and thus substantially reduce analysis time and cost. PMID:26182406
Street Viewer: An Autonomous Vision Based Traffic Tracking System.
Bottino, Andrea; Garbo, Alessandro; Loiacono, Carmelo; Quer, Stefano
2016-06-03
The development of intelligent transportation systems requires the availability of both accurate traffic information in real time and a cost-effective solution. In this paper, we describe Street Viewer, a system capable of analyzing the traffic behavior in different scenarios from images taken with an off-the-shelf optical camera. Street Viewer operates in real time on embedded hardware architectures with limited computational resources. The system features a pipelined architecture that, on one side, allows one to exploit multi-threading intensively and, on the other side, allows one to improve the overall accuracy and robustness of the system, since each layer is aimed at refining for the following layers the information it receives as input. Another relevant feature of our approach is that it is self-adaptive. During an initial setup, the application runs in learning mode to build a model of the flow patterns in the observed area. Once the model is stable, the system switches to the on-line mode where the flow model is used to count vehicles traveling on each lane and to produce a traffic information summary. If changes in the flow model are detected, the system switches back autonomously to the learning mode. The accuracy and the robustness of the system are analyzed in the paper through experimental results obtained on several different scenarios and running the system for long periods of time.
Visual Analytics for Power Grid Contingency Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wong, Pak C.; Huang, Zhenyu; Chen, Yousu
2014-01-20
Contingency analysis is the process of employing different measures to model scenarios, analyze them, and then derive the best response to remove the threats. This application paper focuses on a class of contingency analysis problems found in the power grid management system. A power grid is a geographically distributed interconnected transmission network that transmits and delivers electricity from generators to end users. The power grid contingency analysis problem is increasingly important because of both the growing size of the underlying raw data that need to be analyzed and the urgency to deliver working solutions in an aggressive timeframe. Failure tomore » do so may bring significant financial, economic, and security impacts to all parties involved and the society at large. The paper presents a scalable visual analytics pipeline that transforms about 100 million contingency scenarios to a manageable size and form for grid operators to examine different scenarios and come up with preventive or mitigation strategies to address the problems in a predictive and timely manner. Great attention is given to the computational scalability, information scalability, visual scalability, and display scalability issues surrounding the data analytics pipeline. Most of the large-scale computation requirements of our work are conducted on a Cray XMT multi-threaded parallel computer. The paper demonstrates a number of examples using western North American power grid models and data.« less
Zhang, Bo; Yang, Xiang; Yang, Fei; Yang, Xin; Qin, Chenghu; Han, Dong; Ma, Xibo; Liu, Kai; Tian, Jie
2010-09-13
In molecular imaging (MI), especially the optical molecular imaging, bioluminescence tomography (BLT) emerges as an effective imaging modality for small animal imaging. The finite element methods (FEMs), especially the adaptive finite element (AFE) framework, play an important role in BLT. The processing speed of the FEMs and the AFE framework still needs to be improved, although the multi-thread CPU technology and the multi CPU technology have already been applied. In this paper, we for the first time introduce a new kind of acceleration technology to accelerate the AFE framework for BLT, using the graphics processing unit (GPU). Besides the processing speed, the GPU technology can get a balance between the cost and performance. The CUBLAS and CULA are two main important and powerful libraries for programming on NVIDIA GPUs. With the help of CUBLAS and CULA, it is easy to code on NVIDIA GPU and there is no need to worry about the details about the hardware environment of a specific GPU. The numerical experiments are designed to show the necessity, effect and application of the proposed CUBLAS and CULA based GPU acceleration. From the results of the experiments, we can reach the conclusion that the proposed CUBLAS and CULA based GPU acceleration method can improve the processing speed of the AFE framework very much while getting a balance between cost and performance.
Final report for the Tera Computer TTI CRADA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Davidson, G.S.; Pavlakos, C.; Silva, C.
1997-01-01
Tera Computer and Sandia National Laboratories have completed a CRADA, which examined the Tera Multi-Threaded Architecture (MTA) for use with large codes of importance to industry and DOE. The MTA is an innovative architecture that uses parallelism to mask latency between memories and processors. The physical implementation is a parallel computer with high cross-section bandwidth and GaAs processors designed by Tera, which support many small computation threads and fast, lightweight context switches between them. When any thread blocks while waiting for memory accesses to complete, another thread immediately begins execution so that high CPU utilization is maintained. The Tera MTAmore » parallel computer has a single, global address space, which is appealing when porting existing applications to a parallel computer. This ease of porting is further enabled by compiler technology that helps break computations into parallel threads. DOE and Sandia National Laboratories were interested in working with Tera to further develop this computing concept. While Tera Computer would continue the hardware development and compiler research, Sandia National Laboratories would work with Tera to ensure that their compilers worked well with important Sandia codes, most particularly CTH, a shock physics code used for weapon safety computations. In addition to that important code, Sandia National Laboratories would complete research on a robotic path planning code, SANDROS, which is important in manufacturing applications, and would evaluate the MTA performance on this code. Finally, Sandia would work directly with Tera to develop 3D visualization codes, which would be appropriate for use with the MTA. Each of these tasks has been completed to the extent possible, given that Tera has just completed the MTA hardware. All of the CRADA work had to be done on simulators.« less
NASA Astrophysics Data System (ADS)
Brown, R. A.; Pasternack, G. B.; Wallender, W. W.
2014-06-01
The synthesis of artificial landforms is complementary to geomorphic analysis because it affords a reflection on both the characteristics and intrinsic formative processes of real world conditions. Moreover, the applied terminus of geomorphic theory is commonly manifested in the engineering and rehabilitation of riverine landforms where the goal is to create specific processes associated with specific morphology. To date, the synthesis of river topography has been explored outside of geomorphology through artistic renderings, computer science applications, and river rehabilitation design; while within geomorphology it has been explored using morphodynamic modeling, such as one-dimensional simulation of river reach profiles, two-dimensional simulation of river networks, and three-dimensional simulation of subreach scale river morphology. To date, no approach allows geomorphologists, engineers, or river rehabilitation practitioners to create landforms of prescribed conditions. In this paper a method for creating topography of synthetic river valleys is introduced that utilizes a theoretical framework that draws from fluvial geomorphology, computer science, and geometric modeling. Such a method would be valuable to geomorphologists in understanding form-process linkages as well as to engineers and river rehabilitation practitioners in developing design surfaces that can be rapidly iterated. The method introduced herein relies on the discretization of river valley topography into geometric elements associated with overlapping and orthogonal two-dimensional planes such as the planform, profile, and cross section that are represented by mathematical functions, termed geometric element equations. Topographic surfaces can be parameterized independently or dependently using a geomorphic covariance structure between the spatial series of geometric element equations. To illustrate the approach and overall model flexibility examples are provided that are associated with mountain, lowland, and hybrid synthetic river valleys. To conclude, recommended advances such as multithread channels are discussed along with potential applications.
Rapid evaluation and quality control of next generation sequencing data with FaQCs
Lo, Chien -Chi; Chain, Patrick S. G.
2014-12-01
Background: Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform's sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects. Results: Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly processmore » large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics. Conclusion: FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.« less
NASA Astrophysics Data System (ADS)
Redolfi, M.; Tubino, M.; Bertoldi, W.; Brasington, J.
2016-08-01
Understanding the role of external controls on the morphology of braided rivers is currently limited by the dearth of robust metrics to quantify and distinguish the diversity of channel form. Most existing measures are strongly dependent on river stage and unable to account for the three-dimensional complexity that is apparent in digital terrain models of braided rivers. In this paper, we introduce a simple, stage-independent morphological indicator that enables the analysis of reach-scale regime morphology as a function of slope, discharge, sediment size, and degree of confinement. The index is derived from the bed elevation frequency distribution and characterizes a statistical width-depth curve averaged longitudinally over multiple channel widths. In this way, we define a "synthetic channel" described by a simple parameter that embeds information about the river morphological complexity. Under the assumption of uniform flow, this approach can be extended to provide estimates of the reach-averaged shear stress distribution, bed load flux, and at-a-station-variability of wetted width. We test this approach using data from a wide range of labile channels including 58 flume experiments and three gravel bed braided rivers. Results demonstrate a strong relationship between the unit discharge and the shape of the elevation distribution, which varies between a U shape for typical single-thread confined channels and a Y shape for multithread reaches. Finally, we discuss the use of the metric as a diagnostic index of river condition that may be used to support inferences about the river morphological trajectory.
NASA Tech Briefs, January 2006
NASA Technical Reports Server (NTRS)
2006-01-01
Topics covered include: Semiautonomous Avionics-and-Sensors System for a UAV; Biomimetic/Optical Sensors for Detecting Bacterial Species; System Would Detect Foreign-Object Damage in Turbofan Engine; Detection of Water Hazards for Autonomous Robotic Vehicles; Fuel Cells Utilizing Oxygen From Air at Low Pressures; Hybrid Ion-Detector/Data-Acquisition System for a TOF-MS; Spontaneous-Desorption Ionizer for a TOF-MS; Equipment for On-Wafer Testing From 220 to 325 GHz; Computing Isentropic Flow Properties of Air/R-134a Mixtures; Java Mission Evaluation Workstation System; Using a Quadtree Algorithm To Assess Line of Sight; Software for Automated Generation of Cartesian Meshes; Optics Program Modified for Multithreaded Parallel Computing; Programs for Testing Processor-in-Memory Computing Systems; PVM Enhancement for Beowulf Multiple-Processor Nodes; Ion-Exclusion Chromatography for Analyzing Organics in Water; Selective Plasma Deposition of Fluorocarbon Films on SAMs; Water-Based Pressure-Sensitive Paints; System Finds Horizontal Location of Center of Gravity; Predicting Tail Buffet Loads of a Fighter Airplane; Water Containment Systems for Testing High-Speed Flywheels; Vapor-Compression Heat Pumps for Operation Aboard Spacecraft; Multistage Electrophoretic Separators; Recovering Residual Xenon Propellant for an Ion Propulsion System; Automated Solvent Seaming of Large Polyimide Membranes; Manufacturing Precise, Lightweight Paraboloidal Mirrors; Analysis of Membrane Lipids of Airborne Micro-Organisms; Noninvasive Diagnosis of Coronary Artery Disease Using 12-Lead High-Frequency Electrocardiograms; Dual-Laser-Pulse Ignition; Enhanced-Contrast Viewing of White-Hot Objects in Furnaces; Electrically Tunable Terahertz Quantum-Cascade Lasers; Few-Mode Whispering-Gallery-Mode Resonators; Conflict-Aware Scheduling Algorithm; and Real-Time Diagnosis of Faults Using a Bank of Kalman Filters.
Bioprospecting the Bibleome: Adding Evidence to Support the Inflammatory Basis of Cancer.
Elkin, Peter L; Frankel, Andrew; Liebow-Liebling, Ester H; Elkin, Jared R; Tuttle, Mark S; Brown, Steven H
2012-05-05
BioProspecting is a novel approach that enabled our team to mine genetic marker related data from the New England Journal of Medicine (NEJM) utilizing Systematized Nomenclature of Medicine-Clinical Terms (SNOMED CT) and the Human Gene Ontology (HUGO). Genes associated with disorders using the Multi-threaded Clinical Vocabulary Server (MCVS) Natural Language Processing (NLP) engine, whose output was represented as an ontology-network incorporating the semantic encodings of the literature. Metabolic functions were used to identify potentially novel relationships between (genes or proteins) and (diseases or drugs). In an effort to identify genes important to transformation of normal tissue into a malignancy, we went on to identify the genes linked to multiple cancers and then mapped those genes to metabolic and signaling pathways. Ten Genes were related to 30 or more cancers, 72 genes were related to 20 or more cancers and 191 genes were related to 10 or more cancers. The three pathways most often associated with the top 200 novel cancer markers were the Acute Phase Response Signaling, the Glucocorticoid Receptor Signaling and the Hepatic Fibrosis/Hepatic Stellate Cell Activation pathway. This association highlights the role of inflammation in the induction and perhaps transformation of mortal cells into cancers. BioProspecting can speed our identification and understanding of synergies between articles in the biomedical literature. In this case we found considerable synergy between the Oncology literature and the Sepsis literature. By mapping these associations to known metabolic, regulatory and signaling pathways we were able to identify further evidence for the inflammatory basis of cancer.
Atlas-guided cluster analysis of large tractography datasets.
Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer
2013-01-01
Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses an hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic and anatomically correct as well as reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Levine, Benjamin G., E-mail: ben.levine@temple.ed; Stone, John E., E-mail: johns@ks.uiuc.ed; Kohlmeyer, Axel, E-mail: akohlmey@temple.ed
2011-05-01
The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate limiting step in the calculation of the RDF is building a histogram of the distance between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple graphics processing units (GPUs). The algorithm features a tiling scheme to maximize the reuse of data at the fastest levels of the GPU's memory hierarchy and dynamic load balancing to allow high performance on heterogeneous configurations of GPUs. Several versions of the RDF algorithm aremore » presented, utilizing the specific hardware features found on different generations of GPUs. We take advantage of larger shared memory and atomic memory operations available on state-of-the-art GPUs to accelerate the code significantly. The use of atomic memory operations allows the fast, limited-capacity on-chip memory to be used much more efficiently, resulting in a fivefold increase in performance compared to the version of the algorithm without atomic operations. The ultimate version of the algorithm running in parallel on four NVIDIA GeForce GTX 480 (Fermi) GPUs was found to be 92 times faster than a multithreaded implementation running on an Intel Xeon 5550 CPU. On this multi-GPU hardware, the RDF between two selections of 1,000,000 atoms each can be calculated in 26.9 s per frame. The multi-GPU RDF algorithms described here are implemented in VMD, a widely used and freely available software package for molecular dynamics visualization and analysis.« less
Stone, John E.; Kohlmeyer, Axel
2011-01-01
The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate limiting step in the calculation of the RDF is building a histogram of the distance between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple graphics processing units (GPUs). The algorithm features a tiling scheme to maximize the reuse of data at the fastest levels of the GPU’s memory hierarchy and dynamic load balancing to allow high performance on heterogeneous configurations of GPUs. Several versions of the RDF algorithm are presented, utilizing the specific hardware features found on different generations of GPUs. We take advantage of larger shared memory and atomic memory operations available on state-of-the-art GPUs to accelerate the code significantly. The use of atomic memory operations allows the fast, limited-capacity on-chip memory to be used much more efficiently, resulting in a fivefold increase in performance compared to the version of the algorithm without atomic operations. The ultimate version of the algorithm running in parallel on four NVIDIA GeForce GTX 480 (Fermi) GPUs was found to be 92 times faster than a multithreaded implementation running on an Intel Xeon 5550 CPU. On this multi-GPU hardware, the RDF between two selections of 1,000,000 atoms each can be calculated in 26.9 seconds per frame. The multi-GPU RDF algorithms described here are implemented in VMD, a widely used and freely available software package for molecular dynamics visualization and analysis. PMID:21547007
Wilson, J Adam; Williams, Justin C
2009-01-01
The clock speeds of modern computer processors have nearly plateaued in the past 5 years. Consequently, neural prosthetic systems that rely on processing large quantities of data in a short period of time face a bottleneck, in that it may not be possible to process all of the data recorded from an electrode array with high channel counts and bandwidth, such as electrocorticographic grids or other implantable systems. Therefore, in this study a method of using the processing capabilities of a graphics card [graphics processing unit (GPU)] was developed for real-time neural signal processing of a brain-computer interface (BCI). The NVIDIA CUDA system was used to offload processing to the GPU, which is capable of running many operations in parallel, potentially greatly increasing the speed of existing algorithms. The BCI system records many channels of data, which are processed and translated into a control signal, such as the movement of a computer cursor. This signal processing chain involves computing a matrix-matrix multiplication (i.e., a spatial filter), followed by calculating the power spectral density on every channel using an auto-regressive method, and finally classifying appropriate features for control. In this study, the first two computationally intensive steps were implemented on the GPU, and the speed was compared to both the current implementation and a central processing unit-based implementation that uses multi-threading. Significant performance gains were obtained with GPU processing: the current implementation processed 1000 channels of 250 ms in 933 ms, while the new GPU method took only 27 ms, an improvement of nearly 35 times.
Enabling communication concurrency through flexible MPI endpoints
Dinan, James; Grant, Ryan E.; Balaji, Pavan; ...
2014-09-23
MPI defines a one-to-one relationship between MPI processes and ranks. This model captures many use cases effectively; however, it also limits communication concurrency and interoperability between MPI and programming models that utilize threads. Our paper describes the MPI endpoints extension, which relaxes the longstanding one-to-one relationship between MPI processes and ranks. Using endpoints, an MPI implementation can map separate communication contexts to threads, allowing them to drive communication independently. Also, endpoints enable threads to be addressable in MPI operations, enhancing interoperability between MPI and other programming models. Furthermore, these characteristics are illustrated through several examples and an empirical study thatmore » contrasts current multithreaded communication performance with the need for high degrees of communication concurrency to achieve peak communication performance.« less
Implementing Shared Memory Parallelism in MCBEND
NASA Astrophysics Data System (ADS)
Bird, Adam; Long, David; Dobson, Geoff
2017-09-01
MCBEND is a general purpose radiation transport Monte Carlo code from AMEC Foster Wheelers's ANSWERS® Software Service. MCBEND is well established in the UK shielding community for radiation shielding and dosimetry assessments. The existing MCBEND parallel capability effectively involves running the same calculation on many processors. This works very well except when the memory requirements of a model restrict the number of instances of a calculation that will fit on a machine. To more effectively utilise parallel hardware OpenMP has been used to implement shared memory parallelism in MCBEND. This paper describes the reasoning behind the choice of OpenMP, notes some of the challenges of multi-threading an established code such as MCBEND and assesses the performance of the parallel method implemented in MCBEND.
2nd-Order CESE Results For C1.4: Vortex Transport by Uniform Flow
NASA Technical Reports Server (NTRS)
Friedlander, David J.
2015-01-01
The Conservation Element and Solution Element (CESE) method was used as implemented in the NASA research code ez4d. The CESE method is a time accurate formulation with flux-conservation in both space and time. The method treats the discretized derivatives of space and time identically and while the 2nd-order accurate version was used, high-order versions exist, the 2nd-order accurate version was used. In regards to the ez4d code, it is an unstructured Navier-Stokes solver coded in C++ with serial and parallel versions available. As part of its architecture, ez4d has the capability to utilize multi-thread and Messaging Passage Interface (MPI) for parallel runs.
Electromagnetic physics models for parallel computing architectures
Amadio, G.; Ananya, A.; Apostolakis, J.; ...
2016-11-21
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part ofmore » the GeantV project. Finally, the results of preliminary performance evaluation and physics validation are presented as well.« less
First experience of vectorizing electromagnetic physics models for detector simulation
NASA Astrophysics Data System (ADS)
Amadio, G.; Apostolakis, J.; Bandieramonte, M.; Bianchini, C.; Bitzes, G.; Brun, R.; Canal, P.; Carminati, F.; de Fine Licht, J.; Duhem, L.; Elvira, D.; Gheata, A.; Jun, S. Y.; Lima, G.; Novak, M.; Presbyterian, M.; Shadura, O.; Seghal, R.; Wenzel, S.
2015-12-01
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. The GeantV vector prototype for detector simulations has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth, parallelization needed to achieve optimal performance or memory access latency and speed. An additional challenge is to avoid the code duplication often inherent to supporting heterogeneous platforms. In this paper we present the first experience of vectorizing electromagnetic physics models developed for the GeantV project.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Attinella, John E.; Davis, Kristan D.; Musselman, Roy G.
Methods, apparatuses, and computer program products for servicing a globally broadcast interrupt signal in a multi-threaded computer comprising a plurality of processor threads. Embodiments include an interrupt controller indicating in a plurality of local interrupt status locations that a globally broadcast interrupt signal has been received by the interrupt controller. Embodiments also include a thread determining that a local interrupt status location corresponding to the thread indicates that the globally broadcast interrupt signal has been received by the interrupt controller. Embodiments also include the thread processing one or more entries in a global interrupt status bit queue based on whethermore » global interrupt status bits associated with the globally broadcast interrupt signal are locked. Each entry in the global interrupt status bit queue corresponds to a queued global interrupt.« less
Enabling communication concurrency through flexible MPI endpoints
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dinan, James; Grant, Ryan E.; Balaji, Pavan
MPI defines a one-to-one relationship between MPI processes and ranks. This model captures many use cases effectively; however, it also limits communication concurrency and interoperability between MPI and programming models that utilize threads. Our paper describes the MPI endpoints extension, which relaxes the longstanding one-to-one relationship between MPI processes and ranks. Using endpoints, an MPI implementation can map separate communication contexts to threads, allowing them to drive communication independently. Also, endpoints enable threads to be addressable in MPI operations, enhancing interoperability between MPI and other programming models. Furthermore, these characteristics are illustrated through several examples and an empirical study thatmore » contrasts current multithreaded communication performance with the need for high degrees of communication concurrency to achieve peak communication performance.« less
Enabling communication concurrency through flexible MPI endpoints
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dinan, James; Grant, Ryan E.; Balaji, Pavan
MPI defines a one-to-one relationship between MPI processes and ranks. This model captures many use cases effectively; however, it also limits communication concurrency and interoperability between MPI and programming models that utilize threads. This paper describes the MPI endpoints extension, which relaxes the longstanding one-to-one relationship between MPI processes and ranks. Using endpoints, an MPI implementation can map separate communication contexts to threads, allowing them to drive communication independently. Endpoints also enable threads to be addressable in MPI operations, enhancing interoperability between MPI and other programming models. These characteristics are illustrated through several examples and an empirical study that contrastsmore » current multithreaded communication performance with the need for high degrees of communication concurrency to achieve peak communication performance.« less
High Performance Semantic Factoring of Giga-Scale Semantic Graph Databases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joslyn, Cliff A.; Adolf, Robert D.; Al-Saffar, Sinan
2010-10-04
As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to bring high performance computational resources to bear on their analysis, interpretation, and visualization, especially with respect to their innate semantic structure. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multithreaded architecture of the Cray XMT platform, conventional clusters, and large data stores. In this paper we describe that architecture, and present the results of our deployingmore » that for the analysis of the Billion Triple dataset with respect to its semantic factors.« less
Considerations on the Use of Custom Accelerators for Big Data Analytics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castellana, Vito G.; Tumeo, Antonino; Minutoli, Marco
Accelerators, including Graphic Processing Units (GPUs) for gen- eral purpose computation, many-core designs with wide vector units (e.g., Intel Phi), have become a common component of many high performance clusters. The appearance of more stable and reliable tools tools that can automatically convert code written in high-level specifications with annotations (such as C or C++) to hardware de- scription languages (High-Level Synthesis - HLS), is also setting the stage for a broader use of reconfigurable devices (e.g., Field Pro- grammable Gate Arrays - FPGAs) in high performance system for the implementation of custom accelerators, helped by the fact that newmore » processors include advanced cache-coherent interconnects for these components. In this chapter, we briefly survey the status of the use of accelerators in high performance systems targeted at big data analytics applications. We argue that, although the progress in the use of accelerators for this class of applications has been sig- nificant, differently from scientific simulations there still are gaps to close. This is particularly true for the ”irregular” behaviors exhibited by no-SQL, graph databases. We focus our attention on the limits of HLS tools for data analytics and graph methods, and discuss a new architectural template that better fits the requirement of this class of applications. We validate the new architectural templates by mod- ifying the Graph Engine for Multithreaded System (GEMS) frame- work to support accelerators generated with such a methodology, and testing with queries coming from the Lehigh University Benchmark (LUBM). The architectural template enables better supporting the task and memory level parallelism present in graph methods by sup- porting a new control model and a enhanced memory interface. We show that out solution allows generating parallel accelerators, pro- viding speed ups with respect to conventional HLS flows. We finally draw conclusions and present a perspective on the use of reconfig- urable devices and Design Automation tools for data analytics.« less
GAMERA - The New Magnetospheric Code
NASA Astrophysics Data System (ADS)
Lyon, J.; Sorathia, K.; Zhang, B.; Merkin, V. G.; Wiltberger, M. J.; Daldorff, L. K. S.
2017-12-01
The Lyon-Fedder-Mobarry (LFM) code has been a main-line magnetospheric simulation code for 30 years. The code base, designed in the age of memory to memory vector ma- chines,is still in wide use for science production but needs upgrading to ensure the long term sustainability. In this presentation, we will discuss our recent efforts to update and improve that code base and also highlight some recent results. The new project GAM- ERA, Grid Agnostic MHD for Extended Research Applications, has kept the original design characteristics of the LFM and made significant improvements. The original de- sign included high order numerical differencing with very aggressive limiting, the ability to use arbitrary, but logically rectangular, grids, and maintenance of div B = 0 through the use of the Yee grid. Significant improvements include high-order upwinding and a non-clipping limiter. One other improvement with wider applicability is an im- proved averaging technique for the singularities in polar and spherical grids. The new code adopts a hybrid structure - multi-threaded OpenMP with an overarching MPI layer for large scale and coupled applications. The MPI layer uses a combination of standard MPI and the Global Array Toolkit from PNL to provide a lightweight mechanism for coupling codes together concurrently. The single processor code is highly efficient and can run magnetospheric simulations at the default CCMC resolution faster than real time on a MacBook pro. We have run the new code through the Athena suite of tests, and the results compare favorably with the codes available to the astrophysics community. LFM/GAMERA has been applied to many different situations ranging from the inner and outer heliosphere and magnetospheres of Venus, the Earth, Jupiter and Saturn. We present example results the Earth's magnetosphere including a coupled ring current (RCM), the magnetospheres of Jupiter and Saturn, and the inner heliosphere.
NASA Technical Reports Server (NTRS)
Kizhner, Semion; Flatley, Thomas P.; Hestnes, Phyllis; Jentoft-Nilsen, Marit; Petrick, David J.; Day, John H. (Technical Monitor)
2001-01-01
Spacecraft telemetry rates have steadily increased over the last decade presenting a problem for real-time processing by ground facilities. This paper proposes a solution to a related problem for the Geostationary Operational Environmental Spacecraft (GOES-8) image processing application. Although large super-computer facilities are the obvious heritage solution, they are very costly, making it imperative to seek a feasible alternative engineering solution at a fraction of the cost. The solution is based on a Personal Computer (PC) platform and synergy of optimized software algorithms and re-configurable computing hardware technologies, such as Field Programmable Gate Arrays (FPGA) and Digital Signal Processing (DSP). It has been shown in [1] and [2] that this configuration can provide superior inexpensive performance for a chosen application on the ground station or on-board a spacecraft. However, since this technology is still maturing, intensive pre-hardware steps are necessary to achieve the benefits of hardware implementation. This paper describes these steps for the GOES-8 application, a software project developed using Interactive Data Language (IDL) (Trademark of Research Systems, Inc.) on a Workstation/UNIX platform. The solution involves converting the application to a PC/Windows/RC platform, selected mainly by the availability of low cost, adaptable high-speed RC hardware. In order for the hybrid system to run, the IDL software was modified to account for platform differences. It was interesting to examine the gains and losses in performance on the new platform, as well as unexpected observations before implementing hardware. After substantial pre-hardware optimization steps, the necessity of hardware implementation for bottleneck code in the PC environment became evident and solvable beginning with the methodology described in [1], [2], and implementing a novel methodology for this specific application [6]. The PC-RC interface bandwidth problem for the class of applications with moderate input-output data rates but large intermediate multi-thread data streams has been addressed and mitigated. This opens a new class of satellite image processing applications for bottleneck problems solution using RC technologies. The issue of a science algorithm level of abstraction necessary for RC hardware implementation is also described. Selected Matlab functions already implemented in hardware were investigated for their direct applicability to the GOES-8 application with the intent to create a library of Matlab and IDL RC functions for ongoing work. A complete class of spacecraft image processing applications using embedded re-configurable computing technology to meet real-time requirements, including performance results and comparison with the existing system, is described in this paper.
Parallel approach for bioinspired algorithms
NASA Astrophysics Data System (ADS)
Zaporozhets, Dmitry; Zaruba, Daria; Kulieva, Nina
2018-05-01
In the paper, a probabilistic parallel approach based on the population heuristic, such as a genetic algorithm, is suggested. The authors proposed using a multithreading approach at the micro level at which new alternative solutions are generated. On each iteration, several threads that independently used the same population to generate new solutions can be started. After the work of all threads, a selection operator combines obtained results in the new population. To confirm the effectiveness of the suggested approach, the authors have developed software on the basis of which experimental computations can be carried out. The authors have considered a classic optimization problem – finding a Hamiltonian cycle in a graph. Experiments show that due to the parallel approach at the micro level, increment of running speed can be obtained on graphs with 250 and more vertices.
Implementing and analyzing the multi-threaded LP-inference
NASA Astrophysics Data System (ADS)
Bolotova, S. Yu; Trofimenko, E. V.; Leschinskaya, M. V.
2018-03-01
The logical production equations provide new possibilities for the backward inference optimization in intelligent production-type systems. The strategy of a relevant backward inference is aimed at minimization of a number of queries to external information source (either to a database or an interactive user). The idea of the method is based on the computing of initial preimages set and searching for the true preimage. The execution of each stage can be organized independently and in parallel and the actual work at a given stage can also be distributed between parallel computers. This paper is devoted to the parallel algorithms of the relevant inference based on the advanced scheme of the parallel computations “pipeline” which allows to increase the degree of parallelism. The author also provides some details of the LP-structures implementation.
Humin to Human: Organic carbon, sediment, and water fluxes along river corridors in a changing world
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sutfin, Nicholas Alan
This is a presentation with slides on What does it mean to be human? ...humin?; River flow and Hydrographs; Snake River altered hydrograph (Marston et al., 2005); Carbon dynamics are important in rivers; Rivers and streams as carbon sink; Reservoirs for organic carbon; Study sites in Colorado; River morphology; Soil sample collection; Surveys at RMNP; Soil organic carbon content at RMNP; Abandoned channels and Cutoffs; East River channel migration and erosion; Linking hydrology to floodplain sediment flux; Impact of Extreme Floods on Floodplain Sediment; Channel Geometry: RMNP; Beavers dams and multithread channels; Geomorphology and carbon in N. St. Vrain Creek;more » Geomorphology and carbon along the East River; Geomorphology and carbon in N. St. Vrain Creek; San Marcos River, etc.« less
GPU and APU computations of Finite Time Lyapunov Exponent fields
NASA Astrophysics Data System (ADS)
Conti, Christian; Rossinelli, Diego; Koumoutsakos, Petros
2012-03-01
We present GPU and APU accelerated computations of Finite-Time Lyapunov Exponent (FTLE) fields. The calculation of FTLEs is a computationally intensive process, as in order to obtain the sharp ridges associated with the Lagrangian Coherent Structures an extensive resampling of the flow field is required. The computational performance of this resampling is limited by the memory bandwidth of the underlying computer architecture. The present technique harnesses data-parallel execution of many-core architectures and relies on fast and accurate evaluations of moment conserving functions for the mesh to particle interpolations. We demonstrate how the computation of FTLEs can be efficiently performed on a GPU and on an APU through OpenCL and we report over one order of magnitude improvements over multi-threaded executions in FTLE computations of bluff body flows.
ParTIES: a toolbox for Paramecium interspersed DNA elimination studies.
Denby Wilkes, Cyril; Arnaiz, Olivier; Sperling, Linda
2016-02-15
Developmental DNA elimination occurs in a wide variety of multicellular organisms, but ciliates are the only single-celled eukaryotes in which this phenomenon has been reported. Despite considerable interest in ciliates as models for DNA elimination, no standard methods for identification and characterization of the eliminated sequences are currently available. We present the Paramecium Toolbox for Interspersed DNA Elimination Studies (ParTIES), designed for Paramecium species, that (i) identifies eliminated sequences, (ii) measures their presence in a sequencing sample and (iii) detects rare elimination polymorphisms. ParTIES is multi-threaded Perl software available at https://github.com/oarnaiz/ParTIES. ParTIES is distributed under the GNU General Public Licence v3. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Final Report: Correctness Tools for Petascale Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mellor-Crummey, John
2014-10-27
In the course of developing parallel programs for leadership computing systems, subtle programming errors often arise that are extremely difficult to diagnose without tools. To meet this challenge, University of Maryland, the University of Wisconsin—Madison, and Rice University worked to develop lightweight tools to help code developers pinpoint a variety of program correctness errors that plague parallel scientific codes. The aim of this project was to develop software tools that help diagnose program errors including memory leaks, memory access errors, round-off errors, and data races. Research at Rice University focused on developing algorithms and data structures to support efficient monitoringmore » of multithreaded programs for memory access errors and data races. This is a final report about research and development work at Rice University as part of this project.« less
Program Model Checking: A Practitioner's Guide
NASA Technical Reports Server (NTRS)
Pressburger, Thomas T.; Mansouri-Samani, Masoud; Mehlitz, Peter C.; Pasareanu, Corina S.; Markosian, Lawrence Z.; Penix, John J.; Brat, Guillaume P.; Visser, Willem C.
2008-01-01
Program model checking is a verification technology that uses state-space exploration to evaluate large numbers of potential program executions. Program model checking provides improved coverage over testing by systematically evaluating all possible test inputs and all possible interleavings of threads in a multithreaded system. Model-checking algorithms use several classes of optimizations to reduce the time and memory requirements for analysis, as well as heuristics for meaningful analysis of partial areas of the state space Our goal in this guidebook is to assemble, distill, and demonstrate emerging best practices for applying program model checking. We offer it as a starting point and introduction for those who want to apply model checking to software verification and validation. The guidebook will not discuss any specific tool in great detail, but we provide references for specific tools.
Research on TCP/IP network communication based on Node.js
NASA Astrophysics Data System (ADS)
Huang, Jing; Cai, Lixiong
2018-04-01
In the face of big data, long connection and high synchronization, TCP/IP network communication will cause performance bottlenecks due to its blocking multi-threading service model. This paper presents a method of TCP/IP network communication protocol based on Node.js. On the basis of analyzing the characteristics of Node.js architecture and asynchronous non-blocking I/O model, the principle of its efficiency is discussed, and then compare and analyze the network communication model of TCP/IP protocol to expound the reasons why TCP/IP protocol stack is widely used in network communication. Finally, according to the large data and high concurrency in the large-scale grape growing environment monitoring process, a TCP server design based on Node.js is completed. The results show that the example runs stably and efficiently.
FNCS: A Framework for Power System and Communication Networks Co-Simulation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ciraci, Selim; Daily, Jeffrey A.; Fuller, Jason C.
2014-04-13
This paper describes the Fenix framework that uses a federated approach for integrating power grid and communication network simulators. Compared existing approaches, Fenix al- lows co-simulation of both transmission and distribution level power grid simulators with the communication network sim- ulator. To reduce the performance overhead of time synchro- nization, Fenix utilizes optimistic synchronization strategies that make speculative decisions about when the simulators are going to exchange messages. GridLAB-D (a distribution simulator), PowerFlow (a transmission simulator), and ns-3 (a telecommunication simulator) are integrated with the frame- work and are used to illustrate the enhanced performance pro- vided by speculative multi-threadingmore » on a smart grid applica- tion. Our speculative multi-threading approach achieved on average 20% improvement over the existing synchronization methods« less
High performance semantic factoring of giga-scale semantic graph databases.
DOE Office of Scientific and Technical Information (OSTI.GOV)
al-Saffar, Sinan; Adolf, Bob; Haglin, David
2010-10-01
As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to bring high performance computational resources to bear on their analysis, interpretation, and visualization, especially with respect to their innate semantic structure. Our research group built a novel high performance hybrid system comprising computational capability for semantic graph database processing utilizing the large multithreaded architecture of the Cray XMT platform, conventional clusters, and large data stores. In this paper we describe that architecture, and present the results of our deployingmore » that for the analysis of the Billion Triple dataset with respect to its semantic factors, including basic properties, connected components, namespace interaction, and typed paths.« less
PVT: An Efficient Computational Procedure to Speed up Next-generation Sequence Analysis
2014-01-01
Background High-throughput Next-Generation Sequencing (NGS) techniques are advancing genomics and molecular biology research. This technology generates substantially large data which puts up a major challenge to the scientists for an efficient, cost and time effective solution to analyse such data. Further, for the different types of NGS data, there are certain common challenging steps involved in analysing those data. Spliced alignment is one such fundamental step in NGS data analysis which is extremely computational intensive as well as time consuming. There exists serious problem even with the most widely used spliced alignment tools. TopHat is one such widely used spliced alignment tools which although supports multithreading, does not efficiently utilize computational resources in terms of CPU utilization and memory. Here we have introduced PVT (Pipelined Version of TopHat) where we take up a modular approach by breaking TopHat’s serial execution into a pipeline of multiple stages, thereby increasing the degree of parallelization and computational resource utilization. Thus we address the discrepancies in TopHat so as to analyze large NGS data efficiently. Results We analysed the SRA dataset (SRX026839 and SRX026838) consisting of single end reads and SRA data SRR1027730 consisting of paired-end reads. We used TopHat v2.0.8 to analyse these datasets and noted the CPU usage, memory footprint and execution time during spliced alignment. With this basic information, we designed PVT, a pipelined version of TopHat that removes the redundant computational steps during ‘spliced alignment’ and breaks the job into a pipeline of multiple stages (each comprising of different step(s)) to improve its resource utilization, thus reducing the execution time. Conclusions PVT provides an improvement over TopHat for spliced alignment of NGS data analysis. PVT thus resulted in the reduction of the execution time to ~23% for the single end read dataset. Further, PVT designed for paired end reads showed an improved performance of ~41% over TopHat (for the chosen data) with respect to execution time. Moreover we propose PVT-Cloud which implements PVT pipeline in cloud computing system. PMID:24894600
PVT: an efficient computational procedure to speed up next-generation sequence analysis.
Maji, Ranjan Kumar; Sarkar, Arijita; Khatua, Sunirmal; Dasgupta, Subhasis; Ghosh, Zhumur
2014-06-04
High-throughput Next-Generation Sequencing (NGS) techniques are advancing genomics and molecular biology research. This technology generates substantially large data which puts up a major challenge to the scientists for an efficient, cost and time effective solution to analyse such data. Further, for the different types of NGS data, there are certain common challenging steps involved in analysing those data. Spliced alignment is one such fundamental step in NGS data analysis which is extremely computational intensive as well as time consuming. There exists serious problem even with the most widely used spliced alignment tools. TopHat is one such widely used spliced alignment tools which although supports multithreading, does not efficiently utilize computational resources in terms of CPU utilization and memory. Here we have introduced PVT (Pipelined Version of TopHat) where we take up a modular approach by breaking TopHat's serial execution into a pipeline of multiple stages, thereby increasing the degree of parallelization and computational resource utilization. Thus we address the discrepancies in TopHat so as to analyze large NGS data efficiently. We analysed the SRA dataset (SRX026839 and SRX026838) consisting of single end reads and SRA data SRR1027730 consisting of paired-end reads. We used TopHat v2.0.8 to analyse these datasets and noted the CPU usage, memory footprint and execution time during spliced alignment. With this basic information, we designed PVT, a pipelined version of TopHat that removes the redundant computational steps during 'spliced alignment' and breaks the job into a pipeline of multiple stages (each comprising of different step(s)) to improve its resource utilization, thus reducing the execution time. PVT provides an improvement over TopHat for spliced alignment of NGS data analysis. PVT thus resulted in the reduction of the execution time to ~23% for the single end read dataset. Further, PVT designed for paired end reads showed an improved performance of ~41% over TopHat (for the chosen data) with respect to execution time. Moreover we propose PVT-Cloud which implements PVT pipeline in cloud computing system.
FLARE FOOTPOINT REGIONS AND A SURGE OBSERVED BY HINODE/EIS, RHESSI, AND SDO/AIA
DOE Office of Scientific and Technical Information (OSTI.GOV)
Doschek, G. A.; Warren, H. P.; Dennis, B. R.
2015-11-01
The Extreme-ultraviolet Imaging Spectrometer (EIS) on the Hinode spacecraft observed flare footpoint regions coincident with a surge for an M3.7 flare observed on 2011 September 25 at N12 E33 in active region 11302. The flare was observed in spectral lines of O vi, Fe x, Fe xii, Fe xiv, Fe xv, Fe xvi, Fe xvii, Fe xxiii, and Fe xxiv. The EIS observations were made coincident with hard X-ray bursts observed by RHESSI. Overlays of the RHESSI images on the EIS raster images at different wavelengths show a spatial coincidence of features in the RHESSI images with the EIS upflowmore » and downflow regions, as well as loop-top or near-loop-top regions. A complex array of phenomena were observed, including multiple evaporation regions and the surge, which was also observed by the Solar Dynamics Observatory/Atmospheric Imaging Assembly telescopes. The slit of the EIS spectrometer covered several flare footpoint regions from which evaporative upflows in Fe xxiii and Fe xxiv lines were observed with Doppler speeds greater than 500 km s{sup −1}. For ions such as Fe xv both evaporative outflows (∼200 km s{sup −1}) and downflows (∼30–50 km s{sup −1}) were observed. Nonthermal motions from 120 to 300 km s{sup −1} were measured in flare lines. In the surge, Doppler speeds are found from about 0 to over 250 km s{sup −1} in lines from ions such as Fe xiv. The nonthermal motions could be due to multiple sources slightly Doppler-shifted from each other or turbulence in the evaporating plasma. We estimate the energetics of the hard X-ray burst and obtain a total flare energy in accelerated electrons of ≥7 × 10{sup 28} erg. This is a lower limit because only an upper limit can be determined for the low-energy cutoff to the electron spectrum. We find that detailed modeling of this event would require a multithreaded model owing to its complexity.« less
Automatic multiple-sample applicator and electrophoresis apparatus
NASA Technical Reports Server (NTRS)
Grunbaum, B. W. (Inventor)
1977-01-01
An apparatus for performing electrophoresis and a multiple-sample applicator is described. Electrophoresis is a physical process in which electrically charged molecules and colloidal particles, upon the application of a dc current, migrate along a gel or a membrane that is wetted with an electrolyte. A multiple-sample applicator is provided which coacts with a novel tank cover to permit an operator either to depress a single button, thus causing multiple samples to be deposited on the gel or on the membrane simultaneously, or to depress one or more sample applicators separately by means of a separate button for each applicator.
Input-independent, Scalable and Fast String Matching on the Cray XMT
DOE Office of Scientific and Technical Information (OSTI.GOV)
Villa, Oreste; Chavarría-Miranda, Daniel; Maschhoff, Kristyn J
2009-05-25
String searching is at the core of many security and network applications like search engines, intrusion detection systems, virus scanners and spam filters. The growing size of on-line content and the increasing wire speeds push the need for fast, and often real- time, string searching solutions. For these conditions, many software implementations (if not all) targeting conventional cache-based microprocessors do not perform well. They either exhibit overall low performance or exhibit highly variable performance depending on the types of inputs. For this reason, real-time state of the art solutions rely on the use of either custom hardware or Field-Programmable Gatemore » Arrays (FPGAs) at the expense of overall system flexibility and programmability. This paper presents a software based implementation of the Aho-Corasick string searching algorithm on the Cray XMT multithreaded shared memory machine. Our so- lution relies on the particular features of the XMT architecture and on several algorith- mic strategies: it is fast, scalable and its performance is virtually content-independent. On a 128-processor Cray XMT, it reaches a scanning speed of ≈ 28 Gbps with a performance variability below 10 %. In the 10 Gbps performance range, variability is below 2.5%. By comparison, an Intel dual-socket, 8-core system running at 2.66 GHz achieves a peak performance which varies from 500 Mbps to 10 Gbps depending on the type of input and dictionary size.« less
Tianxiao Jiang; Siddiqui, Hasan; Ray, Shruti; Asman, Priscella; Ozturk, Musa; Ince, Nuri F
2017-07-01
This paper presents a portable platform to collect and review behavioral data simultaneously with neurophysiological signals. The whole system is comprised of four parts: a sensor data acquisition interface, a socket server for real-time data streaming, a Simulink system for real-time processing and an offline data review and analysis toolbox. A low-cost microcontroller is used to acquire data from external sensors such as accelerometer and hand dynamometer. The micro-controller transfers the data either directly through USB or wirelessly through a bluetooth module to a data server written in C++ for MS Windows OS. The data server also interfaces with the digital glove and captures HD video from webcam. The acquired sensor data are streamed under User Datagram Protocol (UDP) to other applications such as Simulink/Matlab for real-time analysis and recording. Neurophysiological signals such as electroencephalography (EEG), electrocorticography (ECoG) and local field potential (LFP) recordings can be collected simultaneously in Simulink and fused with behavioral data. In addition, we developed a customized Matlab Graphical User Interface (GUI) software to review, annotate and analyze the data offline. The software provides a fast, user-friendly data visualization environment with synchronized video playback feature. The software is also capable of reviewing long-term neural recordings. Other featured functions such as fast preprocessing with multithreaded filters, annotation, montage selection, power-spectral density (PSD) estimate, time-frequency map and spatial spectral map are also implemented.
ATLAS DataFlow Infrastructure: Recent results from ATLAS cosmic and first-beam data-taking
NASA Astrophysics Data System (ADS)
Vandelli, Wainer; ATLAS TDAQ Collaboration
2010-04-01
The ATLAS DataFlow infrastructure is responsible for the collection and conveyance of event data from the detector front-end electronics to the mass storage. Several optimized and multi-threaded applications fulfill this purpose operating over a multi-stage Gigabit Ethernet network which is the backbone of the ATLAS Trigger and Data Acquisition System. The system must be able to efficiently transport event-data with high reliability, while providing aggregated bandwidths larger than 5 GByte/s and coping with many thousands network connections. Nevertheless, routing and streaming capabilities and monitoring and data accounting functionalities are also fundamental requirements. During 2008, a few months of ATLAS cosmic data-taking and the first experience with the LHC beams provided an unprecedented test-bed for the evaluation of the performance of the ATLAS DataFlow, in terms of functionality, robustness and stability. Besides, operating the system far from its design specifications helped in exercising its flexibility and contributed in understanding its limitations. Moreover, the integration with the detector and the interfacing with the off-line data processing and management have been able to take advantage of this extended data taking-period as well. In this paper we report on the usage of the DataFlow infrastructure during the ATLAS data-taking. These results, backed-up by complementary performance tests, validate the architecture of the ATLAS DataFlow and prove that the system is robust, flexible and scalable enough to cope with the final requirements of the ATLAS experiment.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fisher,D
Concerns about the long-term viability of SFS as the metadata store for HPSS have been increasing. A concern that Transarc may discontinue support for SFS motivates us to consider alternative means to store HPSS metadata. The obvious alternative is a commercial database. Commercial databases have the necessary characteristics for storage of HPSS metadata records. They are robust and scalable and can easily accommodate the volume of data that must be stored. They provide programming interfaces, transactional semantics and a full set of maintenance and performance enhancement tools. A team was organized within the HPSS project to study and recommend anmore » approach for the replacement of SFS. Members of the team are David Fisher, Jim Minton, Donna Mecozzi, Danny Cook, Bart Parliman and Lynn Jones. We examined several possible solutions to the problem of replacing SFS, and recommended on May 22, 2000, in a report to the HPSS Technical and Executive Committees, to change HPSS into a database application over either Oracle or DB2. We recommended either Oracle or DB2 on the basis of market share and technical suitability. Oracle and DB2 are dominant offerings in the market, and it is in the best interest of HPSS to use a major player's product. Both databases provide a suitable programming interface. Transaction management functions, support for multi-threaded clients and data manipulation languages (DML) are available. These findings were supported in meetings held with technical experts from both companies. In both cases, the evidence indicated that either database would provide the features needed to host HPSS.« less
GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit
Pronk, Sander; Páll, Szilárd; Schulz, Roland; Larsson, Per; Bjelkmar, Pär; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik
2013-01-01
Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. Availability: GROMACS is an open source and free software available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23407358
Atlas-Guided Cluster Analysis of Large Tractography Datasets
Ros, Christian; Güllmar, Daniel; Stenzel, Martin; Mentzel, Hans-Joachim; Reichenbach, Jürgen Rainer
2013-01-01
Diffusion Tensor Imaging (DTI) and fiber tractography are important tools to map the cerebral white matter microstructure in vivo and to model the underlying axonal pathways in the brain with three-dimensional fiber tracts. As the fast and consistent extraction of anatomically correct fiber bundles for multiple datasets is still challenging, we present a novel atlas-guided clustering framework for exploratory data analysis of large tractography datasets. The framework uses an hierarchical cluster analysis approach that exploits the inherent redundancy in large datasets to time-efficiently group fiber tracts. Structural information of a white matter atlas can be incorporated into the clustering to achieve an anatomically correct and reproducible grouping of fiber tracts. This approach facilitates not only the identification of the bundles corresponding to the classes of the atlas; it also enables the extraction of bundles that are not present in the atlas. The new technique was applied to cluster datasets of 46 healthy subjects. Prospects of automatic and anatomically correct as well as reproducible clustering are explored. Reconstructed clusters were well separated and showed good correspondence to anatomical bundles. Using the atlas-guided cluster approach, we observed consistent results across subjects with high reproducibility. In order to investigate the outlier elimination performance of the clustering algorithm, scenarios with varying amounts of noise were simulated and clustered with three different outlier elimination strategies. By exploiting the multithreading capabilities of modern multiprocessor systems in combination with novel algorithms, our toolkit clusters large datasets in a couple of minutes. Experiments were conducted to investigate the achievable speedup and to demonstrate the high performance of the clustering framework in a multiprocessing environment. PMID:24386292
Refocusing from a plenoptic camera within seconds on a mobile phone
NASA Astrophysics Data System (ADS)
Gómez-Cárdenes, Ã.`scar; Marichal-Hernández, José G.; Rosa, Fernando L.; Lüke, Jonas P.; Fernández-Valdivia, Juan José; Rodríguez-Ramos, José M.
2014-05-01
Refocusing a plenoptic image by digital means and after the exposure has been thoroughly studied in the last years, but few efforts have been made in the direction of real time implementation in a constrained environment such as that provided by current mobile phones and tablets. In this work we address the aforementioned challenge demonstrating that a complete focal stack, comprising 31 refocused planes from a (256ff16)2 plenoptic image, can be achieved within seconds by a current SoC mobile phone platform. The election of an appropriate algorithm is the key to success. In a previous work we developed an algorithm, the fast approximate 4D:3D discrete Radon transform, that performs this task with linear time complexity where others obtain quadratic or linearithmic time complexity. Moreover, that algorithm does not requires complex number transforms, trigonometric calculus nor even multiplications nor oat numbers. Our algorithm has been ported to a multi core ARM chip on an off-the-shelf tablet running Android. A careful implementation exploiting parallelism at several levels has been necessary. The final implementation takes advantage of multi-threading in native code and NEON SIMD instructions. As a result our current implementation completes the refocusing task within seconds for a 16 megapixels image, much faster than previous attempts running on powerful PC platforms or dedicated hardware. The times consumed by the different stages of the digital refocusing are given and the strategies to achieve this result are discussed. Time results are given for a variety of environments within Android ecosystem, from the weaker/cheaper SoCs to the top of the line for 2013.
High Resolution Modelling of the Congo River's Multi-Threaded Main Stem Hydraulics
NASA Astrophysics Data System (ADS)
Carr, A. B.; Trigg, M.; Tshimanga, R.; Neal, J. C.; Borman, D.; Smith, M. W.; Bola, G.; Kabuya, P.; Mushie, C. A.; Tschumbu, C. L.
2017-12-01
We present the results of a summer 2017 field campaign by members of the Congo River users Hydraulics and Morphology (CRuHM) project, and a subsequent reach-scale hydraulic modelling study on the Congo's main stem. Sonar bathymetry, ADCP transects, and water surface elevation data have been collected along the Congo's heavily multi-threaded middle reach, which exhibits complex in-channel hydraulic processes that are not well understood. To model the entire basin's hydrodynamics, these in-channel hydraulic processes must be parameterised since it is not computationally feasible to represent them explicitly. Furthermore, recent research suggests that relative to other large global rivers, in-channel flows on the Congo represent a relatively large proportion of total flow through the river-floodplain system. We therefore regard sufficient representation of in-channel hydraulic processes as a Congo River hydrodynamic research priority. To enable explicit representation of in-channel hydraulics, we develop a reach-scale (70 km), high resolution hydraulic model. Simulation of flow through individual channel threads provides new information on flow depths and velocities, and will be used to inform the parameterisation of a broader basin-scale hydrodynamic model. The basin-scale model will ultimately be used to investigate floodplain fluxes, flood wave attenuation, and the impact of future hydrological change scenarios on basin hydrodynamics. This presentation will focus on the methodology we use to develop a reach-scale bathymetric DEM. The bathymetry of only a small proportion of channel threads can realistically be captured, necessitating some estimation of the bathymetry of channels not surveyed. We explore different approaches to this bathymetry estimation, and the extent to which it influences hydraulic model predictions. The CRuHM project is a consortium comprising the Universities of Kinshasa, Rhodes, Dar es Salaam, Bristol, and Leeds, and is funded by Royal Society-DFID Africa Capacity Building Initiative. The project aims to strengthen institutional research capacity and advance our understanding of the hydrology, hydrodynamics and sediment dynamics of the world's second largest river system through fieldwork and development of numerical models.
A Model of Beaver Meadow Complex Evolution in the Silvies River Basin, Oregon.
NASA Astrophysics Data System (ADS)
Nash, C.; Grant, G.; Campbell, S. D.
2014-12-01
There is increasing evidence to suggest that the pervasive incision seen in the American West is due, in part, to the removal of beaver (Castor canadensis) in the first half of the 19th century. New restoration strategies for these systems focus on the reintroduction of beaver and construction of beaver dam analogs. Such dams locally raise streams beds and water tables, reconnect incised channels to their former floodplains, trap sediment, increase hydraulic diversity, and promote riparian vegetation. However, the geomorphic and hydrologic impacts of both the original beaver dams and their analogs are poorly understood. Observations in the Silvies River basin in Oregon, USA - an upland, semi-arid catchment with extremely high historic beaver populations and a presently recovering population, inform a conceptual model for valley floor evolution with beaver dams. The evolution of the beaver dam complex is characterized by eight stages of morphologic adjustment: water impoundment, sediment deposition, pond filling, multi-thread meadow creation, dam breaching, channel incision, channel widening, and floodplain development. Well-constructed beaver dams, given sufficient time and sediment flux, will evolve from a series of ponds to a multi-threaded channel flowing through a wet meadow complex. If a dam in the system fails, due to overtopping, undercutting, lack of maintenance, or abandonment, the upstream channel will concentrate into a single channel and incise, followed over time by widening once critical bank heights are exceeded. From stratigraphic, dendrochronologic, and geomorphic measurements, we are constraining average timescales associated with each stage's duration and transitional period. Measured sedimentation rates behind modern beaver dam analogs on five stream systems permit calculation of sediment flux over recent time periods, and aid in developing regional rates of sediment deposition over a range of drainage areas and gradients. Stratigraphic and dendrochronologic records provide insight into rates of incision, widening, and floodplain development. These measurements are leading to an understanding of the timescales associated with each morphologic stage and transition period, as well as the long-term implications of reintroducing beaver into a wide range of stream systems.
NASA Astrophysics Data System (ADS)
Overstreet, B. T.; Legleiter, C. J.
2012-12-01
The Snake River in Grand Teton National Park is a dam-regulated but highly dynamic gravel-bed river that alternates between a single thread and a multithread planform. Identifying key drivers of channel change on this river could improve our understanding of 1) how flow regulation at Jackson Lake Dam has altered the character of the river over time; 2) how changes in the distribution of various types of vegetation impacts river dynamics; and 3) how the Snake River will respond to future human and climate driven disturbances. Despite the importance of monitoring planform changes over time, automated channel extraction and understanding the physical drivers contributing to channel change continue to be challenging yet critical steps in the remote sensing of riverine environments. In this study we use the random forest statistical technique to first classify land cover within the Snake River corridor and then extract channel features from a sequence of high-resolution multispectral images of the Snake River spanning the period from 2006 to 2012, which encompasses both exceptionally dry years and near-record runoff in 2011. We show that the random forest technique can be used to classify images with as few as four spectral bands with far greater accuracy than traditional single-tree classification approaches. Secondly, we couple random forest derived land cover maps with LiDAR derived topography, bathymetry, and canopy height to explore physical drivers contributing to observed channel changes on the Snake River. In conclusion we show that the random forest technique is a powerful tool for classifying multispectral images of rivers. Moreover, we hypothesize that with sufficient data for calculating spatially distributed metrics of channel form and more frequent channel monitoring, this tool can also be used to identify areas with high probabilities of channel change. Land cover maps of a portion of the Snake River produced from digital aerial photography from 2010 and a 2011 WorldView2 satellite image. This pair of maps thus captures changes that occurred during the 2011 runoff
Cloud-based large-scale air traffic flow optimization
NASA Astrophysics Data System (ADS)
Cao, Yi
The ever-increasing traffic demand makes the efficient use of airspace an imperative mission, and this paper presents an effort in response to this call. Firstly, a new aggregate model, called Link Transmission Model (LTM), is proposed, which models the nationwide traffic as a network of flight routes identified by origin-destination pairs. The traversal time of a flight route is assumed to be the mode of distribution of historical flight records, and the mode is estimated by using Kernel Density Estimation. As this simplification abstracts away physical trajectory details, the complexity of modeling is drastically decreased, resulting in efficient traffic forecasting. The predicative capability of LTM is validated against recorded traffic data. Secondly, a nationwide traffic flow optimization problem with airport and en route capacity constraints is formulated based on LTM. The optimization problem aims at alleviating traffic congestions with minimal global delays. This problem is intractable due to millions of variables. A dual decomposition method is applied to decompose the large-scale problem such that the subproblems are solvable. However, the whole problem is still computational expensive to solve since each subproblem is an smaller integer programming problem that pursues integer solutions. Solving an integer programing problem is known to be far more time-consuming than solving its linear relaxation. In addition, sequential execution on a standalone computer leads to linear runtime increase when the problem size increases. To address the computational efficiency problem, a parallel computing framework is designed which accommodates concurrent executions via multithreading programming. The multithreaded version is compared with its monolithic version to show decreased runtime. Finally, an open-source cloud computing framework, Hadoop MapReduce, is employed for better scalability and reliability. This framework is an "off-the-shelf" parallel computing model that can be used for both offline historical traffic data analysis and online traffic flow optimization. It provides an efficient and robust platform for easy deployment and implementation. A small cloud consisting of five workstations was configured and used to demonstrate the advantages of cloud computing in dealing with large-scale parallelizable traffic problems.
Teaching Multiplication and Multiplication Tables by the Application of Finger Multiplication
ERIC Educational Resources Information Center
Bahadir, Elif
2017-01-01
Developments in mathematics education tend to emphasize mathematics teaching with the help of activities that will allow the students to create these concepts rather than to make them memorize mathematical rules. The purpose of this study is to analyze the applicability of the application of multiplication with fingers developed by the researcher.…
A survey of techniques for architecting and managing GPU register file
Mittal, Sparsh
2016-04-07
To support their massively-multithreaded architecture, GPUs use very large register file (RF) which has a capacity higher than even L1 and L2 caches. In total contrast, traditional CPUs use tiny RF and much larger caches to optimize latency. Due to these differences, along with the crucial impact of RF in determining GPU performance, novel and intelligent techniques are required for managing GPU RF. In this paper, we survey the techniques for designing and managing GPU RF. We discuss techniques related to performance, energy and reliability aspects of RF. To emphasize the similarities and differences between the techniques, we classify themmore » along several parameters. Lastly, the aim of this paper is to synthesize the state-of-art developments in RF management and also stimulate further research in this area.« less
Shared prefetching to reduce execution skew in multi-threaded systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Eichenberger, Alexandre E; Gunnels, John A
Mechanisms are provided for optimizing code to perform prefetching of data into a shared memory of a computing device that is shared by a plurality of threads that execute on the computing device. A memory stream of a portion of code that is shared by the plurality of threads is identified. A set of prefetch instructions is distributed across the plurality of threads. Prefetch instructions are inserted into the instruction sequences of the plurality of threads such that each instruction sequence has a separate sub-portion of the set of prefetch instructions, thereby generating optimized code. Executable code is generated basedmore » on the optimized code and stored in a storage device. The executable code, when executed, performs the prefetches associated with the distributed set of prefetch instructions in a shared manner across the plurality of threads.« less
A survey of techniques for architecting and managing GPU register file
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mittal, Sparsh
To support their massively-multithreaded architecture, GPUs use very large register file (RF) which has a capacity higher than even L1 and L2 caches. In total contrast, traditional CPUs use tiny RF and much larger caches to optimize latency. Due to these differences, along with the crucial impact of RF in determining GPU performance, novel and intelligent techniques are required for managing GPU RF. In this paper, we survey the techniques for designing and managing GPU RF. We discuss techniques related to performance, energy and reliability aspects of RF. To emphasize the similarities and differences between the techniques, we classify themmore » along several parameters. Lastly, the aim of this paper is to synthesize the state-of-art developments in RF management and also stimulate further research in this area.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tomopy is a Python toolbox to perform x-ray data processing, image reconstruction and data exchange tasks at synchrotron facilities. The dependencies of the software are currently as follows: -Python related python standard library (http://docs.python.org/2/library/) numpy (http://www.numpy.org/) scipy (http://scipy.org/) matplotlib (http://matplotlip.org/) sphinx (http://sphinx-doc.org) pil (http://www.pythonware.com/products/pil/) pyhdf (http://pysclint.sourceforge.net/pyhdf/) h5py (http://www.h5py.org) pywt (http://www.pybytes.com/pywavelets/) file.py (https://pyspec.svn.sourceforge.net/svnroot/pyspec/trunk/pyspec/ccd/files.py) -C/C++ related: gridec (anonymous?? C-code written back in 1997 that uses standard C library) fftw (http://www.fftw.org/) tomoRecon (multi-threaded C++ verion of gridrec. Author: Mark Rivers from APS. http://cars9.uchicago.edu/software/epics/tomoRecon.html) epics (http://www.aps.anl.gov/epics/)
45 CFR 2523.110 - Can Federal agencies submit multiple applications?
Code of Federal Regulations, 2010 CFR
2010-10-01
... 45 Public Welfare 4 2010-10-01 2010-10-01 false Can Federal agencies submit multiple applications? 2523.110 Section 2523.110 Public Welfare Regulations Relating to Public Welfare (Continued) CORPORATION... AMERICORPS PROGRAM ASSISTANCE § 2523.110 Can Federal agencies submit multiple applications? No. The...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Su, L; Du, X; Liu, T
Purpose: As a module of ARCHER -- Accelerated Radiation-transport Computations in Heterogeneous EnviRonments, ARCHER{sub RT} is designed for RadioTherapy (RT) dose calculation. This paper describes the application of ARCHERRT on patient-dependent TomoTherapy and patient-independent IMRT. It also conducts a 'fair' comparison of different GPUs and multicore CPU. Methods: The source input used for patient-dependent TomoTherapy is phase space file (PSF) generated from optimized plan. For patient-independent IMRT, the open filed PSF is used for different cases. The intensity modulation is simulated by fluence map. The GEANT4 code is used as benchmark. DVH and gamma index test are employed to evaluatemore » the accuracy of ARCHER{sub RT} code. Some previous studies reported misleading speedups by comparing GPU code with serial CPU code. To perform a fairer comparison, we write multi-thread code with OpenMP to fully exploit computing potential of CPU. The hardware involved in this study are a 6-core Intel E5-2620 CPU and 6 NVIDIA M2090 GPUs, a K20 GPU and a K40 GPU. Results: Dosimetric results from ARCHER{sub RT} and GEANT4 show good agreement. The 2%/2mm gamma test pass rates for different clinical cases are 97.2% to 99.7%. A single M2090 GPU needs 50~79 seconds for the simulation to achieve a statistical error of 1% in the PTV. The K40 card is about 1.7∼1.8 times faster than M2090 card. Using 6 M2090 card, the simulation can be finished in about 10 seconds. For comparison, Intel E5-2620 needs 507∼879 seconds for the same simulation. Conclusion: We successfully applied ARCHER{sub RT} to Tomotherapy and patient-independent IMRT, and conducted a fair comparison between GPU and CPU performance. The ARCHER{sub RT} code is both accurate and efficient and may be used towards clinical applications.« less
A hybrid short read mapping accelerator
2013-01-01
Background The rapid growth of short read datasets poses a new challenge to the short read mapping problem in terms of sensitivity and execution speed. Existing methods often use a restrictive error model for computing the alignments to improve speed, whereas more flexible error models are generally too slow for large-scale applications. A number of short read mapping software tools have been proposed. However, designs based on hardware are relatively rare. Field programmable gate arrays (FPGAs) have been successfully used in a number of specific application areas, such as the DSP and communications domains due to their outstanding parallel data processing capabilities, making them a competitive platform to solve problems that are “inherently parallel”. Results We present a hybrid system for short read mapping utilizing both FPGA-based hardware and CPU-based software. The computation intensive alignment and the seed generation operations are mapped onto an FPGA. We present a computationally efficient, parallel block-wise alignment structure (Align Core) to approximate the conventional dynamic programming algorithm. The performance is compared to the multi-threaded CPU-based GASSST and BWA software implementations. For single-end alignment, our hybrid system achieves faster processing speed than GASSST (with a similar sensitivity) and BWA (with a higher sensitivity); for pair-end alignment, our design achieves a slightly worse sensitivity than that of BWA but has a higher processing speed. Conclusions This paper shows that our hybrid system can effectively accelerate the mapping of short reads to a reference genome based on the seed-and-extend approach. The performance comparison to the GASSST and BWA software implementations under different conditions shows that our hybrid design achieves a high degree of sensitivity and requires less overall execution time with only modest FPGA resource utilization. Our hybrid system design also shows that the performance bottleneck for the short read mapping problem can be changed from the alignment stage to the seed generation stage, which provides an additional requirement for the future development of short read aligners. PMID:23441908
Innovative applications of food-related emulsions.
Kiokias, S; Varzakas, T
2017-10-13
Research on oxidative stability of multiple emulsions is very scarce. Given that this is a relevant topic that must be ascertained before the successful application of multiple emulsions in foods (especially when a combination of highly unsaturated oils is used as a lipid phase), this review mainly focuses on various aspects of the multiple emulsions. Fat replacement in meat products using emulsions is critically discussed along with innovative applications of natural antioxidants in food-based emulsions and multiple emulsions based on bioactive compounds/encapsulation as well as confectionery products.
System, methods and apparatus for program optimization for multi-threaded processor architectures
Bastoul, Cedric; Lethin, Richard A; Leung, Allen K; Meister, Benoit J; Szilagyi, Peter; Vasilache, Nicolas T; Wohlford, David E
2015-01-06
Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for parallelism, locality of operations and contiguity of memory accesses on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
Real-time video compressing under DSP/BIOS
NASA Astrophysics Data System (ADS)
Chen, Qiu-ping; Li, Gui-ju
2009-10-01
This paper presents real-time MPEG-4 Simple Profile video compressing based on the DSP processor. The programming framework of video compressing is constructed using TMS320C6416 Microprocessor, TDS510 simulator and PC. It uses embedded real-time operating system DSP/BIOS and the API functions to build periodic function, tasks and interruptions etcs. Realize real-time video compressing. To the questions of data transferring among the system. Based on the architecture of the C64x DSP, utilized double buffer switched and EDMA data transfer controller to transit data from external memory to internal, and realize data transition and processing at the same time; the architecture level optimizations are used to improve software pipeline. The system used DSP/BIOS to realize multi-thread scheduling. The whole system realizes high speed transition of a great deal of data. Experimental results show the encoder can realize real-time encoding of 768*576, 25 frame/s video images.
Computing element evolution towards Exascale and its impact on legacy simulation codes
NASA Astrophysics Data System (ADS)
Colin de Verdière, Guillaume J. L.
2015-12-01
In the light of the current race towards the Exascale, this article highlights the main features of the forthcoming computing elements that will be at the core of next generations of supercomputers. The market analysis, underlying this work, shows that computers are facing a major evolution in terms of architecture. As a consequence, it is important to understand the impacts of those evolutions on legacy codes or programming methods. The problems of dissipated power and memory access are discussed and will lead to a vision of what should be an exascale system. To survive, programming languages had to respond to the hardware evolutions either by evolving or with the creation of new ones. From the previous elements, we elaborate why vectorization, multithreading, data locality awareness and hybrid programming will be the key to reach the exascale, implying that it is time to start rewriting codes.
Chen, Zhaoxue; Chen, Hao
2014-01-01
A deconvolution method based on the Gaussian radial basis function (GRBF) interpolation is proposed. Both the original image and Gaussian point spread function are expressed as the same continuous GRBF model, thus image degradation is simplified as convolution of two continuous Gaussian functions, and image deconvolution is converted to calculate the weighted coefficients of two-dimensional control points. Compared with Wiener filter and Lucy-Richardson algorithm, the GRBF method has an obvious advantage in the quality of restored images. In order to overcome such a defect of long-time computing, the method of graphic processing unit multithreading or increasing space interval of control points is adopted, respectively, to speed up the implementation of GRBF method. The experiments show that based on the continuous GRBF model, the image deconvolution can be efficiently implemented by the method, which also has a considerable reference value for the study of three-dimensional microscopic image deconvolution.
A study of the parallel algorithm for large-scale DC simulation of nonlinear systems
NASA Astrophysics Data System (ADS)
Cortés Udave, Diego Ernesto; Ogrodzki, Jan; Gutiérrez de Anda, Miguel Angel
Newton-Raphson DC analysis of large-scale nonlinear circuits may be an extremely time consuming process even if sparse matrix techniques and bypassing of nonlinear models calculation are used. A slight decrease in the time required for this task may be enabled on multi-core, multithread computers if the calculation of the mathematical models for the nonlinear elements as well as the stamp management of the sparse matrix entries are managed through concurrent processes. This numerical complexity can be further reduced via the circuit decomposition and parallel solution of blocks taking as a departure point the BBD matrix structure. This block-parallel approach may give a considerable profit though it is strongly dependent on the system topology and, of course, on the processor type. This contribution presents the easy-parallelizable decomposition-based algorithm for DC simulation and provides a detailed study of its effectiveness.
NASA Astrophysics Data System (ADS)
Rodrigues, Manuel J.; Fernandes, David E.; Silveirinha, Mário G.; Falcão, Gabriel
2018-01-01
This work introduces a parallel computing framework to characterize the propagation of electron waves in graphene-based nanostructures. The electron wave dynamics is modeled using both "microscopic" and effective medium formalisms and the numerical solution of the two-dimensional massless Dirac equation is determined using a Finite-Difference Time-Domain scheme. The propagation of electron waves in graphene superlattices with localized scattering centers is studied, and the role of the symmetry of the microscopic potential in the electron velocity is discussed. The computational methodologies target the parallel capabilities of heterogeneous multi-core CPU and multi-GPU environments and are built with the OpenCL parallel programming framework which provides a portable, vendor agnostic and high throughput-performance solution. The proposed heterogeneous multi-GPU implementation achieves speedup ratios up to 75x when compared to multi-thread and multi-core CPU execution, reducing simulation times from several hours to a couple of minutes.
Computer simulations and real-time control of ELT AO systems using graphical processing units
NASA Astrophysics Data System (ADS)
Wang, Lianqi; Ellerbroek, Brent
2012-07-01
The adaptive optics (AO) simulations at the Thirty Meter Telescope (TMT) have been carried out using the efficient, C based multi-threaded adaptive optics simulator (MAOS, http://github.com/lianqiw/maos). By porting time-critical parts of MAOS to graphical processing units (GPU) using NVIDIA CUDA technology, we achieved a 10 fold speed up for each GTX 580 GPU used compared to a modern quad core CPU. Each time step of full scale end to end simulation for the TMT narrow field infrared AO system (NFIRAOS) takes only 0.11 second in a desktop with two GTX 580s. We also demonstrate that the TMT minimum variance reconstructor can be assembled in matrix vector multiply (MVM) format in 8 seconds with 8 GTX 580 GPUs, meeting the TMT requirement for updating the reconstructor. Analysis show that it is also possible to apply the MVM using 8 GTX 580s within the required latency.
Basic Requirements for Systems Software Research and Development
NASA Technical Reports Server (NTRS)
Kuszmaul, Chris; Nitzberg, Bill
1996-01-01
Our success over the past ten years evaluating and developing advanced computing technologies has been due to a simple research and development (R/D) model. Our model has three phases: (a) evaluating the state-of-the-art, (b) identifying problems and creating innovations, and (c) developing solutions, improving the state- of-the-art. This cycle has four basic requirements: a large production testbed with real users, a diverse collection of state-of-the-art hardware, facilities for evalua- tion of emerging technologies and development of innovations, and control over system management on these testbeds. Future research will be irrelevant and future products will not work if any of these requirements is eliminated. In order to retain our effectiveness, the numerical aerospace simulator (NAS) must replace out-of-date production testbeds in as timely a fashion as possible, and cannot afford to ignore innovative designs such as new distributed shared memory machines, clustered commodity-based computers, and multi-threaded architectures.
Exact and Approximate Probabilistic Symbolic Execution
NASA Technical Reports Server (NTRS)
Luckow, Kasper; Pasareanu, Corina S.; Dwyer, Matthew B.; Filieri, Antonio; Visser, Willem
2014-01-01
Probabilistic software analysis seeks to quantify the likelihood of reaching a target event under uncertain environments. Recent approaches compute probabilities of execution paths using symbolic execution, but do not support nondeterminism. Nondeterminism arises naturally when no suitable probabilistic model can capture a program behavior, e.g., for multithreading or distributed systems. In this work, we propose a technique, based on symbolic execution, to synthesize schedulers that resolve nondeterminism to maximize the probability of reaching a target event. To scale to large systems, we also introduce approximate algorithms to search for good schedulers, speeding up established random sampling and reinforcement learning results through the quantification of path probabilities based on symbolic execution. We implemented the techniques in Symbolic PathFinder and evaluated them on nondeterministic Java programs. We show that our algorithms significantly improve upon a state-of- the-art statistical model checking algorithm, originally developed for Markov Decision Processes.
Tomo3D 2.0--exploitation of advanced vector extensions (AVX) for 3D reconstruction.
Agulleiro, Jose-Ignacio; Fernandez, Jose-Jesus
2015-02-01
Tomo3D is a program for fast tomographic reconstruction on multicore computers. Its high speed stems from code optimization, vectorization with Streaming SIMD Extensions (SSE), multithreading and optimization of disk access. Recently, Advanced Vector eXtensions (AVX) have been introduced in the x86 processor architecture. Compared to SSE, AVX double the number of simultaneous operations, thus pointing to a potential twofold gain in speed. However, in practice, achieving this potential is extremely difficult. Here, we provide a technical description and an assessment of the optimizations included in Tomo3D to take advantage of AVX instructions. Tomo3D 2.0 allows huge reconstructions to be calculated in standard computers in a matter of minutes. Thus, it will be a valuable tool for electron tomography studies with increasing resolution needs. Copyright © 2014 Elsevier Inc. All rights reserved.
ImpulseDE: detection of differentially expressed genes in time series data using impulse models.
Sander, Jil; Schultze, Joachim L; Yosef, Nir
2017-03-01
Perturbations in the environment lead to distinctive gene expression changes within a cell. Observed over time, those variations can be characterized by single impulse-like progression patterns. ImpulseDE is an R package suited to capture these patterns in high throughput time series datasets. By fitting a representative impulse model to each gene, it reports differentially expressed genes across time points from a single or between two time courses from two experiments. To optimize running time, the code uses clustering and multi-threading. By applying ImpulseDE , we demonstrate its power to represent underlying biology of gene expression in microarray and RNA-Seq data. ImpulseDE is available on Bioconductor ( https://bioconductor.org/packages/ImpulseDE/ ). niryosef@berkeley.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoginath, Srikanth B; Perumalla, Kalyan S; Henz, Brian J
2012-01-01
In prior work (Yoginath and Perumalla, 2011; Yoginath, Perumalla and Henz, 2012), the motivation, challenges and issues were articulated in favor of virtual time ordering of Virtual Machines (VMs) in network simulations hosted on multi-core machines. Two major components in the overall virtualization challenge are (1) virtual timeline establishment and scheduling of VMs, and (2) virtualization of inter-VM communication. Here, we extend prior work by presenting scaling results for the first component, with experiment results on up to 128 VMs scheduled in virtual time order on a single 12-core host. We also explore the solution space of design alternatives formore » the second component, and present performance results from a multi-threaded, multi-queue implementation of inter-VM network control for synchronized execution with VM scheduling, incorporated in our NetWarp simulation system.« less
Optimizing Integrated Terminal Airspace Operations Under Uncertainty
NASA Technical Reports Server (NTRS)
Bosson, Christabelle; Xue, Min; Zelinski, Shannon
2014-01-01
In the terminal airspace, integrated departures and arrivals have the potential to increase operations efficiency. Recent research has developed geneticalgorithm- based schedulers for integrated arrival and departure operations under uncertainty. This paper presents an alternate method using a machine jobshop scheduling formulation to model the integrated airspace operations. A multistage stochastic programming approach is chosen to formulate the problem and candidate solutions are obtained by solving sample average approximation problems with finite sample size. Because approximate solutions are computed, the proposed algorithm incorporates the computation of statistical bounds to estimate the optimality of the candidate solutions. A proof-ofconcept study is conducted on a baseline implementation of a simple problem considering a fleet mix of 14 aircraft evolving in a model of the Los Angeles terminal airspace. A more thorough statistical analysis is also performed to evaluate the impact of the number of scenarios considered in the sampled problem. To handle extensive sampling computations, a multithreading technique is introduced.
The CMS TierO goes Cloud and Grid for LHC Run 2
NASA Astrophysics Data System (ADS)
Hufnagel, Dirk
2015-12-01
In 2015, CMS will embark on a new era of collecting LHC collisions at unprecedented rates and complexity. This will put a tremendous stress on our computing systems. Prompt Processing of the raw data by the Tier-0 infrastructure will no longer be constrained to CERN alone due to the significantly increased resource requirements. In LHC Run 2, we will need to operate it as a distributed system utilizing both the CERN Cloud-based Agile Infrastructure and a significant fraction of the CMS Tier-1 Grid resources. In another big change for LHC Run 2, we will process all data using the multi-threaded framework to deal with the increased event complexity and to ensure efficient use of the resources. This contribution will cover the evolution of the Tier-0 infrastructure and present scale testing results and experiences from the first data taking in 2015.
GPU-Powered Coherent Beamforming
NASA Astrophysics Data System (ADS)
Magro, A.; Adami, K. Zarb; Hickish, J.
2015-03-01
Graphics processing units (GPU)-based beamforming is a relatively unexplored area in radio astronomy, possibly due to the assumption that any such system will be severely limited by the PCIe bandwidth required to transfer data to the GPU. We have developed a CUDA-based GPU implementation of a coherent beamformer, specifically designed and optimized for deployment at the BEST-2 array which can generate an arbitrary number of synthesized beams for a wide range of parameters. It achieves ˜1.3 TFLOPs on an NVIDIA Tesla K20, approximately 10x faster than an optimized, multithreaded CPU implementation. This kernel has been integrated into two real-time, GPU-based time-domain software pipelines deployed at the BEST-2 array in Medicina: a standalone beamforming pipeline and a transient detection pipeline. We present performance benchmarks for the beamforming kernel as well as the transient detection pipeline with beamforming capabilities as well as results of test observation.
Gilgamesh: A Multithreaded Processor-In-Memory Architecture for Petaflops Computing
NASA Technical Reports Server (NTRS)
Sterling, T. L.; Zima, H. P.
2002-01-01
Processor-in-Memory (PIM) architectures avoid the von Neumann bottleneck in conventional machines by integrating high-density DRAM and CMOS logic on the same chip. Parallel systems based on this new technology are expected to provide higher scalability, adaptability, robustness, fault tolerance and lower power consumption than current MPPs or commodity clusters. In this paper we describe the design of Gilgamesh, a PIM-based massively parallel architecture, and elements of its execution model. Gilgamesh extends existing PIM capabilities by incorporating advanced mechanisms for virtualizing tasks and data and providing adaptive resource management for load balancing and latency tolerance. The Gilgamesh execution model is based on macroservers, a middleware layer which supports object-based runtime management of data and threads allowing explicit and dynamic control of locality and load balancing. The paper concludes with a discussion of related research activities and an outlook to future work.
Real-Time Processing System for the JET Hard X-Ray and Gamma-Ray Profile Monitor Enhancement
NASA Astrophysics Data System (ADS)
Fernandes, Ana M.; Pereira, Rita C.; Neto, André; Valcárcel, Daniel F.; Alves, Diogo; Sousa, Jorge; Carvalho, Bernardo B.; Kiptily, Vasily; Syme, Brian; Blanchard, Patrick; Murari, Andrea; Correia, Carlos M. B. A.; Varandas, Carlos A. F.; Gonçalves, Bruno
2014-06-01
The Joint European Torus (JET) is currently undertaking an enhancement program which includes tests of relevant diagnostics with real-time processing capabilities for the International Thermonuclear Experimental Reactor (ITER). Accordingly, a new real-time processing system was developed and installed at JET for the gamma-ray and hard X-ray profile monitor diagnostic. The new system is connected to 19 CsI(Tl) photodiodes in order to obtain the line-integrated profiles of the gamma-ray and hard X-ray emissions. Moreover, it was designed to overcome the former data acquisition (DAQ) limitations while exploiting the required real-time features. The new DAQ hardware, based on the Advanced Telecommunication Computer Architecture (ATCA) standard, includes reconfigurable digitizer modules with embedded field-programmable gate array (FPGA) devices capable of acquiring and simultaneously processing data in real-time from the 19 detectors. A suitable algorithm was developed and implemented in the FPGAs, which are able to deliver the corresponding energy of the acquired pulses. The processed data is sent periodically, during the discharge, through the JET real-time network and stored in the JET scientific databases at the end of the pulse. The interface between the ATCA digitizers, the JET control and data acquisition system (CODAS), and the JET real-time network is provided by the Multithreaded Application Real-Time executor (MARTe). The work developed allowed attaining two of the major milestones required by next fusion devices: the ability to process and simultaneously supply high volume data rates in real-time.
FPGA accelerator for protein secondary structure prediction based on the GOR algorithm
2011-01-01
Background Protein is an important molecule that performs a wide range of functions in biological systems. Recently, the protein folding attracts much more attention since the function of protein can be generally derived from its molecular structure. The GOR algorithm is one of the most successful computational methods and has been widely used as an efficient analysis tool to predict secondary structure from protein sequence. However, the execution time is still intolerable with the steep growth in protein database. Recently, FPGA chips have emerged as one promising application accelerator to accelerate bioinformatics algorithms by exploiting fine-grained custom design. Results In this paper, we propose a complete fine-grained parallel hardware implementation on FPGA to accelerate the GOR-IV package for 2D protein structure prediction. To improve computing efficiency, we partition the parameter table into small segments and access them in parallel. We aggressively exploit data reuse schemes to minimize the need for loading data from external memory. The whole computation structure is carefully pipelined to overlap the sequence loading, computing and back-writing operations as much as possible. We implemented a complete GOR desktop system based on an FPGA chip XC5VLX330. Conclusions The experimental results show a speedup factor of more than 430x over the original GOR-IV version and 110x speedup over the optimized version with multi-thread SIMD implementation running on a PC platform with AMD Phenom 9650 Quad CPU for 2D protein structure prediction. However, the power consumption is only about 30% of that of current general-propose CPUs. PMID:21342582
NASA Astrophysics Data System (ADS)
Liu, Tianyu; Wolfe, Noah; Lin, Hui; Zieb, Kris; Ji, Wei; Caracappa, Peter; Carothers, Christopher; Xu, X. George
2017-09-01
This paper contains two parts revolving around Monte Carlo transport simulation on Intel Many Integrated Core coprocessors (MIC, also known as Xeon Phi). (1) MCNP 6.1 was recompiled into multithreading (OpenMP) and multiprocessing (MPI) forms respectively without modification to the source code. The new codes were tested on a 60-core 5110P MIC. The test case was FS7ONNi, a radiation shielding problem used in MCNP's verification and validation suite. It was observed that both codes became slower on the MIC than on a 6-core X5650 CPU, by a factor of 4 for the MPI code and, abnormally, 20 for the OpenMP code, and both exhibited limited capability of strong scaling. (2) We have recently added a Constructive Solid Geometry (CSG) module to our ARCHER code to provide better support for geometry modelling in radiation shielding simulation. The functions of this module are frequently called in the particle random walk process. To identify the performance bottleneck we developed a CSG proxy application and profiled the code using the geometry data from FS7ONNi. The profiling data showed that the code was primarily memory latency bound on the MIC. This study suggests that despite low initial porting e_ort, Monte Carlo codes do not naturally lend themselves to the MIC platform — just like to the GPUs, and that the memory latency problem needs to be addressed in order to achieve decent performance gain.
Developing a customized multiple interview for dental school admissions.
Gardner, Karen M
2014-04-01
From the early 1980s until recently, the University of British Columbia Faculty of Dentistry had employed the Canadian Dental Association (CDA) Structured Interview in its Phase 2 admissions process (with those applicants invited for interviews). While this structured interview had demonstrated reliability and validity, the Faculty of Dentistry came to believe that a multiple interview process using scenarios would help it better identify applicants who would match its mission. After a literature review that investigated such interview protocols as unstructured, semi-structured, computerized, and telephone formats, a multiple interview format was chosen. This format was seen as an emerging trend, with evidence that it has been deemed fairer by applicants, more reliable by interviewers, more difficult for applicants to provide set answers for the scenarios, and not to require as many interviewers as other formats. This article describes the process undertaken to implement a customized multiple interview format for admissions and reports these outcomes of the process: a smoothly running multiple interview; effective training protocols for staff, interviewers, and applicants; and reports from successful applicants and interviewers that they felt the multiple interview was a more reliable and fairer recruiting tool than other models.
Code of Federal Regulations, 2010 CFR
2010-07-01
... 34 Education 1 2010-07-01 2010-07-01 false What are the procedures for using a multiple tier... applications received. (d) The Secretary may, in any tier— (1) Use more than one group of experts to gain... procedures for using a multiple tier review process to evaluate applications? (a) The Secretary may use a...
Automatic multiple applicator electrophoresis
NASA Technical Reports Server (NTRS)
Grunbaum, B. W.
1977-01-01
Easy-to-use, economical device permits electrophoresis on all known supporting media. System includes automatic multiple-sample applicator, sample holder, and electrophoresis apparatus. System has potential applicability to fields of taxonomy, immunology, and genetics. Apparatus is also used for electrofocusing.
NASA Astrophysics Data System (ADS)
Aboulbanine, Zakaria; El Khayati, Naïma
2018-04-01
The use of phase space in medical linear accelerator Monte Carlo (MC) simulations significantly improves the execution time and leads to results comparable to those obtained from full calculations. The classical representation of phase space stores directly the information of millions of particles, producing bulky files. This paper presents a virtual source model (VSM) based on a reconstruction algorithm, taking as input a compressed file of roughly 800 kb derived from phase space data freely available in the International Atomic Energy Agency (IAEA) database. This VSM includes two main components; primary and scattered particle sources, with a specific reconstruction method developed for each. Energy spectra and other relevant variables were extracted from IAEA phase space and stored in the input description data file for both sources. The VSM was validated for three photon beams: Elekta Precise 6 MV/10 MV and a Varian TrueBeam 6 MV. Extensive calculations in water and comparisons between dose distributions of the VSM and IAEA phase space were performed to estimate the VSM precision. The Geant4 MC toolkit in multi-threaded mode (Geant4-[mt]) was used for fast dose calculations and optimized memory use. Four field configurations were chosen for dose calculation validation to test field size and symmetry effects, , , and for squared fields, and for an asymmetric rectangular field. Good agreement in terms of formalism, for 3%/3 mm and 2%/3 mm criteria, for each evaluated radiation field and photon beam was obtained within a computation time of 60 h on a single WorkStation for a 3 mm voxel matrix. Analyzing the VSM’s precision in high dose gradient regions, using the distance to agreement concept (DTA), showed also satisfactory results. In all investigated cases, the mean DTA was less than 1 mm in build-up and penumbra regions. In regards to calculation efficiency, the event processing speed is six times faster using Geant4-[mt] compared to sequential Geant4, when running the same simulation code for both. The developed VSM for 6 MV/10 MV beams widely used, is a general concept easy to adapt in order to reconstruct comparable beam qualities for various linac configurations, facilitating its integration for MC treatment planning purposes.
NASA Astrophysics Data System (ADS)
Walters, D. M.; Venarsky, M. P.; Hall, R. O., Jr.; Herdrich, A.; Livers, B.; Winkelman, D.; Wohl, E.
2014-12-01
Forest age and local valley morphometry strongly influence the form and function of mountain streams in Colorado. Streams in valleys with old growth forest (>350 years) have extensive log jam complexes that create multi-thread channel reaches with extensive pool habitat and large depositional areas. Streams in younger unmanaged forests (e.g., 120 years old) and intensively managed forests have much fewer log jams and lower wood loads. These are single-thread streams dominated by riffles and with little depositional habitat. We hypothesized that log jam streams would retain more organic matter and have higher metabolism, leading to greater production of stream macroinvertebrates and trout. Log jam reaches should also have greater emergence of adult aquatic insects, and consequently have higher densities of riparian spiders taking advantage of these prey. Surficial organic matter was 3-fold higher in old-growth streams, and these streams had much higher ecosystem respiration. Insect production (g m2 y-1) was similar among forest types, but fish density was four times higher in old-growth streams with copious log jams. However, at the valley scale, insect production (g m-1 valley-1) and trout density (number m-1 valley-1) was 2-fold and 10-fold higher, respectively, in old growth streams. This finding is because multi-thread reaches created by log jams have much greater stream area and stream length per meter of valley than single-thread channels. The more limited response of macroinvertebrates may be related to fish predation. Trout in old growth streams had similar growth rates and higher fat content than fish in other streams in spite of occurring at higher densities and higher elevation/colder temperatures. This suggests that the positive fish effect observed in old growth streams is related to greater availability of invertebrate prey, which is consistent with our original hypothesis. Preliminary analyses suggest that spider densities do not respond strongly to differences in stream morphology, but rather to changes in elevation and associated air temperatures. These results demonstrate strong indirect effects of forest age and valley morphometry on organic matter storage and animal secondary production in streams that is mediated by direct effects associated with the presence or absence of logjams.
Multiresource allocation and scheduling for periodic soft real-time applications
NASA Astrophysics Data System (ADS)
Gopalan, Kartik; Chiueh, Tzi-cker
2001-12-01
Real-time applications that utilize multiple system resources, such as CPU, disks, and network links, require coordinated scheduling of these resources in order to meet their end-to-end performance requirements. Most state-of-the-art operating systems support independent resource allocation and deadline-driven scheduling but lack coordination among multiple heterogeneous resources. This paper describes the design and implementation of an Integrated Real-time Resource Scheduler (IRS) that performs coordinated allocation and scheduling of multiple heterogeneous resources on the same machine for periodic soft real-time application. The principal feature of IRS is a heuristic multi-resource allocation algorithm that reserves multiple resources for real-time applications in a manner that can maximize the number of applications admitted into the system in the long run. At run-time, a global scheduler dispatches the tasks of the soft real-time application to individual resource schedulers according to the precedence constraints between tasks. The individual resource schedulers, which could be any deadline based schedulers, can make scheduling decisions locally and yet collectively satisfy a real-time application's performance requirements. The tightness of overall timing guarantees is ultimately determined by the properties of individual resource schedulers. However, IRS maximizes overall system resource utilization efficiency by coordinating deadline assignment across multiple tasks in a soft real-time application.
ERIC Educational Resources Information Center
Yurt, Eyüp; Polat, Seyat
2015-01-01
The purpose of this study was to examine the effectiveness of multiple intelligence applications on academic achievement in Turkey. Accordingly, findings of independent research studies aimed to find out effectiveness of multiple intelligence applications are gathered in a meta-analysis. Total of 71 studies, 66 dissertations and 7 articles were…
Telerobotic management system: coordinating multiple human operators with multiple robots
NASA Astrophysics Data System (ADS)
King, Jamie W.; Pretty, Raymond; Brothers, Brendan; Gosine, Raymond G.
2003-09-01
This paper describes an application called the Tele-robotic management system (TMS) for coordinating multiple operators with multiple robots for applications such as underground mining. TMS utilizes several graphical interfaces to allow the user to define a partially ordered plan for multiple robots. This plan is then converted to a Petri net for execution and monitoring. TMS uses a distributed framework to allow robots and operators to easily integrate with the applications. This framework allows robots and operators to join the network and advertise their capabilities through services. TMS then decides whether tasks should be dispatched to a robot or a remote operator based on the services offered by the robots and operators.
CMOS imager for pointing and tracking applications
NASA Technical Reports Server (NTRS)
Sun, Chao (Inventor); Pain, Bedabrata (Inventor); Yang, Guang (Inventor); Heynssens, Julie B. (Inventor)
2006-01-01
Systems and techniques to realize pointing and tracking applications with CMOS imaging devices. In general, in one implementation, the technique includes: sampling multiple rows and multiple columns of an active pixel sensor array into a memory array (e.g., an on-chip memory array), and reading out the multiple rows and multiple columns sampled in the memory array to provide image data with reduced motion artifact. Various operation modes may be provided, including TDS, CDS, CQS, a tracking mode to read out multiple windows, and/or a mode employing a sample-first-read-later readout scheme. The tracking mode can take advantage of a diagonal switch array. The diagonal switch array, the active pixel sensor array and the memory array can be integrated onto a single imager chip with a controller. This imager device can be part of a larger imaging system for both space-based applications and terrestrial applications.
Sonnemann, Eckart
2008-10-01
The introduction of sequentially rejective multiple test procedures (Einot and Gabriel, 1975; Naik, 1975; Holm, 1977; Holm, 1979) has caused considerable progress in the theory of multiple comparisons. Emphasizing the closure of multiple tests we give a survey of the general theory and its recent results in applications. Some new applications are given including a discussion of the connection with the theory of confidence regions.
NASA Astrophysics Data System (ADS)
Helsy, I.; Maryamah; Farida, I.; Ramdhani, M. A.
2017-09-01
This study aimed to describe the application of teaching materials, analyze the increase in the ability of students to connect the three levels of representation and student responses after application of multiple representations based teaching materials chemistry. The method used quasi one-group pretest-posttest design to 71 students. The results showed the application of teaching materials carried 88% with very good category. A significant increase ability to connect the three levels of representation of students after the application of multiple representations based teaching materials chemistry with t-value > t-crit (11.402 > 1.991). Recapitulation N-gain pretest and posttest showed relatively similar for all groups is 0.6 criterion being achievement. Students gave a positive response to the application of multiple representations based teaching materials chemistry. Students agree teaching materials used in teaching chemistry (88%), and agrees teaching materials to provide convenience in connecting the three levels of representation (95%).
Analytical group decision making in natural resources: methodology and application
Daniel L. Schmoldt; David L. Peterson
2000-01-01
Group decision making is becoming increasingly important in natural resource management and associated scientific applications, because multiple values are treated coincidentally in time and space, multiple resource specialists are needed, and multiple stakeholders must be included in the decision process. Decades of social science research on decision making in groups...
24 CFR 17.158 - Application of offset funds: Multiple debts.
Code of Federal Regulations, 2010 CFR
2010-04-01
...: Multiple debts. 17.158 Section 17.158 Housing and Urban Development Office of the Secretary, Department of... Government Irs Tax Refund and Federal Payment Offset Provisions and Administrative Wage Garnishment § 17.158 Application of offset funds: Multiple debts. The Secretary will use the procedures set out in § 17.157 for the...
Bednarska, Agnieszka J; Wyżga, Bartłomiej; Mikuś, Paweł; Kędzior, Renata
2018-01-01
Effects of passive restoration of mountain rivers on the organisms inhabiting exposed riverine sediments are considerably less understood than those concerning aquatic biota. Thus, the effects of a recovery of the Raba River after abandonment of maintenance of its channelization scheme on ground beetle (Coleoptera: Carabidae) communities were investigated by comparing 6 unmanaged cross-sections and 6 cross-sections from adjacent channelized reaches. In each cross-section, ground beetles were collected from 12 sampling sites in spring, summer, and autumn, and 8 habitat parameters characterizing the cross-sections and sampling sites were determined. Within a few years after abandonment of the Raba River channelization scheme, the width of this gravel-bed river increased up to three times and its multi-thread pattern became re-established. Consequently, unmanaged river cross-sections had significantly larger channel width and more low-flow channels and eroding cutbanks than channelized cross-sections. Moreover, sampling sites in the unmanaged cross-sections were typified by significantly steeper average surface slope and larger average distance from low-flow channels than the sites in channelized cross-sections. In total, 3992 individuals from 78 taxa were collected during the study. The ground beetle assemblages were significantly more abundant and richer in species in the unmanaged than in the channelized cross-sections but no significant differences in carabid diversity indices between the two cross-section types were recorded. Redundancy Analysis indicated active river zone width as the only variable explaining differences in abundance and species richness among the cross-sections. Multiple regression analysis indicated species diversity to predominantly depend on the degree of plant cover and substrate grain size. The study showed that increased availability of exposed sediments in the widened river reaches allowed ground beetles to increase their abundance and species richness within a few years after the onset of river restoration, but more time may be needed for development of more diverse carabid communities. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Technical Reports Server (NTRS)
Riedel, Joseph E.; Grasso, Christopher A.
2012-01-01
VML (Virtual Machine Language) is an advanced computing environment that allows spacecraft to operate using mechanisms ranging from simple, time-oriented sequencing to advanced, multicomponent reactive systems. VML has developed in four evolutionary stages. VML 0 is a core execution capability providing multi-threaded command execution, integer data types, and rudimentary branching. VML 1 added named parameterized procedures, extensive polymorphism, data typing, branching, looping issuance of commands using run-time parameters, and named global variables. VML 2 added for loops, data verification, telemetry reaction, and an open flight adaptation architecture. VML 2.1 contains major advances in control flow capabilities for executable state machines. On the resource requirements front, VML 2.1 features a reduced memory footprint in order to fit more capability into modestly sized flight processors, and endian-neutral data access for compatibility with Intel little-endian processors. Sequence packaging has been improved with object-oriented programming constructs and the use of implicit (rather than explicit) time tags on statements. Sequence event detection has been significantly enhanced with multi-variable waiting, which allows a sequence to detect and react to conditions defined by complex expressions with multiple global variables. This multi-variable waiting serves as the basis for implementing parallel rule checking, which in turn, makes possible executable state machines. The new state machine feature in VML 2.1 allows the creation of sophisticated autonomous reactive systems without the need to develop expensive flight software. Users specify named states and transitions, along with the truth conditions required, before taking transitions. Transitions with the same signal name allow separate state machines to coordinate actions: the conditions distributed across all state machines necessary to arm a particular signal are evaluated, and once found true, that signal is raised. The selected signal then causes all identically named transitions in all present state machines to be taken simultaneously. VML 2.1 has relevance to all potential space missions, both manned and unmanned. It was under consideration for use on Orion.
SIPSMetGen: It's Not Just For Aircraft Data and ECS Anymore.
NASA Astrophysics Data System (ADS)
Schwab, M.
2015-12-01
The SIPSMetGen utility, developed for the NASA EOSDIS project, under the EED contract, simplified the creation of file level metadata for the ECS System. The utility has been enhanced for ease of use, efficiency, speed and increased flexibility. The SIPSMetGen utility was originally created as a means of generating file level spatial metadata for Operation IceBridge. The first version created only ODL metadata, specific for ingest into ECS. The core strength of the utility was, and continues to be, its ability to take complex shapes and patterns of data collection point clouds from aircraft flights and simplify them to a relatively simple concave hull geo-polygon. It has been found to be a useful and easy to use tool for creating file level metadata for many other missions, both aircraft and satellite. While the original version was useful it had its limitations. In 2014 Raytheon was tasked to make enhancements to SIPSMetGen, this resulted a new version of SIPSMetGen which can create ISO Compliant XML metadata; provides optimization and streamlining of the algorithm for creating the spatial metadata; a quicker runtime with more consistent results; a utility that can be configured to run multi-threaded on systems with multiple processors. The utility comes with a java based graphical user interface to aid in configuration and running of the utility. The enhanced SIPSMetGen allows more diverse data sets to be archived with file level metadata. The advantage of archiving data with file level metadata is that it makes it easier for data users, and scientists to find relevant data. File level metadata unlocks the power of existing archives and metadata repositories such as ECS and CMR and search and discovery utilities like Reverb and Earth Data Search. Current missions now using SIPSMetGen include: Aquarius, Measures, ARISE, and Nimbus.
NASA Astrophysics Data System (ADS)
García-Flores, Agustín.; Paz-Gallardo, Abel; Plaza, Antonio; Li, Jun
2016-10-01
This paper describes a new web platform dedicated to the classification of satellite images called Hypergim. The current implementation of this platform enables users to perform classification of satellite images from any part of the world thanks to the worldwide maps provided by Google Maps. To perform this classification, Hypergim uses unsupervised algorithms like Isodata and K-means. Here, we present an extension of the original platform in which we adapt Hypergim in order to use supervised algorithms to improve the classification results. This involves a significant modification of the user interface, providing the user with a way to obtain samples of classes present in the images to use in the training phase of the classification process. Another main goal of this development is to improve the runtime of the image classification process. To achieve this goal, we use a parallel implementation of the Random Forest classification algorithm. This implementation is a modification of the well-known CURFIL software package. The use of this type of algorithms to perform image classification is widespread today thanks to its precision and ease of training. The actual implementation of Random Forest was developed using CUDA platform, which enables us to exploit the potential of several models of NVIDIA graphics processing units using them to execute general purpose computing tasks as image classification algorithms. As well as CUDA, we use other parallel libraries as Intel Boost, taking advantage of the multithreading capabilities of modern CPUs. To ensure the best possible results, the platform is deployed in a cluster of commodity graphics processing units (GPUs), so that multiple users can use the tool in a concurrent way. The experimental results indicate that this new algorithm widely outperform the previous unsupervised algorithms implemented in Hypergim, both in runtime as well as precision of the actual classification of the images.
Cosmological Particle Data Compression in Practice
NASA Astrophysics Data System (ADS)
Zeyen, M.; Ahrens, J.; Hagen, H.; Heitmann, K.; Habib, S.
2017-12-01
In cosmological simulations trillions of particles are handled and several terabytes of unstructured particle data are generated in each time step. Transferring this data directly from memory to disk in an uncompressed way results in a massive load on I/O and storage systems. Hence, one goal of domain scientists is to compress the data before storing it to disk while minimizing the loss of information. To prevent reading back uncompressed data from disk, this can be done in an in-situ process. Since the simulation continuously generates data, the available time for the compression of one time step is limited. Therefore, the evaluation of compression techniques has shifted from only focusing on compression rates to include run-times and scalability.In recent years several compression techniques for cosmological data have become available. These techniques can be either lossy or lossless, depending on the technique. For both cases, this study aims to evaluate and compare the state of the art compression techniques for unstructured particle data. This study focuses on the techniques available in the Blosc framework with its multi-threading support, the XZ Utils toolkit with the LZMA algorithm that achieves high compression rates, and the widespread FPZIP and ZFP methods for lossy compressions.For the investigated compression techniques, quantitative performance indicators such as compression rates, run-time/throughput, and reconstruction errors are measured. Based on these factors, this study offers a comprehensive analysis of the individual techniques and discusses their applicability for in-situ compression. In addition, domain specific measures are evaluated on the reconstructed data sets, and the relative error rates and statistical properties are analyzed and compared. Based on this study future challenges and directions in the compression of unstructured cosmological particle data were identified.
A parallel program for numerical simulation of discrete fracture network and groundwater flow
NASA Astrophysics Data System (ADS)
Huang, Ting-Wei; Liou, Tai-Sheng; Kalatehjari, Roohollah
2017-04-01
The ability of modeling fluid flow in Discrete Fracture Network (DFN) is critical to various applications such as exploration of reserves in geothermal and petroleum reservoirs, geological sequestration of carbon dioxide and final disposal of spent nuclear fuels. Although several commerical or acdametic DFN flow simulators are already available (e.g., FracMan and DFNWORKS), challenges in terms of computational efficiency and three-dimensional visualization still remain, which therefore motivates this study for developing a new DFN and flow simulator. A new DFN and flow simulator, DFNbox, was written in C++ under a cross-platform software development framework provided by Qt. DFNBox integrates the following capabilities into a user-friendly drop-down menu interface: DFN simulation and clipping, 3D mesh generation, fracture data analysis, connectivity analysis, flow path analysis and steady-state grounwater flow simulation. All three-dimensional visualization graphics were developed using the free OpenGL API. Similar to other DFN simulators, fractures are conceptualized as random point process in space, with stochastic characteristics represented by orientation, size, transmissivity and aperture. Fracture meshing was implemented by Delaunay triangulation for visualization but not flow simulation purposes. Boundary element method was used for flow simulations such that only unknown head or flux along exterior and interection bounaries are needed for solving the flow field in the DFN. Parallel compuation concept was taken into account in developing DFNbox for calculations that such concept is possible. For example, the time-consuming seqential code for fracture clipping calculations has been completely replaced by a highly efficient parallel one. This can greatly enhance compuational efficiency especially on multi-thread platforms. Furthermore, DFNbox have been successfully tested in Windows and Linux systems with equally-well performance.
Alber, Adrien; Piégay, Hervé
2017-11-01
An increased awareness by river managers of the importance of river channel migration to sediment dynamics, habitat complexity and other ecosystem functions has led to an advance in the science and practice of identifying, protecting or restoring specific erodible corridors across which rivers are free to migrate. One current challenge is the application of these watershed-specific goals at the regional planning scales (e.g., the European Water Framework Directive). This study provides a GIS-based spatial analysis of the channel migration rates at the regional-scale. As a case study, 99 reaches were sampled in the French part of the Rhône Basin and nearby tributaries of the Mediterranean Sea (111,300 km 2 ). We explored the spatial correlation between the channel migration rate and a set of simple variables (e.g., watershed area, channel slope, stream power, active channel width). We found that the spatial variability of the channel migration rates was primary explained by the gross stream power (R 2 = 0.48) and more surprisingly by the active channel width scaled by the watershed area. The relationship between the absolute migration rate and the gross stream power is generally consistent with the published empirical models for freely meandering rivers, whereas it is less significant for the multi-thread reaches. The discussion focused on methodological constraints for a regional-scale modelling of the migration rates, and the interpretation of the empirical models. We hypothesize that the active channel width scaled by the watershed area is a surrogate for the sediment supply which may be a more critical factor than the bank resistance for explaining the regional-scale variability of the migration rates. Copyright © 2016 Elsevier Ltd. All rights reserved.
Support for Taverna workflows in the VPH-Share cloud platform.
Kasztelnik, Marek; Coto, Ernesto; Bubak, Marian; Malawski, Maciej; Nowakowski, Piotr; Arenas, Juan; Saglimbeni, Alfredo; Testi, Debora; Frangi, Alejandro F
2017-07-01
To address the increasing need for collaborative endeavours within the Virtual Physiological Human (VPH) community, the VPH-Share collaborative cloud platform allows researchers to expose and share sequences of complex biomedical processing tasks in the form of computational workflows. The Taverna Workflow System is a very popular tool for orchestrating complex biomedical & bioinformatics processing tasks in the VPH community. This paper describes the VPH-Share components that support the building and execution of Taverna workflows, and explains how they interact with other VPH-Share components to improve the capabilities of the VPH-Share platform. Taverna workflow support is delivered by the Atmosphere cloud management platform and the VPH-Share Taverna plugin. These components are explained in detail, along with the two main procedures that were developed to enable this seamless integration: workflow composition and execution. 1) Seamless integration of VPH-Share with other components and systems. 2) Extended range of different tools for workflows. 3) Successful integration of scientific workflows from other VPH projects. 4) Execution speed improvement for medical applications. The presented workflow integration provides VPH-Share users with a wide range of different possibilities to compose and execute workflows, such as desktop or online composition, online batch execution, multithreading, remote execution, etc. The specific advantages of each supported tool are presented, as are the roles of Atmosphere and the VPH-Share plugin within the VPH-Share project. The combination of the VPH-Share plugin and Atmosphere engenders the VPH-Share infrastructure with far more flexible, powerful and usable capabilities for the VPH-Share community. As both components can continue to evolve and improve independently, we acknowledge that further improvements are still to be developed and will be described. Copyright © 2017 Elsevier B.V. All rights reserved.
Cellular response of pulp fibroblast to single or multiple photobiomodulation applications
NASA Astrophysics Data System (ADS)
Fernandes, Amanda; Lourenço Neto, Natalino; Teixeira Marques, Nadia Carolina; Lourenço Ribeiro Vitor, Luciana; Tavares Oliveira Prado, Mariel; Cardoso Oliveira, Rodrigo; Moreira Machado, Maria Aparecida Andrade; Marchini Oliveira, Thais
2018-06-01
This study aimed to evaluate in vitro the effects of single or multiple photobiomodulation (PBM) applications on the viability and proliferation of pulp fibroblasts. Pulp fibroblasts from human deciduous teeth were obtained from a biorepository, plated into 96-well plates, and irradiated according to the experimental groups. At 24 h, 48 h, and 72 h after irradiation, cell viability and proliferation were assessed through MTT and Crystal Violet assays, respectively. The intragroup comparison revealed statistically significant differences for 2.5 J cm‑2 (3×) with increasing viability at 72 h over 48 h (p = 0.027). The intergroup analysis showed a greater viability of the multiple PBM applications 2.5 J cm‑2 (3×) over the single application 7.5 J cm‑2 (1×) at 72 h. The application of 5 J cm‑2 (1×) exhibited greater proliferation than the application of 7.5 J cm‑2 (1×), 2.5 J cm‑2 (2×) and 2.5 J cm‑2 (3×). Single or multiple PBM applications demonstration different stimulatory effects on pulp fibroblast. The results show that the group submitted to multiple irradiation presented significantly higher cell viability than the groups with single irradiation at 72 h. However, the photobiomodulation therapy with single irradiations was more effective on cell proliferation at 24 h.
Application-Level Interoperability Across Grids and Clouds
NASA Astrophysics Data System (ADS)
Jha, Shantenu; Luckow, Andre; Merzky, Andre; Erdely, Miklos; Sehgal, Saurabh
Application-level interoperability is defined as the ability of an application to utilize multiple distributed heterogeneous resources. Such interoperability is becoming increasingly important with increasing volumes of data, multiple sources of data as well as resource types. The primary aim of this chapter is to understand different ways in which application-level interoperability can be provided across distributed infrastructure. We achieve this by (i) using the canonical wordcount application, based on an enhanced version of MapReduce that scales-out across clusters, clouds, and HPC resources, (ii) establishing how SAGA enables the execution of wordcount application using MapReduce and other programming models such as Sphere concurrently, and (iii) demonstrating the scale-out of ensemble-based biomolecular simulations across multiple resources. We show user-level control of the relative placement of compute and data and also provide simple performance measures and analysis of SAGA-MapReduce when using multiple, different, heterogeneous infrastructures concurrently for the same problem instance. Finally, we discuss Azure and some of the system-level abstractions that it provides and show how it is used to support ensemble-based biomolecular simulations.
Multiple Object Based RFID System Using Security Level
NASA Astrophysics Data System (ADS)
Kim, Jiyeon; Jung, Jongjin; Ryu, Ukjae; Ko, Hoon; Joe, Susan; Lee, Yongjun; Kim, Boyeon; Chang, Yunseok; Lee, Kyoonha
2007-12-01
RFID systems are increasingly applied for operational convenience in wide range of industries and individual life. However, it is uneasy for a person to control many tags because common RFID systems have the restriction that a tag used to identify just a single object. In addition, RFID systems can make some serious problems in violation of privacy and security because of their radio frequency communication. In this paper, we propose a multiple object RFID tag which can keep multiple object identifiers for different applications in a same tag. The proposed tag allows simultaneous access for their pair applications. We also propose an authentication protocol for multiple object tag to prevent serious problems of security and privacy in RFID applications. Especially, we focus on efficiency of the authentication protocol by considering security levels of applications. In the proposed protocol, the applications go through different authentication procedures according to security level of the object identifier stored in the tag. We implemented the proposed RFID scheme and made experimental results about efficiency and stability for the scheme.
Gaudi Evolution for Future Challenges
NASA Astrophysics Data System (ADS)
Clemencic, M.; Hegner, B.; Leggett, C.
2017-10-01
The LHCb Software Framework Gaudi was initially designed and developed almost twenty years ago, when computing was very different from today. It has also been used by a variety of other experiments, including ATLAS, Daya Bay, GLAST, HARP, LZ, and MINERVA. Although it has been always actively developed all these years, stability and backward compatibility have been favoured, reducing the possibilities of adopting new techniques, like multithreaded processing. R&D efforts like GaudiHive have however shown its potential to cope with the new challenges. In view of the LHC second Long Shutdown approaching and to prepare for the computing challenges for the Upgrade of the collider and the detectors, now is a perfect moment to review the design of Gaudi and plan future developments of the project. To do this LHCb, ATLAS and the Future Circular Collider community joined efforts to bring Gaudi forward and prepare it for the upcoming needs of the experiments. We present here how Gaudi will evolve in the next years and the long term development plans.
High Performance Descriptive Semantic Analysis of Semantic Graph Databases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Joslyn, Cliff A.; Adolf, Robert D.; al-Saffar, Sinan
As semantic graph database technology grows to address components ranging from extant large triple stores to SPARQL endpoints over SQL-structured relational databases, it will become increasingly important to be able to understand their inherent semantic structure, whether codified in explicit ontologies or not. Our group is researching novel methods for what we call descriptive semantic analysis of RDF triplestores, to serve purposes of analysis, interpretation, visualization, and optimization. But data size and computational complexity makes it increasingly necessary to bring high performance computational resources to bear on this task. Our research group built a novel high performance hybrid system comprisingmore » computational capability for semantic graph database processing utilizing the large multi-threaded architecture of the Cray XMT platform, conventional servers, and large data stores. In this paper we describe that architecture and our methods, and present the results of our analyses of basic properties, connected components, namespace interaction, and typed paths such for the Billion Triple Challenge 2010 dataset.« less
Kinetic turbulence simulations at extreme scale on leadership-class systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Bei; Ethier, Stephane; Tang, William
2013-01-01
Reliable predictive simulation capability addressing confinement properties in magnetically confined fusion plasmas is critically-important for ITER, a 20 billion dollar international burning plasma device under construction in France. The complex study of kinetic turbulence, which can severely limit the energy confinement and impact the economic viability of fusion systems, requires simulations at extreme scale for such an unprecedented device size. Our newly optimized, global, ab initio particle-in-cell code solving the nonlinear equations underlying gyrokinetic theory achieves excellent performance with respect to "time to solution" at the full capacity of the IBM Blue Gene/Q on 786,432 cores of Mira at ALCFmore » and recently of the 1,572,864 cores of Sequoia at LLNL. Recent multithreading and domain decomposition optimizations in the new GTC-P code represent critically important software advances for modern, low memory per core systems by enabling routine simulations at unprecedented size (130 million grid points ITER-scale) and resolution (65 billion particles).« less
BioProspecting: novel marker discovery obtained by mining the bibleome.
Elkin, Peter L; Tuttle, Mark S; Trusko, Brett E; Brown, Steven H
2009-02-05
BioProspecting is a novel approach that enabled our team to mine data related to genetic markers from the New England Journal of Medicine (NEJM) utilizing SNOMED CT and the Human Gene Onotology (HUGO). The Biomedical Informatics Research Collaborative was able to link genes and disorders using the Multi-threaded Clinical Vocabulary Server (MCVS) and natural language processing engine, whose output creates an ontology-network using the semantic encodings of the literature that is organized by these two terminologies. We identified relationships between (genes or proteins) and (diseases or drugs) as linked by metabolic functions and identified potentially novel functional relationships between, for example, genes and diseases (e.g. Article #1 ([Gene - IL27] = > {Enzyme - Dipeptidyl Carboxypeptidase 1}) and Article #2 ({Enzyme - Dipeptidyl Carboxypeptidase 1} < = [Disorder - Type II DM]) showing a metabolic link between IL27 and Type II DM). In this manuscript we describe our method for developing the database and its content as well as its potential to assist in the discovery of novel markers and drugs.
HNBody: A Simulation Package for Hierarchical N-Body Systems
NASA Astrophysics Data System (ADS)
Rauch, Kevin P.
2018-04-01
HNBody (http://www.hnbody.org/) is an extensible software package forintegrating the dynamics of N-body systems. Although general purpose, itincorporates several features and algorithms particularly well-suited tosystems containing a hierarchy (wide dynamic range) of masses. HNBodyversion 1 focused heavily on symplectic integration of nearly-Kepleriansystems. Here I describe the capabilities of the redesigned and expandedpackage version 2, which includes: symplectic integrators up to eighth order(both leap frog and Wisdom-Holman type methods), with symplectic corrector andclose encounter support; variable-order, variable-timestep Bulirsch-Stoer andStörmer integrators; post-Newtonian and multipole physics options; advancedround-off control for improved long-term stability; multi-threading and SIMDvectorization enhancements; seamless availability of extended precisionarithmetic for all calculations; extremely flexible configuration andoutput. Tests of the physical correctness of the algorithms are presentedusing JPL Horizons ephemerides (https://ssd.jpl.nasa.gov/?horizons) andpreviously published results for reference. The features and performanceof HNBody are also compared to several other freely available N-body codes,including MERCURY (Chambers), SWIFT (Levison & Duncan) and WHFAST (Rein &Tamayo).
Do, Hongdo; Molania, Ramyar
2017-01-01
The identification of genomic rearrangements with high sensitivity and specificity using massively parallel sequencing remains a major challenge, particularly in precision medicine and cancer research. Here, we describe a new method for detecting rearrangements, GRIDSS (Genome Rearrangement IDentification Software Suite). GRIDSS is a multithreaded structural variant (SV) caller that performs efficient genome-wide break-end assembly prior to variant calling using a novel positional de Bruijn graph-based assembler. By combining assembly, split read, and read pair evidence using a probabilistic scoring, GRIDSS achieves high sensitivity and specificity on simulated, cell line, and patient tumor data, recently winning SV subchallenge #5 of the ICGC-TCGA DREAM8.5 Somatic Mutation Calling Challenge. On human cell line data, GRIDSS halves the false discovery rate compared to other recent methods while matching or exceeding their sensitivity. GRIDSS identifies nontemplate sequence insertions, microhomologies, and large imperfect homologies, estimates a quality score for each breakpoint, stratifies calls into high or low confidence, and supports multisample analysis. PMID:29097403
JAMSS: proteomics mass spectrometry simulation in Java.
Smith, Rob; Prince, John T
2015-03-01
Countless proteomics data processing algorithms have been proposed, yet few have been critically evaluated due to lack of labeled data (data with known identities and quantities). Although labeling techniques exist, they are limited in terms of confidence and accuracy. In silico simulators have recently been used to create complex data with known identities and quantities. We propose Java Mass Spectrometry Simulator (JAMSS): a fast, self-contained in silico simulator capable of generating simulated MS and LC-MS runs while providing meta information on the provenance of each generated signal. JAMSS improves upon previous in silico simulators in terms of its ease to install, minimal parameters, graphical user interface, multithreading capability, retention time shift model and reproducibility. The simulator creates mzML 1.1.0. It is open source software licensed under the GPLv3. The software and source are available at https://github.com/optimusmoose/JAMSS. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
rTANDEM, an R/Bioconductor package for MS/MS protein identification.
Fournier, Frédéric; Joly Beauparlant, Charles; Paradis, René; Droit, Arnaud
2014-08-01
rTANDEM is an R/Bioconductor package that interfaces the X!Tandem protein identification algorithm. The package can run the multi-threaded algorithm on proteomic data files directly from R. It also provides functions to convert search parameters and results to/from R as well as functions to manipulate parameters and automate searches. An associated R package, shinyTANDEM, provides a web-based graphical interface to visualize and interpret the results. Together, those two packages form an entry point for a general MS/MS-based proteomic pipeline in R/Bioconductor. rTANDEM and shinyTANDEM are distributed in R/Bioconductor, http://bioconductor.org/packages/release/bioc/. The packages are under open licenses (GPL-3 and Artistice-1.0). frederic.fournier@crchuq.ulaval.ca or arnaud.droit@crchuq.ulaval.ca Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Development and realization of the open fault diagnosis system based on XPE
NASA Astrophysics Data System (ADS)
Deng, Hui; Wang, TaiYong; He, HuiLong; Xu, YongGang; Zeng, JuXiang
2005-12-01
To make the complex mechanical equipment work in good service, the technology for realizing an embedded open system is introduced systematically, including open hardware configuration, customized embedded operation system and open software structure. The ETX technology is adopted in this system, integrating the CPU main-board functions, and achieving the quick, real-time signal acquisition and intelligent data analysis with applying DSP and CPLD data acquisition card. Under the open configuration, the signal bus mode such as PCI, ISA and PC/104 can be selected and the styles of the signals can be chosen too. In addition, through customizing XPE system, adopting the EWF (Enhanced Write Filter), and realizing the open system authentically, the stability of the system is enhanced. Multi-thread and multi-task programming techniques are adopted in the software programming process. Interconnecting with the remote fault diagnosis center via the net interface, cooperative diagnosis is conducted and the intelligent degree of the fault diagnosis is improved.
Precise and Efficient Static Array Bound Checking for Large Embedded C Programs
NASA Technical Reports Server (NTRS)
Venet, Arnaud
2004-01-01
In this paper we describe the design and implementation of a static array-bound checker for a family of embedded programs: the flight control software of recent Mars missions. These codes are large (up to 250 KLOC), pointer intensive, heavily multithreaded and written in an object-oriented style, which makes their analysis very challenging. We designed a tool called C Global Surveyor (CGS) that can analyze the largest code in a couple of hours with a precision of 80%. The scalability and precision of the analyzer are achieved by using an incremental framework in which a pointer analysis and a numerical analysis of array indices mutually refine each other. CGS has been designed so that it can distribute the analysis over several processors in a cluster of machines. To the best of our knowledge this is the first distributed implementation of static analysis algorithms. Throughout the paper we will discuss the scalability setbacks that we encountered during the construction of the tool and their impact on the initial design decisions.
Multiple emulsions: an overview.
Khan, Azhar Yaqoob; Talegaonkar, Sushama; Iqbal, Zeenat; Ahmed, Farhan Jalees; Khar, Roop Krishan
2006-10-01
Multiple emulsions are complex polydispersed systems where both oil in water and water in oil emulsion exists simultaneously which are stabilized by lipophillic and hydrophilic surfactants respectively. The ratio of these surfactants is important in achieving stable multiple emulsions. Among water-in-oil-in-water (w/o/w) and oil-in-water-in-oil (o/w/o) type multiple emulsions, the former has wider areas of application and hence are studied in great detail. Formulation, preparation techniques and in vitro characterization methods for multiple emulsions are reviewed. Various factors affecting the stability of multiple emulsions and the stabilization approaches with specific reference to w/o/w type multiple emulsions are discussed in detail. Favorable drug release mechanisms and/or rate along with in vivo fate of multiple emulsions make them a versatile carrier. It finds wide range of applications in controlled or sustained drug delivery, targeted delivery, taste masking, bioavailability enhancement, enzyme immobilization, etc. Multiple emulsions have also been employed as intermediate step in the microencapsulation process and are the systems of increasing interest for the oral delivery of hydrophilic drugs, which are unstable in gastrointestinal tract like proteins and peptides. With the advancement in techniques for preparation, stabilization and rheological characterization of multiple emulsions, it will be able to provide a novel carrier system for drugs, cosmetics and pharmaceutical agents. In this review, emphasis is laid down on formulation, stabilization techniques and potential applications of multiple emulsion system.
Optical Multiple Access Network (OMAN) for advanced processing satellite applications
NASA Technical Reports Server (NTRS)
Mendez, Antonio J.; Gagliardi, Robert M.; Park, Eugene; Ivancic, William D.; Sherman, Bradley D.
1991-01-01
An OMAN breadboard for exploring advanced processing satellite circuit switch applications is introduced. Network architecture, hardware trade offs, and multiple user interference issues are presented. The breadboard test set up and experimental results are discussed.
Sugimoto, Masanori; Toda, Yoshihisa; Hori, Miyuki; Mitani, Akiko; Ichihara, Takahiro; Sekine, Shingo; Kaku, Shinsuke; Otsuka, Noboru; Matsumoto, Hideo
2016-06-01
Preclinical Research The aim of this study was to evaluate the efficacy of multiple applications of S(+)-flurbiprofen plaster (SFPP), a novel Nonsteroidal anti-inflammatory drug (NSAID) patch, for the alleviation of inflammatory pain and edema in rat adjuvant-induced arthritis (AIA) model as compared to other NSAID patches. The AIA model was induced by the injection of Mycobacterium butyricum and rats were treated with a patch (1.0 cm × 0.88 cm) containing each NSAID (SFP, ketoprofen, loxoprofen, diclofenac, felbinac, flurbiprofen, or indomethacin) applied to the paw for 6 h per day for 5 days. The pain threshold was evaluated using a flexion test of the ankle joint, and the inflamed paw edema was evaluated using a plethysmometer. cyclooxygenase (COX)-1 and COX-2 inhibition was evaluated using human recombinant proteins. Multiple applications of SFPP exerted a significant analgesic effect from the first day of application as compared to the other NSAID patches. In terms of paw edema, SFPP decreased edema from the second day after application, Multiple applications of SFPP were superior to those of other NSAID patches, in terms of the analgesic effect with multiple applications. These results suggest that SFPP may be a beneficial patch for providing analgesic and anti-inflammatory effects clinically. Drug Dev Res 77 : 206-211, 2016. © 2016 The Authors Drug Development Research Published by Wiley Periodicals, Inc. © 2016 The Authors Drug Development Research Published by Wiley Periodicals, Inc.
Toda, Yoshihisa; Hori, Miyuki; Mitani, Akiko; Ichihara, Takahiro; Sekine, Shingo; Kaku, Shinsuke; Otsuka, Noboru; Matsumoto, Hideo
2016-01-01
Abstract Preclinical Research The aim of this study was to evaluate the efficacy of multiple applications of S(+)‐flurbiprofen plaster (SFPP), a novel Nonsteroidal anti‐inflammatory drug (NSAID) patch, for the alleviation of inflammatory pain and edema in rat adjuvant‐induced arthritis (AIA) model as compared to other NSAID patches. The AIA model was induced by the injection of Mycobacterium butyricum and rats were treated with a patch (1.0 cm × 0.88 cm) containing each NSAID (SFP, ketoprofen, loxoprofen, diclofenac, felbinac, flurbiprofen, or indomethacin) applied to the paw for 6 h per day for 5 days. The pain threshold was evaluated using a flexion test of the ankle joint, and the inflamed paw edema was evaluated using a plethysmometer. cyclooxygenase (COX)−1 and COX‐2 inhibition was evaluated using human recombinant proteins. Multiple applications of SFPP exerted a significant analgesic effect from the first day of application as compared to the other NSAID patches. In terms of paw edema, SFPP decreased edema from the second day after application, Multiple applications of SFPP were superior to those of other NSAID patches, in terms of the analgesic effect with multiple applications. These results suggest that SFPP may be a beneficial patch for providing analgesic and anti‐inflammatory effects clinically. Drug Dev Res 77 : 206–211, 2016. © 2016 The Authors Drug Development Research Published by Wiley Periodicals, Inc. PMID:27241582
Educational and Scientific Applications of Climate Model Diagnostic Analyzer
NASA Astrophysics Data System (ADS)
Lee, S.; Pan, L.; Zhai, C.; Tang, B.; Kubar, T. L.; Zhang, J.; Bao, Q.
2016-12-01
Climate Model Diagnostic Analyzer (CMDA) is a web-based information system designed for the climate modeling and model analysis community to analyze climate data from models and observations. CMDA provides tools to diagnostically analyze climate data for model validation and improvement, and to systematically manage analysis provenance for sharing results with other investigators. CMDA utilizes cloud computing resources, multi-threading computing, machine-learning algorithms, web service technologies, and provenance-supporting technologies to address technical challenges that the Earth science modeling and model analysis community faces in evaluating and diagnosing climate models. As CMDA infrastructure and technology have matured, we have developed the educational and scientific applications of CMDA. Educationally, CMDA supported the summer school of the JPL Center for Climate Sciences for three years since 2014. In the summer school, the students work on group research projects where CMDA provide datasets and analysis tools. Each student is assigned to a virtual machine with CMDA installed in Amazon Web Services. A provenance management system for CMDA is developed to keep track of students' usages of CMDA, and to recommend datasets and analysis tools for their research topic. The provenance system also allows students to revisit their analysis results and share them with their group. Scientifically, we have developed several science use cases of CMDA covering various topics, datasets, and analysis types. Each use case developed is described and listed in terms of a scientific goal, datasets used, the analysis tools used, scientific results discovered from the use case, an analysis result such as output plots and data files, and a link to the exact analysis service call with all the input arguments filled. For example, one science use case is the evaluation of NCAR CAM5 model with MODIS total cloud fraction. The analysis service used is Difference Plot Service of Two Variables, and the datasets used are NCAR CAM total cloud fraction and MODIS total cloud fraction. The scientific highlight of the use case is that the CAM5 model overall does a fairly decent job at simulating total cloud cover, though simulates too few clouds especially near and offshore of the eastern ocean basins where low clouds are dominant.
2014-01-01
The emergence of massive datasets in a clinical setting presents both challenges and opportunities in data storage and analysis. This so called “big data” challenges traditional analytic tools and will increasingly require novel solutions adapted from other fields. Advances in information and communication technology present the most viable solutions to big data analysis in terms of efficiency and scalability. It is vital those big data solutions are multithreaded and that data access approaches be precisely tailored to large volumes of semi-structured/unstructured data. The MapReduce programming framework uses two tasks common in functional programming: Map and Reduce. MapReduce is a new parallel processing framework and Hadoop is its open-source implementation on a single computing node or on clusters. Compared with existing parallel processing paradigms (e.g. grid computing and graphical processing unit (GPU)), MapReduce and Hadoop have two advantages: 1) fault-tolerant storage resulting in reliable data processing by replicating the computing tasks, and cloning the data chunks on different computing nodes across the computing cluster; 2) high-throughput data processing via a batch processing framework and the Hadoop distributed file system (HDFS). Data are stored in the HDFS and made available to the slave nodes for computation. In this paper, we review the existing applications of the MapReduce programming framework and its implementation platform Hadoop in clinical big data and related medical health informatics fields. The usage of MapReduce and Hadoop on a distributed system represents a significant advance in clinical big data processing and utilization, and opens up new opportunities in the emerging era of big data analytics. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools. This paper is concluded by summarizing the potential usage of the MapReduce programming framework and Hadoop platform to process huge volumes of clinical data in medical health informatics related fields. PMID:25383096
Mohammed, Emad A; Far, Behrouz H; Naugler, Christopher
2014-01-01
The emergence of massive datasets in a clinical setting presents both challenges and opportunities in data storage and analysis. This so called "big data" challenges traditional analytic tools and will increasingly require novel solutions adapted from other fields. Advances in information and communication technology present the most viable solutions to big data analysis in terms of efficiency and scalability. It is vital those big data solutions are multithreaded and that data access approaches be precisely tailored to large volumes of semi-structured/unstructured data. THE MAPREDUCE PROGRAMMING FRAMEWORK USES TWO TASKS COMMON IN FUNCTIONAL PROGRAMMING: Map and Reduce. MapReduce is a new parallel processing framework and Hadoop is its open-source implementation on a single computing node or on clusters. Compared with existing parallel processing paradigms (e.g. grid computing and graphical processing unit (GPU)), MapReduce and Hadoop have two advantages: 1) fault-tolerant storage resulting in reliable data processing by replicating the computing tasks, and cloning the data chunks on different computing nodes across the computing cluster; 2) high-throughput data processing via a batch processing framework and the Hadoop distributed file system (HDFS). Data are stored in the HDFS and made available to the slave nodes for computation. In this paper, we review the existing applications of the MapReduce programming framework and its implementation platform Hadoop in clinical big data and related medical health informatics fields. The usage of MapReduce and Hadoop on a distributed system represents a significant advance in clinical big data processing and utilization, and opens up new opportunities in the emerging era of big data analytics. The objective of this paper is to summarize the state-of-the-art efforts in clinical big data analytics and highlight what might be needed to enhance the outcomes of clinical big data analytics tools. This paper is concluded by summarizing the potential usage of the MapReduce programming framework and Hadoop platform to process huge volumes of clinical data in medical health informatics related fields.
2000-06-23
conductivity ( NDC ) effects in double barrier resonant tunneling structures (DBRTS) prove the extremely fast frequency response of charge transport (less...UNCLASSIFIED Defense Technical Information Center Compilation Part Notice ADP013131 TITLE: Multiple-Barrier Resonant Tunneling Structures for...Institute Multiple-barrier resonant tunneling structures for application in a microwave generator stabilized by microstrip resonator S. V. Evstigneev, A. L
Aboulbanine, Zakaria; El Khayati, Naïma
2018-04-13
The use of phase space in medical linear accelerator Monte Carlo (MC) simulations significantly improves the execution time and leads to results comparable to those obtained from full calculations. The classical representation of phase space stores directly the information of millions of particles, producing bulky files. This paper presents a virtual source model (VSM) based on a reconstruction algorithm, taking as input a compressed file of roughly 800 kb derived from phase space data freely available in the International Atomic Energy Agency (IAEA) database. This VSM includes two main components; primary and scattered particle sources, with a specific reconstruction method developed for each. Energy spectra and other relevant variables were extracted from IAEA phase space and stored in the input description data file for both sources. The VSM was validated for three photon beams: Elekta Precise 6 MV/10 MV and a Varian TrueBeam 6 MV. Extensive calculations in water and comparisons between dose distributions of the VSM and IAEA phase space were performed to estimate the VSM precision. The Geant4 MC toolkit in multi-threaded mode (Geant4-[mt]) was used for fast dose calculations and optimized memory use. Four field configurations were chosen for dose calculation validation to test field size and symmetry effects, [Formula: see text] [Formula: see text], [Formula: see text] [Formula: see text], and [Formula: see text] [Formula: see text] for squared fields, and [Formula: see text] [Formula: see text] for an asymmetric rectangular field. Good agreement in terms of [Formula: see text] formalism, for 3%/3 mm and 2%/3 mm criteria, for each evaluated radiation field and photon beam was obtained within a computation time of 60 h on a single WorkStation for a 3 mm voxel matrix. Analyzing the VSM's precision in high dose gradient regions, using the distance to agreement concept (DTA), showed also satisfactory results. In all investigated cases, the mean DTA was less than 1 mm in build-up and penumbra regions. In regards to calculation efficiency, the event processing speed is six times faster using Geant4-[mt] compared to sequential Geant4, when running the same simulation code for both. The developed VSM for 6 MV/10 MV beams widely used, is a general concept easy to adapt in order to reconstruct comparable beam qualities for various linac configurations, facilitating its integration for MC treatment planning purposes.
USDA-ARS?s Scientific Manuscript database
Spray programs comprising multiple or single foliar applications of the fungal pathogen Beauveria bassiana strain GHA (Bb) made during morning (AM) vs. evening (PM) hours were tested against Colorado potato beetle Leptinotarsa decemlineata (CPB) in small research plots of potatoes over multiple fiel...
Writing and applications of fiber Bragg grating arrays
NASA Astrophysics Data System (ADS)
LaRochelle, Sophie; Cortes, Pierre-Yves; Fathallah, H.; Rusch, Leslie A.; Jaafar, H. B.
2000-12-01
Multiple Bragg gratings are written in a single fibre strand with accurate positioning to achieve predetermined time delays between optical channels. Applications of fibre Bragg grating arrays include encoders/decoders with series of identical gratings for optical code-division multiple access.
NASA Technical Reports Server (NTRS)
Chi, J. Y.; Gatos, H. C.; Mao, B. Y.
1980-01-01
Multiple p-n junctions have been prepared in as-grown Czochralski p-type silicon through overcompensation near the oxygen periodic concentration maxima by oxygen thermal donors generated during heat treatment at 450 C. Application of the multiple p-n-junction configuration to photovoltaic energy conversion has been investigated. A new solar-cell structure based on multiple p-n-junctions was developed. Theoretical analysis showed that a significant increase in collection efficiency over the conventional solar cells can be achieved.
Session management for web-based healthcare applications.
Wei, L.; Sengupta, S.
1999-01-01
In health care systems, users may access multiple applications during one session of interaction with the system. However, users must sign on to each application individually, and it is difficult to maintain a common context among these applications. We are developing a session management system for web-based applications using LDAP directory service, which will allow single sign-on to multiple web-based applications, and maintain a common context among those applications for the user. This paper discusses the motivations for building this system, the system architecture, and the challenges of our approach, such as the session objects management for the user, and session security. PMID:10566511
On Certain Wronskians of Multiple Orthogonal Polynomials
NASA Astrophysics Data System (ADS)
Zhang, Lun; Filipuk, Galina
2014-11-01
We consider determinants of Wronskian type whose entries are multiple orthogonal polynomials associated with a path connecting two multi-indices. By assuming that the weight functions form an algebraic Chebyshev (AT) system, we show that the polynomials represented by the Wronskians keep a constant sign in some cases, while in some other cases oscillatory behavior appears, which generalizes classical results for orthogonal polynomials due to Karlin and Szegő. There are two applications of our results. The first application arises from the observation that the m-th moment of the average characteristic polynomials for multiple orthogonal polynomial ensembles can be expressed as a Wronskian of the type II multiple orthogonal polynomials. Hence, it is straightforward to obtain the distinct behavior of the moments for odd and even m in a special multiple orthogonal ensemble - the AT ensemble. As the second application, we derive some Turán type inequalities for m! ultiple Hermite and multiple Laguerre polynomials (of two kinds). Finally, we study numerically the geometric configuration of zeros for the Wronskians of these multiple orthogonal polynomials. We observe that the zeros have regular configurations in the complex plane, which might be of independent interest.
DePasse, J Mason; Daniels, Alan H; Durand, Wesley; Kingrey, Brandon; Prodromo, John; Mulcahey, Mary K
2018-01-01
Orthopedic surgeons have become increasingly subspecialized, and recent studies have shown that American Board of Orthopaedic Surgery (ABOS) Step II applicants are performing a higher percentage of their cases within their chosen subspecialties. However, these studies focused exclusively on surgeons who have completed a single fellowship; little data exist on those who pursue a second fellowship. All applicants to the ABOS Part II examination from 2004 to 2016 were classified by their self-reported fellowship training history using the ABOS Part II examination database. Trends in the number of applicants completing multiple fellowships and the types of fellowships combined were analyzed. In addition, cases performed by applicants who had performed multiple fellowships were analyzed to determine what percentage were within their chosen subspecialties. A total of 9776 applicants to ABOS Part II were included in the database from 2004 to 2016, including 444 (4.5%) applicants who completed more than one fellowship. There were 43 different combinations of fellowships; the most common additional fellowships were trauma (40.1%), sports medicine (38.7%), and joints (30.4%). The most common combinations were joints and sports medicine (10.6%) and foot and ankle and sports medicine (10.1%). A significant increase occurred in physicians training in both pediatric orthopedics and sports medicine (P=.02). The percentage of cases within the applicants' chosen specialties ranged from 91.4% in sports to 73.6% in tumor. Multiple fellowship applicants represent a small percentage of all applicants, and although subspecialization in orthopedics is increasing, no increasing trend toward multiple fellowships within this dataset was observed. However, the significant increase in applicants who combined pediatric orthopedic and sports medicine fellowships suggests an increasing interest in treating this increasing patient population in addition to social and economic factors. [Orthopedics. 2018; 41(1):e33-e37.]. Copyright 2017, SLACK Incorporated.
Deploy Nalu/Kokkos algorithmic infrastructure with performance benchmarking.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Domino, Stefan P.; Ananthan, Shreyas; Knaus, Robert C.
The former Nalu interior heterogeneous algorithm design, which was originally designed to manage matrix assembly operations over all elemental topology types, has been modified to operate over homogeneous collections of mesh entities. This newly templated kernel design allows for removal of workset variable resize operations that were formerly required at each loop over a Sierra ToolKit (STK) bucket (nominally, 512 entities in size). Extensive usage of the Standard Template Library (STL) std::vector has been removed in favor of intrinsic Kokkos memory views. In this milestone effort, the transition to Kokkos as the underlying infrastructure to support performance and portability onmore » many-core architectures has been deployed for key matrix algorithmic kernels. A unit-test driven design effort has developed a homogeneous entity algorithm that employs a team-based thread parallelism construct. The STK Single Instruction Multiple Data (SIMD) infrastructure is used to interleave data for improved vectorization. The collective algorithm design, which allows for concurrent threading and SIMD management, has been deployed for the core low-Mach element- based algorithm. Several tests to ascertain SIMD performance on Intel KNL and Haswell architectures have been carried out. The performance test matrix includes evaluation of both low- and higher-order methods. The higher-order low-Mach methodology builds on polynomial promotion of the core low-order control volume nite element method (CVFEM). Performance testing of the Kokkos-view/SIMD design indicates low-order matrix assembly kernel speed-up ranging between two and four times depending on mesh loading and node count. Better speedups are observed for higher-order meshes (currently only P=2 has been tested) especially on KNL. The increased workload per element on higher-order meshes bene ts from the wide SIMD width on KNL machines. Combining multiple threads with SIMD on KNL achieves a 4.6x speedup over the baseline, with assembly timings faster than that observed on Haswell architecture. The computational workload of higher-order meshes, therefore, seems ideally suited for the many-core architecture and justi es further exploration of higher-order on NGP platforms. A Trilinos/Tpetra-based multi-threaded GMRES preconditioned by symmetric Gauss Seidel (SGS) represents the core solver infrastructure for the low-Mach advection/diffusion implicit solves. The threaded solver stack has been tested on small problems on NREL's Peregrine system using the newly developed and deployed Kokkos-view/SIMD kernels. fforts are underway to deploy the Tpetra-based solver stack on NERSC Cori system to benchmark its performance at scale on KNL machines.« less
Twin-Mirrored-Galvanometer Laser-Light-Sheet Generator
NASA Technical Reports Server (NTRS)
Rhodes, David B.; Franke, John M.; Jones, Stephen B.; Leighty, Bradley D.
1991-01-01
Multiple, rotating laser-light sheets generated to illuminate flows in wind tunnels. Designed and developed to provide flexibility and adaptability to wide range of applications. Design includes capability to control size and location of laser-light sheet in real time, to generate horizontal or vertical sheets, to sweep sheet repeatedly through volume, to generate multiple sheets with controllable separation, and to rotate single or multiple laser-light sheets. Includes electronic equipment and laser mounted on adjustable-height platform. Twin-mirrored galvanometer unit supported by tripod to reduce vibration. Other possible applications include use in construction industry to align beams of building. Artistic or display applications also possible.
47 CFR 73.3555 - Multiple ownership.
Code of Federal Regulations, 2010 CFR
2010-10-01
... vertical ownership chain and application of the relevant attribution benchmark to the resulting product, except that wherever the ownership percentage for any link in the chain exceeds 50%, it shall not be... multiplication of the ownership percentages for each link in the vertical ownership chain and application of the...
47 CFR 73.3555 - Multiple ownership.
Code of Federal Regulations, 2011 CFR
2011-10-01
... vertical ownership chain and application of the relevant attribution benchmark to the resulting product, except that wherever the ownership percentage for any link in the chain exceeds 50%, it shall not be... multiplication of the ownership percentages for each link in the vertical ownership chain and application of the...
47 CFR 73.3555 - Multiple ownership.
Code of Federal Regulations, 2014 CFR
2014-10-01
... vertical ownership chain and application of the relevant attribution benchmark to the resulting product, except that wherever the ownership percentage for any link in the chain exceeds 50%, it shall not be... multiplication of the ownership percentages for each link in the vertical ownership chain and application of the...
47 CFR 73.3555 - Multiple ownership.
Code of Federal Regulations, 2012 CFR
2012-10-01
... vertical ownership chain and application of the relevant attribution benchmark to the resulting product, except that wherever the ownership percentage for any link in the chain exceeds 50%, it shall not be... multiplication of the ownership percentages for each link in the vertical ownership chain and application of the...
47 CFR 73.3555 - Multiple ownership.
Code of Federal Regulations, 2013 CFR
2013-10-01
... vertical ownership chain and application of the relevant attribution benchmark to the resulting product, except that wherever the ownership percentage for any link in the chain exceeds 50%, it shall not be... multiplication of the ownership percentages for each link in the vertical ownership chain and application of the...
Scientific Visualization Made Easy for the Scientist
NASA Astrophysics Data System (ADS)
Westerhoff, M.; Henderson, B.
2002-12-01
amirar is an application program used in creating 3D visualizations and geometric models of 3D image data sets from various application areas, e.g. medicine, biology, biochemistry, chemistry, physics, and engineering. It has demonstrated significant adoption in the market place since becoming commercially available in 2000. The rapid adoption has expanded the features being requested by the user base and broadened the scope of the amira product offering. The amira product offering includes amira Standard, amiraDevT, used to extend the product capabilities by users, amiraMolT, used for molecular visualization, amiraDeconvT, used to improve quality of image data, and amiraVRT, used in immersive VR environments. amira allows the user to construct a visualization tailored to his or her needs without requiring any programming knowledge. It also allows 3D objects to be represented as grids suitable for numerical simulations, notably as triangular surfaces and volumetric tetrahedral grids. The amira application also provides methods to generate such grids from voxel data representing an image volume, and it includes a general-purpose interactive 3D viewer. amiraDev provides an application-programming interface (API) that allows the user to add new components by C++ programming. amira supports many import formats including a 'raw' format allowing immediate access to your native uniform data sets. amira uses the power and speed of the OpenGLr and Open InventorT graphics libraries and 3D graphics accelerators to allow you to access over 145 modules, enabling you to process, probe, analyze and visualize your data. The amiraMolT extension adds powerful tools for molecular visualization to the existing amira platform. amiraMolT contains support for standard molecular file formats, tools for visualization and analysis of static molecules as well as molecular trajectories (time series). amiraDeconv adds tools for the deconvolution of 3D microscopic images. Deconvolution is the process of increasing image quality and resolution by computationally compensating artifacts of the recording process. amiraDeconv supports 3D wide field microscopy as well as 3D confocal microscopy. It offers both non-blind and blind image deconvolution algorithms. Non-blind deconvolution uses an individual measured point spread function, while non-blind algorithms work on the basis of only a few recording parameters (like numerical aperture or zoom factor). amiraVR is a specialized and extended version of the amira visualization system which is dedicated for use in immersive installations, such as large-screen stereoscopic projections, CAVEr or Holobenchr systems. Among others, it supports multi-threaded multi-pipe rendering, head-tracking, advanced 3D interaction concepts, and 3D menus allowing interaction with any amira object in the same way as on the desktop. With its unique set of features, amiraVR represents both a VR (Virtual Reality) ready application for scientific and medical visualization in immersive environments, and a development platform that allows building VR applications.
Application-Driven No-Reference Quality Assessment for Dermoscopy Images With Multiple Distortions.
Xie, Fengying; Lu, Yanan; Bovik, Alan C; Jiang, Zhiguo; Meng, Rusong
2016-06-01
Dermoscopy images often suffer from blur and uneven illumination distortions that occur during acquisition, which can adversely influence consequent automatic image analysis results on potential lesion objects. The purpose of this paper is to deploy an algorithm that can automatically assess the quality of dermoscopy images. Such an algorithm could be used to direct image recapture or correction. We describe an application-driven no-reference image quality assessment (IQA) model for dermoscopy images affected by possibly multiple distortions. For this purpose, we created a multiple distortion dataset of dermoscopy images impaired by varying degrees of blur and uneven illumination. The basis of this model is two single distortion IQA metrics that are sensitive to blur and uneven illumination, respectively. The outputs of these two metrics are combined to predict the quality of multiply distorted dermoscopy images using a fuzzy neural network. Unlike traditional IQA algorithms, which use human subjective score as ground truth, here ground truth is driven by the application, and generated according to the degree of influence of the distortions on lesion analysis. The experimental results reveal that the proposed model delivers accurate and stable quality prediction results for dermoscopy images impaired by multiple distortions. The proposed model is effective for quality assessment of multiple distorted dermoscopy images. An application-driven concept for IQA is introduced, and at the same time, a solution framework for the IQA of multiple distortions is proposed.
An omnibus test for family-based association studies with multiple SNPs and multiple phenotypes.
Lasky-Su, Jessica; Murphy, Amy; McQueen, Matthew B; Weiss, Scott; Lange, Christoph
2010-06-01
We propose an omnibus family-based association test (MFBAT) that can be applied to multiple markers and multiple phenotypes and that has only one degree of freedom. The proposed test statistic extends current FBAT methodology to incorporate multiple markers as well as multiple phenotypes. Using simulation studies, power estimates for the proposed methodology are compared with the standard methodologies. On the basis of these simulations, we find that MFBAT substantially outperforms other methods, including haplotypic approaches and doing multiple tests with single single-nucleotide polymorphisms (SNPs) and single phenotypes. The practical relevance of the approach is illustrated by an application to asthma in which SNP/phenotype combinations are identified and reach overall significance that would not have been identified using other approaches. This methodology is directly applicable to cases in which there are multiple SNPs, such as candidate gene studies, cases in which there are multiple phenotypes, such as expression data, and cases in which there are multiple phenotypes and genotypes, such as genome-wide association studies that incorporate expression profiles as phenotypes. This program is available in the PBAT analysis package.
Salient features of MACA and CMACA systems and their applications
NASA Astrophysics Data System (ADS)
Ratnam, C.; Goud, S. L.; Rao, V. Lakshmana
2007-09-01
The Fourier Analytical Investigation results of the Performance of the Multiple Annuli Coded Aperture (MACA) and Complementary Multiple Annuli Coded Aperture Systems (CMACA) are summarised and the probable application of these systems in Astronomy, High energy radiation Imaging, optical filters, and in the field of metallurgy, are suggested.
Mixed Membership Distributions with Applications to Modeling Multiple Strategy Usage
ERIC Educational Resources Information Center
Galyardt, April
2012-01-01
This dissertation examines two related questions. "How do mixed membership models work?" and "Can mixed membership be used to model how students use multiple strategies to solve problems?". Mixed membership models have been used in thousands of applications from text and image processing to genetic microarray analysis. Yet…
Systems and methods to control multiple peripherals with a single-peripheral application code
Ransom, Ray M.
2013-06-11
Methods and apparatus are provided for enhancing the BIOS of a hardware peripheral device to manage multiple peripheral devices simultaneously without modifying the application software of the peripheral device. The apparatus comprises a logic control unit and a memory in communication with the logic control unit. The memory is partitioned into a plurality of ranges, each range comprising one or more blocks of memory, one range being associated with each instance of the peripheral application and one range being reserved for storage of a data pointer related to each peripheral application of the plurality. The logic control unit is configured to operate multiple instances of the control application by duplicating one instance of the peripheral application for each peripheral device of the plurality and partitioning a memory device into partitions comprising one or more blocks of memory, one partition being associated with each instance of the peripheral application. The method then reserves a range of memory addresses for storage of a data pointer related to each peripheral device of the plurality, and initializes each of the plurality of peripheral devices.
Electronic Submissions of Pesticide Applications
Applications for pesticide registration can be submitted electronically, including forms, studies, and draft product labeling. Applicants need not submit multiple electronic copies of any pieces of their applications.
Products of multiple Fourier series with application to the multiblade transformation
NASA Technical Reports Server (NTRS)
Kunz, D. L.
1981-01-01
A relatively simple and systematic method for forming the products of multiple Fourier series using tensor like operations is demonstrated. This symbolic multiplication can be performed for any arbitrary number of series, and the coefficients of a set of linear differential equations with periodic coefficients from a rotating coordinate system to a nonrotating system is also demonstrated. It is shown that using Fourier operations to perform this transformation make it easily understood, simple to apply, and generally applicable.
Quadratures with multiple nodes, power orthogonality, and moment-preserving spline approximation
NASA Astrophysics Data System (ADS)
Milovanovic, Gradimir V.
2001-01-01
Quadrature formulas with multiple nodes, power orthogonality, and some applications of such quadratures to moment-preserving approximation by defective splines are considered. An account on power orthogonality (s- and [sigma]-orthogonal polynomials) and generalized Gaussian quadratures with multiple nodes, including stable algorithms for numerical construction of the corresponding polynomials and Cotes numbers, are given. In particular, the important case of Chebyshev weight is analyzed. Finally, some applications in moment-preserving approximation of functions by defective splines are discussed.
Applications of multiple-constraint matrix updates to the optimal control of large structures
NASA Technical Reports Server (NTRS)
Smith, S. W.; Walcott, B. L.
1992-01-01
Low-authority control or vibration suppression in large, flexible space structures can be formulated as a linear feedback control problem requiring computation of displacement and velocity feedback gain matrices. To ensure stability in the uncontrolled modes, these gain matrices must be symmetric and positive definite. In this paper, efficient computation of symmetric, positive-definite feedback gain matrices is accomplished through the use of multiple-constraint matrix update techniques originally developed for structural identification applications. Two systems were used to illustrate the application: a simple spring-mass system and a planar truss. From these demonstrations, use of this multiple-constraint technique is seen to provide a straightforward approach for computing the low-authority gains.
Kato, Hironari; Kawamoto, Hirofumi; Noma, Yasuhiro; Sonoyama, Takayuki; Tsutsumi, Koichiro; Fujii, Masakuni; Okada, Hiroyuki; Yamamoto, Kazuhide
2013-01-01
The endoscopic management of malignant hilar biliary strictures using multiple metallic stents (MS) is technically demanding, in the initial deployment of MS and the recovery from MS occlusion with deployment of multiple plastic stents (PS). We evaluated the outcomes of the application of a Soehendra stent retriever (SSR) as a dilator of intractable strictures. Fifty-nine patients with malignant hilar biliary strictures had multiple MS inserted using a partial stent-in-stent procedure. When we encountered intractable strictures, we adopted SSR to dilate the stricture and the interstice of the MS. We evaluated the success rate of MS or PS deployment after SSR application and procedural complications. Five of 59 patients (8%) were subjected to SSR application for the initial MS deployment. MS were successfully deployed in all of these patients (100%). MS occlusion was noted in 27 patients. We applied SSR to seven patients (26%) for the deployment of multiple PS after MS occlusion. In five patients (71%), successful PS deployment was achieved after the SSR application. No complications related to dilatation using SSR occurred in any patient. SSR proved to be a potent dilator of difficult strictures in the management of malignant hilar biliary strictures.
40 CFR 437.46 - Pretreatment standards for existing sources (PSES)
Code of Federal Regulations, 2010 CFR
2010-07-01
... from subparts A, B, or C of this part may be subject to Multiple Wastestream Subcategory pretreatment standards representing the application of PSES set forth in paragraphs (b), (c), (d), or (e) of this section... applicable Multiple Wastestream Subcategory standards set forth in paragraphs (b), (c), (d) or (e) of this...
SWPS3 - fast multi-threaded vectorized Smith-Waterman for IBM Cell/B.E. and x86/SSE2.
Szalkowski, Adam; Ledergerber, Christian; Krähenbühl, Philipp; Dessimoz, Christophe
2008-10-29
We present swps3, a vectorized implementation of the Smith-Waterman local alignment algorithm optimized for both the Cell/BE and x86 architectures. The paper describes swps3 and compares its performances with several other implementations. Our benchmarking results show that swps3 is currently the fastest implementation of a vectorized Smith-Waterman on the Cell/BE, outperforming the only other known implementation by a factor of at least 4: on a Playstation 3, it achieves up to 8.0 billion cell-updates per second (GCUPS). Using the SSE2 instruction set, a quad-core Intel Pentium can reach 15.7 GCUPS. We also show that swps3 on this CPU is faster than a recent GPU implementation. Finally, we note that under some circumstances, alignments are computed at roughly the same speed as BLAST, a heuristic method. The Cell/BE can be a powerful platform to align biological sequences. Besides, the performance gap between exact and heuristic methods has almost disappeared, especially for long protein sequences.
SWPS3 – fast multi-threaded vectorized Smith-Waterman for IBM Cell/B.E. and ×86/SSE2
Szalkowski, Adam; Ledergerber, Christian; Krähenbühl, Philipp; Dessimoz, Christophe
2008-01-01
Background We present swps3, a vectorized implementation of the Smith-Waterman local alignment algorithm optimized for both the Cell/BE and ×86 architectures. The paper describes swps3 and compares its performances with several other implementations. Findings Our benchmarking results show that swps3 is currently the fastest implementation of a vectorized Smith-Waterman on the Cell/BE, outperforming the only other known implementation by a factor of at least 4: on a Playstation 3, it achieves up to 8.0 billion cell-updates per second (GCUPS). Using the SSE2 instruction set, a quad-core Intel Pentium can reach 15.7 GCUPS. We also show that swps3 on this CPU is faster than a recent GPU implementation. Finally, we note that under some circumstances, alignments are computed at roughly the same speed as BLAST, a heuristic method. Conclusion The Cell/BE can be a powerful platform to align biological sequences. Besides, the performance gap between exact and heuristic methods has almost disappeared, especially for long protein sequences. PMID:18959793
NASA Astrophysics Data System (ADS)
Efthimiou, N.; Papadimitroulas, P.; Kostou, T.; Loudos, G.
2015-09-01
Commercial clinical and preclinical PET scanners rely on the full cylindrical geometry for whole body scans as well as for dedicated organs. In this study we propose the construction of a low cost dual-head C-shaped PET system dedicated for small animal brain imaging. Monte Carlo simulation studies were performed using GATE toolkit to evaluate the optimum design in terms of sensitivity, distortions in the FOV and spatial resolution. The PET model is based on SiPMs and BGO pixelated arrays. Four different configurations with C- angle 0°, 15°, 30° and 45° within the modules, were considered. Geometrical phantoms were used for the evaluation process. STIR software, extended by an efficient multi-threaded ray tracing technique, was used for the image reconstruction. The algorithm automatically adjusts the size of the FOV according to the shape of the detector's geometry. The results showed improvement in sensitivity of ∼15% in case of 45° C-angle compared to the 0° case. The spatial resolution was found 2 mm for 45° C-angle.
F2Dock: Fast Fourier Protein-Protein Docking
Bajaj, Chandrajit; Chowdhury, Rezaul; Siddavanahalli, Vinay
2009-01-01
The functions of proteins is often realized through their mutual interactions. Determining a relative transformation for a pair of proteins and their conformations which form a stable complex, reproducible in nature, is known as docking. It is an important step in drug design, structure determination and understanding function and structure relationships. In this paper we extend our non-uniform fast Fourier transform docking algorithm to include an adaptive search phase (both translational and rotational) and thereby speed up its execution. We have also implemented a multithreaded version of the adaptive docking algorithm for even faster execution on multicore machines. We call this protein-protein docking code F2Dock (F2 = Fast Fourier). We have calibrated F2Dock based on an extensive experimental study on a list of benchmark complexes and conclude that F2Dock works very well in practice. Though all docking results reported in this paper use shape complementarity and Coulombic potential based scores only, F2Dock is structured to incorporate Lennard-Jones potential and re-ranking docking solutions based on desolvation energy. PMID:21071796
Real-time chirp-coded imaging with a programmable ultrasound biomicroscope.
Bosisio, Mattéo R; Hasquenoph, Jean-Michel; Sandrin, Laurent; Laugier, Pascal; Bridal, S Lori; Yon, Sylvain
2010-03-01
Ultrasound biomicroscopy (UBM) of mice can provide a testing ground for new imaging strategies. The UBM system presented in this paper facilitates the development of imaging and measurement methods with programmable design, arbitrary waveform coding, broad bandwidth (2-80 MHz), digital filtering, programmable processing, RF data acquisition, multithread/multicore real-time display, and rapid mechanical scanning (
The AGINAO Self-Programming Engine
NASA Astrophysics Data System (ADS)
Skaba, Wojciech
2013-01-01
The AGINAO is a project to create a human-level artificial general intelligence system (HL AGI) embodied in the Aldebaran Robotics' NAO humanoid robot. The dynamical and open-ended cognitive engine of the robot is represented by an embedded and multi-threaded control program, that is self-crafted rather than hand-crafted, and is executed on a simulated Universal Turing Machine (UTM). The actual structure of the cognitive engine emerges as a result of placing the robot in a natural preschool-like environment and running a core start-up system that executes self-programming of the cognitive layer on top of the core layer. The data from the robot's sensory devices supplies the training samples for the machine learning methods, while the commands sent to actuators enable testing hypotheses and getting a feedback. The individual self-created subroutines are supposed to reflect the patterns and concepts of the real world, while the overall program structure reflects the spatial and temporal hierarchy of the world dependencies. This paper focuses on the details of the self-programming approach, limiting the discussion of the applied cognitive architecture to a necessary minimum.
A mobile mapping system for spatial information based on DGPS/EGIS
NASA Astrophysics Data System (ADS)
Pei, Ling; Wang, Qing; Gu, Juan
2007-11-01
With the rapid developments of mobile device and wireless communication, it brings a new challenge for acquiring the spatial information. A mobile mapping system based on differential global position system (DGPS) integrated with embedded geographic information system (EGIS) is designed. A mobile terminal adapts to various GPS differential environments such as single base mode and network GPS mode like Virtual Reference Station (VRS) and Master- Auxiliary Concept (MAC) by the mobile communication technology. The spatial information collected through DGPS is organized in an EGIS running in the embedded device. A set of mobile terminal in real-time DGPS based on GPRS adopting multithreading technique of serial port in manner of simulating overlapped I/O operating is developed, further more, the GPS message analysis and checkout based on Strategy Pattern for various receivers are included in the process of development. A mobile terminal accesses to the GPS network successfully by NTRIP (Networked Transport of RTCM via Internet Protocol) compliance. Finally, the accuracy and reliability of the mobile mapping system are proved by a lot of testing in 9 provinces all over the country.
Supercomputers ready for use as discovery machines for neuroscience.
Helias, Moritz; Kunkel, Susanne; Masumoto, Gen; Igarashi, Jun; Eppler, Jochen Martin; Ishii, Shin; Fukai, Tomoki; Morrison, Abigail; Diesmann, Markus
2012-01-01
NEST is a widely used tool to simulate biological spiking neural networks. Here we explain the improvements, guided by a mathematical model of memory consumption, that enable us to exploit for the first time the computational power of the K supercomputer for neuroscience. Multi-threaded components for wiring and simulation combine 8 cores per MPI process to achieve excellent scaling. K is capable of simulating networks corresponding to a brain area with 10(8) neurons and 10(12) synapses in the worst case scenario of random connectivity; for larger networks of the brain its hierarchical organization can be exploited to constrain the number of communicating computer nodes. We discuss the limits of the software technology, comparing maximum filling scaling plots for K and the JUGENE BG/P system. The usability of these machines for network simulations has become comparable to running simulations on a single PC. Turn-around times in the range of minutes even for the largest systems enable a quasi interactive working style and render simulations on this scale a practical tool for computational neuroscience.
Methods For Self-Organizing Software
Bouchard, Ann M.; Osbourn, Gordon C.
2005-10-18
A method for dynamically self-assembling and executing software is provided, containing machines that self-assemble execution sequences and data structures. In addition to ordered functions calls (found commonly in other software methods), mutual selective bonding between bonding sites of machines actuates one or more of the bonding machines. Two or more machines can be virtually isolated by a construct, called an encapsulant, containing a population of machines and potentially other encapsulants that can only bond with each other. A hierarchical software structure can be created using nested encapsulants. Multi-threading is implemented by populations of machines in different encapsulants that are interacting concurrently. Machines and encapsulants can move in and out of other encapsulants, thereby changing the functionality. Bonding between machines' sites can be deterministic or stochastic with bonding triggering a sequence of actions that can be implemented by each machine. A self-assembled execution sequence occurs as a sequence of stochastic binding between machines followed by their deterministic actuation. It is the sequence of bonding of machines that determines the execution sequence, so that the sequence of instructions need not be contiguous in memory.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amadio, G.; et al.
An intensive R&D and programming effort is required to accomplish new challenges posed by future experimental high-energy particle physics (HEP) programs. The GeantV project aims to narrow the gap between the performance of the existing HEP detector simulation software and the ideal performance achievable, exploiting latest advances in computing technology. The project has developed a particle detector simulation prototype capable of transporting in parallel particles in complex geometries exploiting instruction level microparallelism (SIMD and SIMT), task-level parallelism (multithreading) and high-level parallelism (MPI), leveraging both the multi-core and the many-core opportunities. We present preliminary verification results concerning the electromagnetic (EM) physicsmore » models developed for parallel computing architectures within the GeantV project. In order to exploit the potential of vectorization and accelerators and to make the physics model effectively parallelizable, advanced sampling techniques have been implemented and tested. In this paper we introduce a set of automated statistical tests in order to verify the vectorized models by checking their consistency with the corresponding Geant4 models and to validate them against experimental data.« less
Adaptive and mobile ground sensor array.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Holzrichter, Michael Warren; O'Rourke, William T.; Zenner, Jennifer
The goal of this LDRD was to demonstrate the use of robotic vehicles for deploying and autonomously reconfiguring seismic and acoustic sensor arrays with high (centimeter) accuracy to obtain enhancement of our capability to locate and characterize remote targets. The capability to accurately place sensors and then retrieve and reconfigure them allows sensors to be placed in phased arrays in an initial monitoring configuration and then to be reconfigured in an array tuned to the specific frequencies and directions of the selected target. This report reviews the findings and accomplishments achieved during this three-year project. This project successfully demonstrated autonomousmore » deployment and retrieval of a payload package with an accuracy of a few centimeters using differential global positioning system (GPS) signals. It developed an autonomous, multisensor, temporally aligned, radio-frequency communication and signal processing capability, and an array optimization algorithm, which was implemented on a digital signal processor (DSP). Additionally, the project converted the existing single-threaded, monolithic robotic vehicle control code into a multi-threaded, modular control architecture that enhances the reuse of control code in future projects.« less
Evaluation of a Multicore-Optimized Implementation for Tomographic Reconstruction
Agulleiro, Jose-Ignacio; Fernández, José Jesús
2012-01-01
Tomography allows elucidation of the three-dimensional structure of an object from a set of projection images. In life sciences, electron microscope tomography is providing invaluable information about the cell structure at a resolution of a few nanometres. Here, large images are required to combine wide fields of view with high resolution requirements. The computational complexity of the algorithms along with the large image size then turns tomographic reconstruction into a computationally demanding problem. Traditionally, high-performance computing techniques have been applied to cope with such demands on supercomputers, distributed systems and computer clusters. In the last few years, the trend has turned towards graphics processing units (GPUs). Here we present a detailed description and a thorough evaluation of an alternative approach that relies on exploitation of the power available in modern multicore computers. The combination of single-core code optimization, vector processing, multithreading and efficient disk I/O operations succeeds in providing fast tomographic reconstructions on standard computers. The approach turns out to be competitive with the fastest GPU-based solutions thus far. PMID:23139768
BCM: toolkit for Bayesian analysis of Computational Models using samplers.
Thijssen, Bram; Dijkstra, Tjeerd M H; Heskes, Tom; Wessels, Lodewyk F A
2016-10-21
Computational models in biology are characterized by a large degree of uncertainty. This uncertainty can be analyzed with Bayesian statistics, however, the sampling algorithms that are frequently used for calculating Bayesian statistical estimates are computationally demanding, and each algorithm has unique advantages and disadvantages. It is typically unclear, before starting an analysis, which algorithm will perform well on a given computational model. We present BCM, a toolkit for the Bayesian analysis of Computational Models using samplers. It provides efficient, multithreaded implementations of eleven algorithms for sampling from posterior probability distributions and for calculating marginal likelihoods. BCM includes tools to simplify the process of model specification and scripts for visualizing the results. The flexible architecture allows it to be used on diverse types of biological computational models. In an example inference task using a model of the cell cycle based on ordinary differential equations, BCM is significantly more efficient than existing software packages, allowing more challenging inference problems to be solved. BCM represents an efficient one-stop-shop for computational modelers wishing to use sampler-based Bayesian statistics.
Spectral Analysis Tool 6.2 for Windows
NASA Technical Reports Server (NTRS)
Morgan, Feiming; Sue, Miles; Peng, Ted; Tan, Harry; Liang, Robert; Kinman, Peter
2006-01-01
Spectral Analysis Tool 6.2 is the latest version of a computer program that assists in analysis of interference between radio signals of the types most commonly used in Earth/spacecraft radio communications. [An earlier version was reported in Software for Analyzing Earth/Spacecraft Radio Interference (NPO-20422), NASA Tech Briefs, Vol. 25, No. 4 (April 2001), page 52.] SAT 6.2 calculates signal spectra, bandwidths, and interference effects for several families of modulation schemes. Several types of filters can be modeled, and the program calculates and displays signal spectra after filtering by any of the modeled filters. The program accommodates two simultaneous signals: a desired signal and an interferer. The interference-to-signal power ratio can be calculated for the filtered desired and interfering signals. Bandwidth-occupancy and link-budget calculators are included for the user s convenience. SAT 6.2 has a new software structure and provides a new user interface that is both intuitive and convenient. SAT 6.2 incorporates multi-tasking, multi-threaded execution, virtual memory management, and a dynamic link library. SAT 6.2 is designed for use on 32- bit computers employing Microsoft Windows operating systems.
Finding Feasible Abstract Counter-Examples
NASA Technical Reports Server (NTRS)
Pasareanu, Corina S.; Dwyer, Matthew B.; Visser, Willem; Clancy, Daniel (Technical Monitor)
2002-01-01
A strength of model checking is its ability to automate the detection of subtle system errors and produce traces that exhibit those errors. Given the high computational cost of model checking most researchers advocate the use of aggressive property-preserving abstractions. Unfortunately, the more aggressively a system is abstracted the more infeasible behavior it will have. Thus, while abstraction enables efficient model checking it also threatens the usefulness of model checking as a defect detection tool, since it may be difficult to determine whether a counter-example is feasible and hence worth developer time to analyze. We have explored several strategies for addressing this problem by extending an explicit-state model checker, Java PathFinder (JPF), to search for and analyze counter-examples in the presence of abstractions. We demonstrate that these techniques effectively preserve the defect detection ability of model checking in the presence of aggressive abstraction by applying them to check properties of several abstracted multi-threaded Java programs. These new capabilities are not specific to JPF and can be easily adapted to other model checking frameworks; we describe how this was done for the Bandera toolset.
General Aviation Data Framework
NASA Technical Reports Server (NTRS)
Blount, Elaine M.; Chung, Victoria I.
2006-01-01
The Flight Research Services Directorate at the NASA Langley Research Center (LaRC) provides development and operations services associated with three general aviation (GA) aircraft used for research experiments. The GA aircraft includes a Cessna 206X Stationair, a Lancair Colombia 300X, and a Cirrus SR22X. Since 2004, the GA Data Framework software was designed and implemented to gather data from a varying set of hardware and software sources as well as enable transfer of the data to other computers or devices. The key requirements for the GA Data Framework software include platform independence, the ability to reuse the framework for different projects without changing the framework code, graphics display capabilities, and the ability to vary the interfaces and their performance. Data received from the various devices is stored in shared memory. This paper concentrates on the object oriented software design patterns within the General Aviation Data Framework, and how they enable the construction of project specific software without changing the base classes. The issues of platform independence and multi-threading which enable interfaces to run at different frame rates are also discussed in this paper.
Avoiding and tolerating latency in large-scale next-generation shared-memory multiprocessors
NASA Technical Reports Server (NTRS)
Probst, David K.
1993-01-01
A scalable solution to the memory-latency problem is necessary to prevent the large latencies of synchronization and memory operations inherent in large-scale shared-memory multiprocessors from reducing high performance. We distinguish latency avoidance and latency tolerance. Latency is avoided when data is brought to nearby locales for future reference. Latency is tolerated when references are overlapped with other computation. Latency-avoiding locales include: processor registers, data caches used temporally, and nearby memory modules. Tolerating communication latency requires parallelism, allowing the overlap of communication and computation. Latency-tolerating techniques include: vector pipelining, data caches used spatially, prefetching in various forms, and multithreading in various forms. Relaxing the consistency model permits increased use of avoidance and tolerance techniques. Each model is a mapping from the program text to sets of partial orders on program operations; it is a convention about which temporal precedences among program operations are necessary. Information about temporal locality and parallelism constrains the use of avoidance and tolerance techniques. Suitable architectural primitives and compiler technology are required to exploit the increased freedom to reorder and overlap operations in relaxed models.
Supercomputers Ready for Use as Discovery Machines for Neuroscience
Helias, Moritz; Kunkel, Susanne; Masumoto, Gen; Igarashi, Jun; Eppler, Jochen Martin; Ishii, Shin; Fukai, Tomoki; Morrison, Abigail; Diesmann, Markus
2012-01-01
NEST is a widely used tool to simulate biological spiking neural networks. Here we explain the improvements, guided by a mathematical model of memory consumption, that enable us to exploit for the first time the computational power of the K supercomputer for neuroscience. Multi-threaded components for wiring and simulation combine 8 cores per MPI process to achieve excellent scaling. K is capable of simulating networks corresponding to a brain area with 108 neurons and 1012 synapses in the worst case scenario of random connectivity; for larger networks of the brain its hierarchical organization can be exploited to constrain the number of communicating computer nodes. We discuss the limits of the software technology, comparing maximum filling scaling plots for K and the JUGENE BG/P system. The usability of these machines for network simulations has become comparable to running simulations on a single PC. Turn-around times in the range of minutes even for the largest systems enable a quasi interactive working style and render simulations on this scale a practical tool for computational neuroscience. PMID:23129998
Methods and devices for determining quality of services of storage systems
Seelam, Seetharami R [Yorktown Heights, NY; Teller, Patricia J [Las Cruces, NM
2012-01-17
Methods and systems for allowing access to computer storage systems. Multiple requests from multiple applications can be received and processed efficiently to allow traffic from multiple customers to access the storage system concurrently.
Gianazza, Erica; Tremoli, Elena; Banfi, Cristina
2014-12-01
Selected reaction monitoring, also known as multiple reaction monitoring, is a powerful targeted mass spectrometry approach for a confident quantitation of proteins/peptides in complex biological samples. In recent years, its optimization and application have become pivotal and of great interest in clinical research to derive useful outcomes for patient care. Thus, selected reaction monitoring/multiple reaction monitoring is now used as a highly sensitive and selective method for the evaluation of protein abundances and biomarker verification with potential applications in medical screening. This review describes technical aspects for the development of a robust multiplex assay and discussing its recent applications in cardiovascular proteomics: verification of promising disease candidates to select only the highest quality peptides/proteins for a preclinical validation, as well as quantitation of protein isoforms and post-translational modifications.
Dwivedi, Bhakti; Kowalski, Jeanne
2018-01-01
While many methods exist for integrating multi-omics data or defining gene sets, there is no one single tool that defines gene sets based on merging of multiple omics data sets. We present shinyGISPA, an open-source application with a user-friendly web-based interface to define genes according to their similarity in several molecular changes that are driving a disease phenotype. This tool was developed to help facilitate the usability of a previously published method, Gene Integrated Set Profile Analysis (GISPA), among researchers with limited computer-programming skills. The GISPA method allows the identification of multiple gene sets that may play a role in the characterization, clinical application, or functional relevance of a disease phenotype. The tool provides an automated workflow that is highly scalable and adaptable to applications that go beyond genomic data merging analysis. It is available at http://shinygispa.winship.emory.edu/shinyGISPA/.
Dwivedi, Bhakti
2018-01-01
While many methods exist for integrating multi-omics data or defining gene sets, there is no one single tool that defines gene sets based on merging of multiple omics data sets. We present shinyGISPA, an open-source application with a user-friendly web-based interface to define genes according to their similarity in several molecular changes that are driving a disease phenotype. This tool was developed to help facilitate the usability of a previously published method, Gene Integrated Set Profile Analysis (GISPA), among researchers with limited computer-programming skills. The GISPA method allows the identification of multiple gene sets that may play a role in the characterization, clinical application, or functional relevance of a disease phenotype. The tool provides an automated workflow that is highly scalable and adaptable to applications that go beyond genomic data merging analysis. It is available at http://shinygispa.winship.emory.edu/shinyGISPA/. PMID:29415010
ERIC Educational Resources Information Center
van der Kolk, Koos; Hartog, Rob; Beldman, Gerrit; Gruppen, Harry
2013-01-01
Increasingly, mobile applications appear on the market that can support students in chemistry laboratory classes. In a multiple app-supported laboratory, each of these applications covers one use-case. In practice, this leads to situations in which information is scattered over different screens and written materials. Such a multiple app-supported…
Method and system of integrating information from multiple sources
Alford, Francine A [Livermore, CA; Brinkerhoff, David L [Antioch, CA
2006-08-15
A system and method of integrating information from multiple sources in a document centric application system. A plurality of application systems are connected through an object request broker to a central repository. The information may then be posted on a webpage. An example of an implementation of the method and system is an online procurement system.
Teachers' Approaches to Proportional Relationship Problems in Multiple Measure Spaces
ERIC Educational Resources Information Center
Kazunga, Cathrine; Bansilal, Sarah
2016-01-01
Ratio and proportion have many daily life applications and hence form an important part of the Mathematical Literacy (ML) curriculum in South African schools. The purpose of this study was to explore ML teachers' application of ratio in an assessment task with multiple measure spaces set within the real-life context of the need to establish the…
Double-Pulsed 2-micron Laser Transmitter for Multiple Lidar Applications
NASA Technical Reports Server (NTRS)
Singh, Upendra N.; Yu, Jirong
2002-01-01
A high energy double-pulsed Ho:Tm:YLF 2-micron laser amplifier has been demonstrated. 600 mJ per pulse pair under Q-switch operation is achieved with the gain of 4.4. This solid-state laser source can be used as lidar transmitter for multiple lidar applications such as coherent wind and carbon dioxide measurements.
Improvement in HPC performance through HIPPI RAID storage
NASA Technical Reports Server (NTRS)
Homan, Blake
1993-01-01
In 1986, RAID (redundant array of inexpensive (or independent) disks) technology was introduced as a viable solution to the I/O bottleneck. A number of different RAID levels were defined in 1987 by the Computer Science Division (EECS) University of California, Berkeley, each with specific advantages and disadvantages. With multiple RAID options available, taking advantage of RAID technology required matching particular RAID levels with specific applications. It was not possible to use one RAID device to address all applications. Maximum Strategy's Gen 4 Storage Server addresses this issue with a new capability called programmable RAID level partitioning. This capability enables users to have multiple RAID levels coexist on the same disks, thereby providing the versatility necessary for multiple concurrent applications.
Identifying College Students' Multiple Intelligences to Enhance Motivation and Language Proficiency
ERIC Educational Resources Information Center
Madkour, Magda; Mohamed, Rafik Ahmed Abdel Moati
2016-01-01
While most research studies on the theory of multiple intelligences focused on the application of the multiple intelligences domains as separate components, this quasi-experimental research targeted the effect of multiple intelligences as integrated abilities for teaching and learning English at higher education. The purpose of this study was to…
Scheins, J J; Vahedipour, K; Pietrzyk, U; Shah, N J
2015-12-21
For high-resolution, iterative 3D PET image reconstruction the efficient implementation of forward-backward projectors is essential to minimise the calculation time. Mathematically, the projectors are summarised as a system response matrix (SRM) whose elements define the contribution of image voxels to lines-of-response (LORs). In fact, the SRM easily comprises billions of non-zero matrix elements to evaluate the tremendous number of LORs as provided by state-of-the-art PET scanners. Hence, the performance of iterative algorithms, e.g. maximum-likelihood-expectation-maximisation (MLEM), suffers from severe computational problems due to the intensive memory access and huge number of floating point operations. Here, symmetries occupy a key role in terms of efficient implementation. They reduce the amount of independent SRM elements, thus allowing for a significant matrix compression according to the number of exploitable symmetries. With our previous work, the PET REconstruction Software TOolkit (PRESTO), very high compression factors (>300) are demonstrated by using specific non-Cartesian voxel patterns involving discrete polar symmetries. In this way, a pre-calculated memory-resident SRM using complex volume-of-intersection calculations can be achieved. However, our original ray-driven implementation suffers from addressing voxels, projection data and SRM elements in disfavoured memory access patterns. As a consequence, a rather limited numerical throughput is observed due to the massive waste of memory bandwidth and inefficient usage of cache respectively. In this work, an advantageous symmetry-driven evaluation of the forward-backward projectors is proposed to overcome these inefficiencies. The polar symmetries applied in PRESTO suggest a novel organisation of image data and LOR projection data in memory to enable an efficient single instruction multiple data vectorisation, i.e. simultaneous use of any SRM element for symmetric LORs. In addition, the calculation time is further reduced by using simultaneous multi-threading (SMT). A global speedup factor of 11 without SMT and above 100 with SMT has been achieved for the improved CPU-based implementation while obtaining equivalent numerical results.
Employing machine learning for reliable miRNA target identification in plants
2011-01-01
Background miRNAs are ~21 nucleotide long small noncoding RNA molecules, formed endogenously in most of the eukaryotes, which mainly control their target genes post transcriptionally by interacting and silencing them. While a lot of tools has been developed for animal miRNA target system, plant miRNA target identification system has witnessed limited development. Most of them have been centered around exact complementarity match. Very few of them considered other factors like multiple target sites and role of flanking regions. Result In the present work, a Support Vector Regression (SVR) approach has been implemented for plant miRNA target identification, utilizing position specific dinucleotide density variation information around the target sites, to yield highly reliable result. It has been named as p-TAREF (plant-Target Refiner). Performance comparison for p-TAREF was done with other prediction tools for plants with utmost rigor and where p-TAREF was found better performing in several aspects. Further, p-TAREF was run over the experimentally validated miRNA targets from species like Arabidopsis, Medicago, Rice and Tomato, and detected them accurately, suggesting gross usability of p-TAREF for plant species. Using p-TAREF, target identification was done for the complete Rice transcriptome, supported by expression and degradome based data. miR156 was found as an important component of the Rice regulatory system, where control of genes associated with growth and transcription looked predominant. The entire methodology has been implemented in a multi-threaded parallel architecture in Java, to enable fast processing for web-server version as well as standalone version. This also makes it to run even on a simple desktop computer in concurrent mode. It also provides a facility to gather experimental support for predictions made, through on the spot expression data analysis, in its web-server version. Conclusion A machine learning multivariate feature tool has been implemented in parallel and locally installable form, for plant miRNA target identification. The performance was assessed and compared through comprehensive testing and benchmarking, suggesting a reliable performance and gross usability for transcriptome wide plant miRNA target identification. PMID:22206472
Employing machine learning for reliable miRNA target identification in plants.
Jha, Ashwani; Shankar, Ravi
2011-12-29
miRNAs are ~21 nucleotide long small noncoding RNA molecules, formed endogenously in most of the eukaryotes, which mainly control their target genes post transcriptionally by interacting and silencing them. While a lot of tools has been developed for animal miRNA target system, plant miRNA target identification system has witnessed limited development. Most of them have been centered around exact complementarity match. Very few of them considered other factors like multiple target sites and role of flanking regions. In the present work, a Support Vector Regression (SVR) approach has been implemented for plant miRNA target identification, utilizing position specific dinucleotide density variation information around the target sites, to yield highly reliable result. It has been named as p-TAREF (plant-Target Refiner). Performance comparison for p-TAREF was done with other prediction tools for plants with utmost rigor and where p-TAREF was found better performing in several aspects. Further, p-TAREF was run over the experimentally validated miRNA targets from species like Arabidopsis, Medicago, Rice and Tomato, and detected them accurately, suggesting gross usability of p-TAREF for plant species. Using p-TAREF, target identification was done for the complete Rice transcriptome, supported by expression and degradome based data. miR156 was found as an important component of the Rice regulatory system, where control of genes associated with growth and transcription looked predominant. The entire methodology has been implemented in a multi-threaded parallel architecture in Java, to enable fast processing for web-server version as well as standalone version. This also makes it to run even on a simple desktop computer in concurrent mode. It also provides a facility to gather experimental support for predictions made, through on the spot expression data analysis, in its web-server version. A machine learning multivariate feature tool has been implemented in parallel and locally installable form, for plant miRNA target identification. The performance was assessed and compared through comprehensive testing and benchmarking, suggesting a reliable performance and gross usability for transcriptome wide plant miRNA target identification.