Sample records for efficient multi-core algorithm

  1. T-L Plane Abstraction-Based Energy-Efficient Real-Time Scheduling for Multi-Core Wireless Sensors.

    PubMed

    Kim, Youngmin; Lee, Ki-Seong; Pham, Ngoc-Son; Lee, Sun-Ro; Lee, Chan-Gun

    2016-07-08

    Energy efficiency is considered a critical requirement for wireless sensor networks. As more wireless sensor nodes are equipped with multi-core processors, the need for energy-efficient real-time scheduling algorithms is growing. The T-L plane-based scheme is known to be an optimal global scheduling technique for periodic real-time tasks on multi-core processors. Unfortunately, studies extending T-L plane-based scheduling algorithms with energy-saving techniques have been scarce. In this paper, we propose a new T-L plane-based algorithm enabling energy-efficient real-time scheduling on multi-core sensor nodes with dynamic power management (DPM). Our approach addresses the overhead of processor mode transitions and reduces the fragmentation of idle time, both of which are inherent in T-L plane-based algorithms. Our experimental results show the effectiveness of the proposed algorithm compared to other energy-aware scheduling methods based on the T-L plane abstraction.

  2. T-L Plane Abstraction-Based Energy-Efficient Real-Time Scheduling for Multi-Core Wireless Sensors

    PubMed Central

    Kim, Youngmin; Lee, Ki-Seong; Pham, Ngoc-Son; Lee, Sun-Ro; Lee, Chan-Gun

    2016-01-01

    Energy efficiency is considered a critical requirement for wireless sensor networks. As more wireless sensor nodes are equipped with multi-core processors, the need for energy-efficient real-time scheduling algorithms is growing. The T-L plane-based scheme is known to be an optimal global scheduling technique for periodic real-time tasks on multi-core processors. Unfortunately, studies extending T-L plane-based scheduling algorithms with energy-saving techniques have been scarce. In this paper, we propose a new T-L plane-based algorithm enabling energy-efficient real-time scheduling on multi-core sensor nodes with dynamic power management (DPM). Our approach addresses the overhead of processor mode transitions and reduces the fragmentation of idle time, both of which are inherent in T-L plane-based algorithms. Our experimental results show the effectiveness of the proposed algorithm compared to other energy-aware scheduling methods based on the T-L plane abstraction. PMID:27399722

  3. Improvement of Speckle Contrast Image Processing by an Efficient Algorithm.

    PubMed

    Steimers, A; Farnung, W; Kohl-Bareis, M

    2016-01-01

    We demonstrate an efficient algorithm for the temporal- and spatial-based calculation of speckle contrast for imaging blood flow by laser speckle contrast analysis (LASCA). It reduces the numerical complexity of the necessary calculations, facilitates multi-core and many-core implementations of the speckle analysis, and makes temporal or spatial resolution independent of SNR. The new algorithm was evaluated for both spatial- and temporal-based analysis of speckle patterns with different image sizes and numbers of recruited pixels, as sequential, multi-core and many-core code.
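
    To make the parallelization opportunity concrete: spatial speckle contrast is simply K = sigma/mu over a sliding window, and the per-pixel windows are independent, so the loop parallelizes directly across cores. The sketch below is a naive reference implementation under assumed window size and image layout, not the authors' reduced-complexity algorithm.

      // Spatial speckle contrast K = sigma/mu over a win x win window,
      // parallelized across rows with OpenMP (compile with -fopenmp).
      #include <algorithm>
      #include <cmath>
      #include <vector>

      std::vector<float> speckle_contrast(const std::vector<float>& img,
                                          int width, int height, int win = 7) {
          std::vector<float> K(img.size(), 0.0f);
          const int r = win / 2;
      #pragma omp parallel for schedule(static)
          for (int y = r; y < height - r; ++y) {
              for (int x = r; x < width - r; ++x) {
                  double sum = 0.0, sq = 0.0;
                  for (int dy = -r; dy <= r; ++dy)
                      for (int dx = -r; dx <= r; ++dx) {
                          const double v = img[(y + dy) * width + (x + dx)];
                          sum += v;
                          sq += v * v;
                      }
                  const double n = double(win) * win;
                  const double mean = sum / n;
                  const double var = std::max(sq / n - mean * mean, 0.0);
                  if (mean > 0.0)
                      K[y * width + x] = float(std::sqrt(var) / mean);
              }
          }
          return K;
      }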

  4. A highly efficient multi-core algorithm for clustering extremely large datasets

    PubMed Central

    2010-01-01

    Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities of current multi-core hardware to distribute tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms, based on the design principles of transactional memory, for clustering gene expression microarray type data and categorical SNP data. Our new shared-memory parallel algorithms prove to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. The computation speed of our Java-based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy, compared to single-core implementations and a recently published network-based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that, using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
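
    The shared-memory pattern at the heart of such a parallel k-means is easy to sketch: each thread classifies a slice of the points and accumulates cluster sums in private buffers that are merged once per iteration. The sketch below uses plain OpenMP privatization rather than the paper's transactional-memory-inspired design, so treat it as an assumption-laden outline of one iteration, not the published algorithm.

      // One shared-memory k-means iteration with per-thread accumulators:
      // threads classify points in parallel and merge partial sums once.
      // Compile with -fopenmp.
      #include <omp.h>
      #include <limits>
      #include <vector>

      void kmeans_step(const std::vector<std::vector<double>>& pts,
                       std::vector<std::vector<double>>& centers) {
          const int k = (int)centers.size(), d = (int)centers[0].size();
          const int nt = omp_get_max_threads();
          std::vector<std::vector<double>> sums(nt, std::vector<double>(k * d, 0.0));
          std::vector<std::vector<long>> counts(nt, std::vector<long>(k, 0));
      #pragma omp parallel
          {
              const int t = omp_get_thread_num();
      #pragma omp for schedule(static)
              for (long long i = 0; i < (long long)pts.size(); ++i) {
                  int best = 0;
                  double bestDist = std::numeric_limits<double>::max();
                  for (int c = 0; c < k; ++c) {              // nearest center
                      double dist = 0.0;
                      for (int j = 0; j < d; ++j) {
                          const double diff = pts[i][j] - centers[c][j];
                          dist += diff * diff;
                      }
                      if (dist < bestDist) { bestDist = dist; best = c; }
                  }
                  for (int j = 0; j < d; ++j) sums[t][best * d + j] += pts[i][j];
                  ++counts[t][best];
              }
          }
          for (int c = 0; c < k; ++c) {                      // merge partials
              long n = 0;
              std::vector<double> s(d, 0.0);
              for (int t = 0; t < nt; ++t) {
                  n += counts[t][c];
                  for (int j = 0; j < d; ++j) s[j] += sums[t][c * d + j];
              }
              if (n > 0)
                  for (int j = 0; j < d; ++j) centers[c][j] = s[j] / n;
          }
      }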

  5. VLBI-resolution radio-map algorithms: Performance analysis of different levels of data-sharing on multi-socket, multi-core architectures

    NASA Astrophysics Data System (ADS)

    Tabik, S.; Romero, L. F.; Mimica, P.; Plata, O.; Zapata, E. L.

    2012-09-01

    A broad area in astronomy focuses on simulating extragalactic objects based on Very Long Baseline Interferometry (VLBI) radio-maps. Several algorithms in this scope simulate what the observed radio-maps would be if emitted from a predefined extragalactic object. This work analyzes the performance and scaling of this kind of algorithm on multi-socket, multi-core architectures. In particular, we evaluate a sharing approach, a privatizing approach and a hybrid approach on systems with a complex memory hierarchy that includes a shared Last Level Cache (LLC). In addition, we investigate which manual processes can be systematized and then automated in future work. The experiments show that the data-privatizing model scales efficiently on medium-scale multi-socket, multi-core systems (up to 48 cores), whereas the sharing approach, regardless of algorithmic and scheduling optimizations, is unable to reach acceptable scalability on more than one socket. The hybrid model with a specific level of data-sharing, however, provides the best scalability across all of the multi-socket, multi-core systems used.
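
    The sharing-versus-privatizing tradeoff the authors measure can be shown in miniature with a histogram kernel: the shared version funnels every thread through atomic updates on one array (coherence traffic across sockets), while the privatized version writes thread-local copies and merges them once per thread. This is a hypothetical OpenMP micro-example, far smaller than the radio-map kernels studied in the paper.

      // Sharing vs. privatizing, in miniature. Inputs assumed non-negative.
      #include <omp.h>
      #include <vector>

      std::vector<long> hist_shared(const std::vector<int>& data, int bins) {
          std::vector<long> h(bins, 0);
      #pragma omp parallel for
          for (long long i = 0; i < (long long)data.size(); ++i) {
      #pragma omp atomic
              ++h[data[i] % bins];              // contended shared writes
          }
          return h;
      }

      std::vector<long> hist_private(const std::vector<int>& data, int bins) {
          std::vector<long> h(bins, 0);
      #pragma omp parallel
          {
              std::vector<long> local(bins, 0); // per-thread private copy
      #pragma omp for nowait
              for (long long i = 0; i < (long long)data.size(); ++i)
                  ++local[data[i] % bins];
      #pragma omp critical                      // one merge per thread
              for (int b = 0; b < bins; ++b) h[b] += local[b];
          }
          return h;
      }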

  6. Network Coding on Heterogeneous Multi-Core Processors for Wireless Sensor Networks

    PubMed Central

    Kim, Deokho; Park, Karam; Ro, Won W.

    2011-01-01

    While network coding is well known for its efficiency and usefulness in wireless sensor networks, the excessive costs associated with decoding computation and complexity still hinder its adoption into practical use. On the other hand, high-performance microprocessors with heterogeneous multi-cores are expected to be used as processing nodes of wireless sensor networks in the near future. To this end, this paper introduces an efficient network coding algorithm developed for heterogeneous multi-core processors. The proposed idea is fully tested on one of the currently available heterogeneous multi-core processors, the Cell Broadband Engine. PMID:22164053

  7. Virtual optical network mapping and core allocation in elastic optical networks using multi-core fibers

    NASA Astrophysics Data System (ADS)

    Xuan, Hejun; Wang, Yuping; Xu, Zhanqi; Hao, Shanshan; Wang, Xiaoli

    2017-11-01

    Virtualization technology can greatly improve the efficiency of networks by allowing virtual optical networks to share the resources of the physical networks. However, it faces several challenges, such as finding efficient strategies for virtual node mapping, virtual link mapping and spectrum assignment. The problem is even more complex and challenging when the physical elastic optical networks use multi-core fibers. To tackle these challenges, we establish a constrained optimization model to determine the optimal schemes of optical network mapping, core allocation and spectrum assignment. To solve the model efficiently, a tailor-made encoding scheme and crossover and mutation operators are designed. Based on these, an efficient genetic algorithm is proposed to obtain the optimal schemes of virtual node mapping, virtual link mapping and core allocation. Simulation experiments are conducted on three widely used networks, and the experimental results show the effectiveness of the proposed model and algorithm.

  8. Parallel Lattice Basis Reduction Using a Multi-threaded Schnorr-Euchner LLL Algorithm

    NASA Astrophysics Data System (ADS)

    Backes, Werner; Wetzel, Susanne

    In this paper, we introduce a new parallel variant of the LLL lattice basis reduction algorithm. Our new, multi-threaded algorithm is the first to provide an efficient, parallel implementation of the Schnorr-Euchner algorithm for today's multi-processor, multi-core computer architectures. Experiments with sparse and dense lattice bases show a speed-up factor of about 1.8 for the 2-thread version and of about 3.2 for the 4-thread version of our new parallel lattice basis reduction algorithm in comparison to the traditional non-parallel algorithm.

  9. MIMO signal processing with RLSCMA algorithm for multi-mode multi-core optical transmission system

    NASA Astrophysics Data System (ADS)

    Bi, Yuan; Liu, Bo; Zhang, Li-jia; Xin, Xiang-jun; Zhang, Qi; Wang, Yong-jun; Tian, Qing-hua; Tian, Feng; Mao, Ya-ya

    2018-01-01

    When signals are transmitted over multi-mode multi-core fiber, mode coupling occurs between modes, and mode dispersion also arises because each mode has a different propagation speed in the link. Mode coupling and mode dispersion degrade the useful signal in the transmission link, so the receiver needs to apply digital signal processing to the received signal to compensate for the impairments introduced in the link. We first analyze the influence of mode coupling and mode dispersion in the transmission of signals over multi-mode multi-core fiber, and then present the relationship between the coupling coefficient and the dispersion coefficient. We then carry out adaptive signal processing with MIMO equalizers based on the recursive least squares constant modulus algorithm (RLSCMA). The MIMO equalization algorithm adapts its equalization taps according to the degree of crosstalk among cores or modes, which eliminates the interference among different modes and cores in a space division multiplexing (SDM) transmission system. The simulation results show that the distorted signals are restored efficiently with fast convergence.

  10. CQPSO scheduling algorithm for heterogeneous multi-core DAG task model

    NASA Astrophysics Data System (ADS)

    Zhai, Wenzheng; Hu, Yue-Li; Ran, Feng

    2017-07-01

    Efficient task scheduling is critical for achieving high performance in a heterogeneous multi-core computing environment. The paper focuses on the heterogeneous multi-core directed acyclic graph (DAG) task model and proposes a novel task scheduling method based on an improved chaotic quantum-behaved particle swarm optimization (CQPSO) algorithm. A task priority scheduling list is built, and the processor with the minimum cumulative earliest finish time (EFT) is chosen for the first task assignment. The task precedence relationships are satisfied and the total execution time of all tasks is minimized. The experimental results show that the proposed algorithm is simple and feasible, offers strong optimization ability and fast convergence, and can be applied to task scheduling optimization in other heterogeneous and distributed environments.

  11. Dynamic Voltage-Frequency and Workload Joint Scaling Power Management for Energy Harvesting Multi-Core WSN Node SoC

    PubMed Central

    Li, Xiangyu; Xie, Nijie; Tian, Xinyue

    2017-01-01

    This paper proposes a scheduling and power management solution for an energy harvesting heterogeneous multi-core WSN node SoC such that the system continues to operate perennially and uses the harvested energy efficiently. The solution consists of a task scheduling algorithm oriented to heterogeneous multi-core systems and a low-complexity dynamic workload scaling and configuration optimization algorithm suitable for light-weight platforms. Moreover, considering that the power consumption of most WSN applications is data-dependent, we introduce a branch handling mechanism into the solution as well. The experimental results show that the proposed algorithm can operate in real-time on a lightweight embedded processor (MSP430), that it enables a system to perform more valuable work, and that it makes more than 99.9% use of the power budget. PMID:28208730

  12. Dynamic Voltage-Frequency and Workload Joint Scaling Power Management for Energy Harvesting Multi-Core WSN Node SoC.

    PubMed

    Li, Xiangyu; Xie, Nijie; Tian, Xinyue

    2017-02-08

    This paper proposes a scheduling and power management solution for an energy harvesting heterogeneous multi-core WSN node SoC such that the system continues to operate perennially and uses the harvested energy efficiently. The solution consists of a task scheduling algorithm oriented to heterogeneous multi-core systems and a low-complexity dynamic workload scaling and configuration optimization algorithm suitable for light-weight platforms. Moreover, considering that the power consumption of most WSN applications is data-dependent, we introduce a branch handling mechanism into the solution as well. The experimental results show that the proposed algorithm can operate in real-time on a lightweight embedded processor (MSP430), that it enables a system to perform more valuable work, and that it makes more than 99.9% use of the power budget.

  13. The design of multi-core DSP parallel model based on message passing and multi-level pipeline

    NASA Astrophysics Data System (ADS)

    Niu, Jingyu; Hu, Jian; He, Wenjing; Meng, Fanrong; Li, Chuanrong

    2017-10-01

    The design of embedded signal processing systems is currently often based on a specific application, an approach that is not conducive to the rapid development of signal processing technology. In this paper, a parallel processing model based on a multi-core DSP platform is designed, mainly suited to complex algorithms composed of different modules. This model combines the ideas of multi-level pipeline parallelism and message passing and incorporates the advantages of the mainstream multi-core DSP models (the Master-Slave model and the Data Flow model), giving it better performance. A three-dimensional image generation algorithm is used to validate the efficiency of the proposed model by comparison with the Master-Slave and the Data Flow models.

  14. Coding for parallel execution of hardware-in-the-loop millimeter-wave scene generation models on multicore SIMD processor architectures

    NASA Astrophysics Data System (ADS)

    Olson, Richard F.

    2013-05-01

    Rendering of point-scatterer-based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real-world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity, emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include the use of POSIX threads built on vector library functions and more portable, high-level parallel code based on compiler technology (e.g., OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.

  15. Efficient Geometric Sound Propagation Using Visibility Culling

    NASA Astrophysics Data System (ADS)

    Chandak, Anish

    2011-07-01

    Simulating the propagation of sound can improve the sense of realism in interactive applications such as video games and can lead to better designs in engineering applications such as architectural acoustics. In this thesis, we present geometric sound propagation techniques which are faster than prior methods and map well to upcoming parallel multi-core CPUs. We model specular reflections by using the image-source method and model finite-edge diffraction by using the well-known Biot-Tolstoy-Medwin (BTM) model. We accelerate the computation of specular reflections by applying novel visibility algorithms, FastV and AD-Frustum, which compute visibility from a point. We accelerate finite-edge diffraction modeling by applying a novel visibility algorithm which computes visibility from a region. Our visibility algorithms are based on frustum tracing and exploit recent advances in fast ray-hierarchy intersections, data-parallel computations, and scalable, multi-core algorithms. The AD-Frustum algorithm adapts its computation to the scene complexity and allows small errors in computing specular reflection paths for higher computational efficiency. FastV and our visibility algorithm from a region are general, object-space, conservative visibility algorithms that together significantly reduce the number of image sources compared to other techniques while preserving the same accuracy. Our geometric propagation algorithms are an order of magnitude faster than prior approaches for modeling specular reflections and two to ten times faster for modeling finite-edge diffraction. Our algorithms are interactive, scale almost linearly on multi-core CPUs, and can handle large, complex, and dynamic scenes. We also compare the accuracy of our sound propagation algorithms with other methods. Once sound propagation is performed, it is desirable to listen to the propagated sound in interactive and engineering applications. We can generate smooth, artifact-free output audio signals by applying efficient audio-processing algorithms. We also present the first efficient audio-processing algorithm for scenarios with a simultaneously moving source and moving receiver (MS-MR), which incurs less than 25% overhead compared to the static source and moving receiver (SS-MR) or moving source and static receiver (MS-SR) scenarios.

  16. A hybrid algorithm for parallel molecular dynamics simulations

    NASA Astrophysics Data System (ADS)

    Mangiardi, Chris M.; Meyer, R.

    2017-10-01

    This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization approach. The goal of the work is to enable efficient simulations of very large (tens of millions of atoms) and inhomogeneous systems on many-core processors with hundreds or thousands of cores and SIMD units with large vector sizes. In order to test the efficiency of the method, simulations of a variety of configurations with up to 74 million atoms have been performed. Results are shown that were obtained on multi-core systems with Sandy Bridge and Haswell processors as well as systems with Xeon Phi many-core processors.

  17. LOD-based clustering techniques for efficient large-scale terrain storage and visualization

    NASA Astrophysics Data System (ADS)

    Bao, Xiaohong; Pajarola, Renato

    2003-05-01

    Large multi-resolution terrain data sets are usually stored out-of-core. To visualize terrain data at interactive frame rates, the data needs to be organized on disk, loaded into main memory part by part, and then rendered efficiently. Many main-memory algorithms have been proposed for efficient vertex selection and mesh construction. Organizing terrain data on disk is quite difficult because the error, the triangulation dependency and the spatial location of each vertex all need to be considered. Previous terrain clustering algorithms did not consider the per-vertex approximation error of individual terrain data sets; therefore, the vertex sequences on disk are exactly the same for any terrain. In this paper, we propose a novel clustering algorithm which introduces level-of-detail (LOD) information into terrain data organization to map multi-resolution terrain data to external memory. In our approach, the LOD parameters of the terrain elevation points are reflected during clustering. The experiments show that dynamic loading and paging of terrain data at varying LOD is very efficient and minimizes page faults. Additionally, the preprocessing step of this algorithm is very fast and operates out-of-core.

  18. Scalable and Power Efficient Data Analytics for Hybrid Exascale Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Choudhary, Alok; Samatova, Nagiza; Wu, Kesheng

    This project developed a generic and optimized set of core data analytics functions. These functions organically consolidate a broad constellation of high-performance analytical pipelines. As the architectures of emerging HPC systems become inherently heterogeneous, there is a need to design algorithms for data analysis kernels accelerated on hybrid multi-node, multi-core HPC architectures comprising a mix of CPUs, GPUs, and SSDs. Furthermore, the power-aware trend drives advances in our performance-energy tradeoff analysis framework, which enables our data analysis kernel algorithms and software to be parameterized so that users can choose the right power-performance optimizations.

  19. Efficient implementation of the many-body Reactive Bond Order (REBO) potential on GPU

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Trędak, Przemysław; Rudnicki, Witold R.; Majewski, Jacek A.

    The second generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range of hydrocarbon materials. It is also extensible to other atom types and interactions. The REBO potential assumes a complex multi-body interaction model that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of a molecular dynamics algorithm using the REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve the difficult synchronization issues that arise in computations of a multi-body potential. Techniques developed for this problem may also be used to achieve efficient solutions to different problems. The performance of the proposed algorithm is assessed using a range of model systems and compared to the highly optimized CPU implementation (both single-core and OpenMP) available in the LAMMPS package. These experiments show up to a 6x improvement in force computation time using a single processor of the NVIDIA Tesla K80 compared to a high-end 16-core Intel Xeon processor.

  20. Stochastic Local Search for Core Membership Checking in Hedonic Games

    NASA Astrophysics Data System (ADS)

    Keinänen, Helena

    Hedonic games have emerged as an important tool in economics and show promise as a useful formalism for modeling multi-agent coalition formation in AI as well as group formation in social networks. We consider the coNP-complete problem of core membership checking in hedonic coalition formation games. No previous algorithms to tackle the problem have been presented. In this work, we overcome this by developing two stochastic local search algorithms for core membership checking in hedonic games. We demonstrate the usefulness of the algorithms by showing experimentally that they find solutions efficiently, particularly for large agent societies.

  1. Energy Efficient Image/Video Data Transmission on Commercial Multi-Core Processors

    PubMed Central

    Lee, Sungju; Kim, Heegon; Chung, Yongwha; Park, Daihee

    2012-01-01

    In transmitting image/video data over Video Sensor Networks (VSNs), energy consumption must be minimized while maintaining high image/video quality. Although image/video compression is well known for its efficiency and usefulness in VSNs, the excessive costs associated with encoding computation and complexity still hinder its adoption for practical use. However, it is anticipated that high-performance handheld multi-core devices will be used as VSN processing nodes in the near future. In this paper, we propose a way to improve the energy efficiency of image and video compression with multi-core processors while maintaining the image/video quality. We improve the compression efficiency at the algorithmic level, or derive the optimal parameters for the combination of a machine and compression based on the tradeoff between energy consumption and image/video quality. Based on experimental results, we confirm that the proposed approach can improve the energy efficiency of the straightforward approach by a factor of 2-5 without compromising image/video quality. PMID:23202181

  2. Massively parallel algorithm and implementation of RI-MP2 energy calculation for peta-scale many-core supercomputers.

    PubMed

    Katouda, Michio; Naruse, Akira; Hirano, Yukihiko; Nakajima, Takahito

    2016-11-15

    A new parallel algorithm and its implementation for the RI-MP2 energy calculation utilizing peta-flop-class many-core supercomputers are presented. Two improvements over the previous algorithm (J. Chem. Theory Comput. 2013, 9, 5373) have been made: (1) a dual-level hierarchical parallelization scheme that enables the use of more than 10,000 Message Passing Interface (MPI) processes, and (2) a new data communication scheme that reduces network communication overhead. A multi-node and multi-GPU implementation of the present algorithm is presented for calculations on a central processing unit (CPU)/graphics processing unit (GPU) hybrid supercomputer. Benchmark results of the new algorithm and its implementation using the K computer (CPU clustering system) and TSUBAME 2.5 (CPU/GPU hybrid system) demonstrate high efficiency. A peak performance of 3.1 PFLOPS is attained using 80,199 nodes of the K computer; the peak performance of the multi-node and multi-GPU implementation is 514 TFLOPS using 1349 nodes and 4047 GPUs of TSUBAME 2.5.

  3. Efficient implementation of the many-body Reactive Bond Order (REBO) potential on GPU

    NASA Astrophysics Data System (ADS)

    Trędak, Przemysław; Rudnicki, Witold R.; Majewski, Jacek A.

    2016-09-01

    The second generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range of hydrocarbon materials. It is also extensible to other atom types and interactions. The REBO potential assumes a complex multi-body interaction model that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of a molecular dynamics algorithm using the REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve the difficult synchronization issues that arise in computations of a multi-body potential. Techniques developed for this problem may also be used to achieve efficient solutions to different problems. The performance of the proposed algorithm is assessed using a range of model systems and compared to the highly optimized CPU implementation (both single-core and OpenMP) available in the LAMMPS package. These experiments show up to a 6x improvement in force computation time using a single processor of the NVIDIA Tesla K80 compared to a high-end 16-core Intel Xeon processor.

  4. Computational performance of a smoothed particle hydrodynamics simulation for shared-memory parallel computing

    NASA Astrophysics Data System (ADS)

    Nishiura, Daisuke; Furuichi, Mikito; Sakaguchi, Hide

    2015-09-01

    The computational performance of a smoothed particle hydrodynamics (SPH) simulation is investigated for three types of current shared-memory parallel computer devices: many integrated core (MIC) processors, graphics processing units (GPUs), and multi-core CPUs. We are especially interested in efficient shared-memory allocation methods for each chipset, because the efficient data access patterns differ between compute unified device architecture (CUDA) programming for GPUs and OpenMP programming for MIC processors and multi-core CPUs. We first introduce several parallel implementation techniques for the SPH code, and then examine these on our target computer architectures to determine the most effective algorithms for each processor unit. In addition, we evaluate the effective computing performance and power efficiency of the SPH simulation on each architecture, as these are critical metrics for overall performance in a multi-device environment. In our benchmark test, the GPU is found to produce the best arithmetic performance as a standalone device unit, and gives the most efficient power consumption. The multi-core CPU obtains the most effective computing performance. The computational speed of the MIC processor on Xeon Phi approached that of two Xeon CPUs. This indicates that using MICs is an attractive choice for existing SPH codes on multi-core CPUs parallelized by OpenMP, as it gains computational acceleration without the need for significant changes to the source code.

  5. Fast parallel algorithm for slicing STL based on pipeline

    NASA Astrophysics Data System (ADS)

    Ma, Xulong; Lin, Feng; Yao, Bo

    2016-05-01

    In the field of Additive Manufacturing, current research on data processing mainly focuses on the slicing of large STL files or complicated CAD models. To improve efficiency and reduce slicing time, a parallel algorithm has great advantages. However, traditional algorithms cannot make full use of multi-core CPU hardware resources. In this paper, a fast parallel algorithm is presented to speed up data processing. A pipeline mode is adopted to design the parallel algorithm, and the complexity of the pipeline algorithm is analyzed theoretically. To evaluate the performance of the new algorithm, the effects of thread count and layer count are investigated in a series of experiments. The experimental results show that thread count and layer count are two significant factors for the speedup ratio. The trend of speedup versus thread count shows a positive relationship that agrees well with Amdahl's law, and the trend of speedup versus layer count likewise shows a positive relationship in agreement with Gustafson's law. The new algorithm uses topological information to compute contours in parallel for speedup. Another parallel algorithm, based on data parallelism, is used in the experiments to show that the pipeline parallel mode is more efficient. A final case study demonstrates the strong performance of the new parallel algorithm: compared with the serial slicing algorithm, it makes full use of multi-core CPU hardware and accelerates the slicing process, and compared with the data-parallel slicing algorithm, its pipeline parallel model achieves a much higher speedup ratio and efficiency.
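
    A bare-bones version of the pipeline idea, assuming nothing about the authors' implementation: stage 1 produces layers while stage 2 consumes them through a small thread-safe queue, so the two stages overlap in time much as slicing and contour building do in the paper. The queue, stage names and work functions are placeholders.

      // Minimal two-stage pipeline: producer and consumer threads
      // overlap via a thread-safe queue.
      #include <condition_variable>
      #include <cstdio>
      #include <mutex>
      #include <optional>
      #include <queue>
      #include <thread>

      template <typename T>
      class TaskQueue {
          std::queue<T> q_;
          std::mutex m_;
          std::condition_variable cv_;
          bool done_ = false;
      public:
          void push(T v) {
              { std::lock_guard<std::mutex> g(m_); q_.push(std::move(v)); }
              cv_.notify_one();
          }
          void close() {
              { std::lock_guard<std::mutex> g(m_); done_ = true; }
              cv_.notify_all();
          }
          std::optional<T> pop() {          // empty optional => finished
              std::unique_lock<std::mutex> g(m_);
              cv_.wait(g, [&] { return !q_.empty() || done_; });
              if (q_.empty()) return std::nullopt;
              T v = std::move(q_.front());
              q_.pop();
              return v;
          }
      };

      int main() {
          TaskQueue<int> layers;
          std::thread slicer([&] {          // stage 1: produce layers
              for (int layer = 0; layer < 100; ++layer)
                  layers.push(layer);       // stand-in for slicing work
              layers.close();
          });
          std::thread contourer([&] {       // stage 2: consume layers
              while (auto layer = layers.pop())
                  std::printf("contours for layer %d\n", *layer);
          });
          slicer.join();
          contourer.join();
      }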

  6. Rapid Calculation of Max-Min Fair Rates for Multi-Commodity Flows in Fat-Tree Networks

    DOE PAGES

    Mollah, Md Atiqul; Yuan, Xin; Pakin, Scott; ...

    2017-08-29

    Max-min fairness is often used in the performance modeling of interconnection networks. Existing methods to compute max-min fair rates for multi-commodity flows have high complexity and are computationally infeasible for large networks. In this paper, we show that by considering topological features, this problem can be solved efficiently for the fat-tree topology that is widely used in data centers and high performance compute clusters. Several efficient new algorithms are developed for this problem, including a parallel algorithm that can take advantage of multi-core and shared-memory architectures. Using these algorithms, we demonstrate that it is possible to find the max-min fair rate allocation for multi-commodity flows in fat-tree networks that support tens of thousands of nodes. We evaluate the run-time performance of the proposed algorithms and show improvements of orders of magnitude over the previously best-known method. Finally, we further demonstrate a new application of max-min fair rate allocation that is only computationally feasible using our new algorithms.
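
    For concreteness, here is the classical progressive-filling procedure that defines max-min fair rates: raise all unfrozen flow rates equally until some link saturates, freeze the flows crossing it, and repeat. This generic baseline is what the paper's topology-aware fat-tree algorithms accelerate; the data layout below is an assumption.

      // Progressive filling for max-min fair rates. capacity[l] is the
      // capacity of link l; flowLinks[f] lists the links used by flow f
      // (each flow is assumed to cross at least one link).
      #include <algorithm>
      #include <limits>
      #include <vector>

      std::vector<double> maxmin_rates(
              const std::vector<double>& capacity,
              const std::vector<std::vector<int>>& flowLinks) {
          const int nF = (int)flowLinks.size(), nL = (int)capacity.size();
          std::vector<double> rate(nF, 0.0), residual = capacity;
          std::vector<bool> frozen(nF, false);
          int remaining = nF;
          while (remaining > 0) {
              std::vector<int> active(nL, 0);   // unfrozen flows per link
              for (int f = 0; f < nF; ++f)
                  if (!frozen[f])
                      for (int l : flowLinks[f]) ++active[l];
              double delta = std::numeric_limits<double>::max();
              for (int l = 0; l < nL; ++l)      // equal increment until a
                  if (active[l] > 0)            // link saturates
                      delta = std::min(delta, residual[l] / active[l]);
              for (int f = 0; f < nF; ++f)
                  if (!frozen[f]) rate[f] += delta;
              for (int l = 0; l < nL; ++l) residual[l] -= delta * active[l];
              for (int f = 0; f < nF; ++f) {    // freeze flows that now
                  if (frozen[f]) continue;      // cross a saturated link
                  for (int l : flowLinks[f])
                      if (residual[l] <= 1e-12) {
                          frozen[f] = true;
                          --remaining;
                          break;
                      }
              }
          }
          return rate;
      }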

  7. Rapid Calculation of Max-Min Fair Rates for Multi-Commodity Flows in Fat-Tree Networks

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Mollah, Md Atiqul; Yuan, Xin; Pakin, Scott

    Max-min fairness is often used in the performance modeling of interconnection networks. Existing methods to compute max-min fair rates for multi-commodity flows have high complexity and are computationally infeasible for large networks. In this paper, we show that by considering topological features, this problem can be solved efficiently for the fat-tree topology that is widely used in data centers and high performance compute clusters. Several efficient new algorithms are developed for this problem, including a parallel algorithm that can take advantage of multi-core and shared-memory architectures. Using these algorithms, we demonstrate that it is possible to find the max-min fair rate allocation for multi-commodity flows in fat-tree networks that support tens of thousands of nodes. We evaluate the run-time performance of the proposed algorithms and show improvements of orders of magnitude over the previously best-known method. Finally, we further demonstrate a new application of max-min fair rate allocation that is only computationally feasible using our new algorithms.

  8. Optimization of the coherence function estimation for multi-core central processing unit

    NASA Astrophysics Data System (ADS)

    Cheremnov, A. G.; Faerman, V. A.; Avramchuk, V. S.

    2017-02-01

    The paper considers the use of parallel processing on a multi-core central processing unit to optimize the evaluation of the coherence function arising in digital signal processing. The coherence function, along with other methods of spectral analysis, is commonly used for vibration diagnosis of rotating machinery and its particular nodes. An algorithm is given for evaluating the function for signals represented by digital samples. The algorithm is analyzed with respect to its software implementation and computational problems. Optimization measures are described, including algorithmic, architecture and compiler optimization, and their results are assessed for multi-core processors from different manufacturers. Specifically, the speed-up of parallel execution with respect to sequential execution was studied, and results are presented for the Intel Core i7-4720HQ and AMD FX-9590 processors. The results show the comparatively high efficiency of the optimization measures taken. In particular, acceleration indicators and average CPU utilization were significantly improved, showing a high degree of parallelism in the constructed calculation functions. The developed software underwent state registration and will be used as part of a software and hardware solution for rotating machinery fault diagnosis and pipeline leak location using the acoustic correlation method.

  9. Multi-Core Programming Design Patterns: Stream Processing Algorithms for Dynamic Scene Perceptions

    DTIC Science & Technology

    2014-05-01

    ...processor developed by IBM and other companies, incorporates the POWER5 processor as the Power Processor Element (PPE), one of the early general... deliver a power-efficient single-precision peak performance of more than 256 GFlops. Substantially more raw power became available later, when nVIDIA... algorithms, including IBM's Cell/B.E., GPUs from NVidia and AMD, and many-core CPUs from Intel. The vast growth of digital video content has been a...

  10. Energy-aware Thread and Data Management in Heterogeneous Multi-core, Multi-memory Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Su, Chun-Yi

    By 2004, microprocessor design focused on multicore scaling (increasing the number of cores per die in each generation) as the primary strategy for improving performance. These multicore processors are typically equipped with multiple memory subsystems to improve data throughput. In addition, such systems employ heterogeneous processors such as GPUs and heterogeneous memories like non-volatile memory to improve performance, capacity, and energy efficiency. With the increasing volume of hardware resources and the system complexity caused by heterogeneity, future systems will require intelligent ways to manage hardware resources. Early research into improving performance and energy efficiency on heterogeneous, multi-core, multi-memory systems focused on tuning a single primitive, or at best a few primitives, in the systems. The key limitation of past efforts is their lack of a holistic approach to resource management that balances the tradeoff between performance and energy consumption. In addition, the shift from simple, homogeneous systems to these heterogeneous, multi-core, multi-memory systems requires an in-depth understanding of efficient resource management for scalable execution, including new models that capture the interchange between performance and energy, smarter resource management strategies, and novel low-level performance/energy tuning primitives and runtime systems. Tuning an application to control the available resources efficiently has become a daunting challenge; managing resources automatically is still a dark art, since the tradeoffs among programming, energy, and performance remain insufficiently understood. In this dissertation, I have developed theories, models, and resource management techniques to enable energy-efficient execution of parallel applications through thread and data management in these heterogeneous multi-core, multi-memory systems. I study the effect of dynamic concurrency throttling on the performance and energy of multi-core, non-uniform memory access (NUMA) systems. I use critical path analysis to quantify memory contention in the NUMA memory system and determine thread mappings. In addition, I implement a runtime system that combines concurrency throttling and a novel thread mapping algorithm to manage thread resources and improve energy-efficient execution in multi-core, NUMA systems.

  11. Highly efficient spatial data filtering in parallel using the opensource library CPPPO

    NASA Astrophysics Data System (ADS)

    Municchi, Federico; Goniva, Christoph; Radl, Stefan

    2016-10-01

    CPPPO is a compilation of parallel data processing routines developed with the aim of creating a library for "scale bridging" (i.e., connecting different scales by means of closure models) in a multi-scale approach. CPPPO features a number of parallel filtering algorithms designed for use with structured and unstructured Eulerian meshes, as well as Lagrangian data sets. In addition, data can be processed on the fly, allowing the collection of relevant statistics without saving individual snapshots of the simulation state. Our library is provided with an interface to the widely used CFD solver OpenFOAM®, and can be easily connected to any other software package via interface modules. We also introduce a novel, extremely efficient approach to parallel data filtering, and show that our algorithms scale super-linearly on multi-core clusters. Furthermore, we provide a guideline for choosing the optimal Eulerian cell selection algorithm depending on the number of CPU cores used. Finally, we demonstrate the accuracy and the parallel scalability of CPPPO in a showcase focusing on heat and mass transfer from a dense bed of particles.

  12. Interaction sorting method for molecular dynamics on multi-core SIMD CPU architecture.

    PubMed

    Matvienko, Sergey; Alemasov, Nikolay; Fomin, Eduard

    2015-02-01

    Molecular dynamics (MD) is widely used in computational biology for studying the binding mechanisms of molecules, molecular transport, conformational transitions, protein folding, etc. The method is computationally expensive; thus, the demand for the development of novel, much more efficient algorithms is still high. The interaction sorting (IS) algorithm, designed in 2007, therefore attracted clear interest, as it outperformed the most efficient MD algorithms. In this work, a new IS modification is proposed which allows the algorithm to utilize SIMD processor instructions. This paper shows that the improvement provides an additional gain in performance of 9% to 45% in comparison to the original IS method.

  13. CMS Readiness for Multi-Core Workload Scheduling

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Perez-Calero Yzquierdo, A.; Balcas, J.; Hernandez, J.

    In the present run of the LHC, CMS data reconstruction and simulation algorithms benefit greatly from being executed as multiple threads running on several processor cores. The complexity of the Run 2 events requires parallelization of the code to reduce the memory-per-core footprint constraining serial execution programs, thus optimizing the exploitation of present multi-core processor architectures. The allocation of computing resources for multi-core tasks, however, becomes a complex problem in itself. The CMS workload submission infrastructure employs multi-slot partitionable pilots, built on HTCondor and GlideinWMS native features, to enable scheduling of single and multi-core jobs simultaneously. This provides a solution for the scheduling problem in a uniform way across grid sites running a diversity of gateways to compute resources and batch system technologies. This paper presents this strategy and the tools on which it has been implemented. The experience of managing multi-core resources at the Tier-0 and Tier-1 sites during 2015, along with the deployment phase to Tier-2 sites during early 2016, is reported. The process of performance monitoring and optimization to achieve efficient and flexible use of the resources is also described.

  14. CMS readiness for multi-core workload scheduling

    NASA Astrophysics Data System (ADS)

    Perez-Calero Yzquierdo, A.; Balcas, J.; Hernandez, J.; Aftab Khan, F.; Letts, J.; Mason, D.; Verguilov, V.

    2017-10-01

    In the present run of the LHC, CMS data reconstruction and simulation algorithms benefit greatly from being executed as multiple threads running on several processor cores. The complexity of the Run 2 events requires parallelization of the code to reduce the memory-per-core footprint constraining serial execution programs, thus optimizing the exploitation of present multi-core processor architectures. The allocation of computing resources for multi-core tasks, however, becomes a complex problem in itself. The CMS workload submission infrastructure employs multi-slot partitionable pilots, built on HTCondor and GlideinWMS native features, to enable scheduling of single and multi-core jobs simultaneously. This provides a solution for the scheduling problem in a uniform way across grid sites running a diversity of gateways to compute resources and batch system technologies. This paper presents this strategy and the tools on which it has been implemented. The experience of managing multi-core resources at the Tier-0 and Tier-1 sites during 2015, along with the deployment phase to Tier-2 sites during early 2016, is reported. The process of performance monitoring and optimization to achieve efficient and flexible use of the resources is also described.

  15. Hybrid Parallelism for Volume Rendering on Large-, Multi-, and Many-Core Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Howison, Mark; Bethel, E. Wes; Childs, Hank

    2012-01-01

    With the computing industry trending towards multi- and many-core processors, we study how a standard visualization algorithm, ray-casting volume rendering, can benefit from a hybrid parallelism approach. Hybrid parallelism provides the best of both worlds: using distributed-memory parallelism across a large number of nodes increases the available FLOPs and memory, while exploiting shared-memory parallelism among the cores within each node ensures that each node performs its portion of the larger calculation as efficiently as possible. We demonstrate results from weak and strong scaling studies, at levels of concurrency ranging up to 216,000, and with datasets as large as 12.2 trillion cells. The greatest benefit from hybrid parallelism lies in the communication portion of the algorithm, the dominant cost at higher levels of concurrency. We show that reducing the number of participants with a hybrid approach significantly improves performance.

  16. Scalable Triadic Analysis of Large-Scale Graphs: Multi-Core vs. Multi-Processor vs. Multi-Threaded Shared Memory Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chin, George; Marquez, Andres; Choudhury, Sutanay

    2012-09-01

    Triadic analysis encompasses a useful set of graph mining methods that is centered on the concept of a triad, which is a subgraph of three nodes and the configuration of directed edges across the nodes. Such methods are often applied in the social sciences as well as many other diverse fields. Triadic methods commonly operate on a triad census that counts the number of triads of every possible edge configuration in a graph. Like other graph algorithms, triadic census algorithms do not scale well when graphs reach tens of millions to billions of nodes. To enable the triadic analysis of large-scale graphs, we developed and optimized a triad census algorithm to execute efficiently on shared memory architectures. We will retrace the development and evolution of a parallel triad census algorithm. Over the course of several versions, we continually adapted the code's data structures and program logic to expose more opportunities to exploit parallelism on shared memory that would translate into improved computational performance. We will recall the critical steps and modifications that occurred during code development and optimization. Furthermore, we will compare the performance of triad census algorithm versions on three specific systems: Cray XMT, HP Superdome, and an AMD multi-core NUMA machine. These three systems have shared memory architectures but markedly different hardware capabilities to manage parallelism.

  17. A privacy-preserving parallel and homomorphic encryption scheme

    NASA Astrophysics Data System (ADS)

    Min, Zhaoe; Yang, Geng; Shi, Jingqi

    2017-04-01

    In order to protect data privacy while allowing efficient access to data in multi-node cloud environments, a parallel homomorphic encryption (PHE) scheme is proposed based on the additive homomorphism of the Paillier encryption algorithm. We propose a PHE algorithm in which the plaintext is divided into several blocks that are encrypted in parallel. Experimental results demonstrate that the encryption algorithm can reach a speed-up ratio of about 7.1 in a MapReduce environment with 16 cores and 4 nodes.
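
    The additive homomorphism that makes block-parallel encryption possible is easy to demonstrate: in Paillier, multiplying ciphertexts adds plaintexts, so blocks can be encrypted independently (here with an OpenMP loop) and aggregated afterwards. The sketch below is a toy model of the scheme, not the paper's implementation: the 20-bit modulus and fixed per-block "randomness" are chosen only so the arithmetic fits in built-in integers and are utterly insecure.

      // Toy Paillier with block-parallel encryption (compile with -fopenmp).
      #include <cstdint>
      #include <cstdio>
      #include <numeric>
      #include <vector>

      using u64 = std::uint64_t;
      using u128 = unsigned __int128;

      static u64 mulmod(u64 a, u64 b, u64 m) { return (u128)a * b % m; }

      static u64 powmod(u64 b, u64 e, u64 m) {
          u64 r = 1 % m;
          for (b %= m; e; e >>= 1, b = mulmod(b, b, m))
              if (e & 1) r = mulmod(r, b, m);
          return r;
      }

      static u64 invmod(u64 a, u64 m) {                // extended Euclid
          long long t = 0, nt = 1, r = (long long)m, nr = (long long)(a % m);
          while (nr != 0) {
              const long long q = r / nr;
              long long tmp = t - q * nt; t = nt; nt = tmp;
              tmp = r - q * nr; r = nr; nr = tmp;
          }
          return (u64)(t < 0 ? t + (long long)m : t);
      }

      int main() {
          const u64 p = 1009, q = 1013;                // toy primes
          const u64 n = p * q, n2 = n * n;
          const u64 lambda = (p - 1) * (q - 1) / std::gcd(p - 1, q - 1);
          const u64 mu = invmod(lambda % n, n);        // lambda^-1 mod n

          auto encrypt = [&](u64 m, u64 r) {           // g = n + 1
              const u64 gm = (1 + mulmod(m % n, n, n2)) % n2;
              return mulmod(gm, powmod(r, n, n2), n2);
          };
          auto decrypt = [&](u64 c) {
              const u64 u = powmod(c, lambda, n2);     // u = 1 + m*lambda*n
              return mulmod((u - 1) / n, mu, n);
          };

          std::vector<u64> blocks = {11, 22, 33, 44, 55, 66, 77, 88};
          std::vector<u64> cts(blocks.size());
      #pragma omp parallel for                         // independent blocks
          for (long long i = 0; i < (long long)blocks.size(); ++i)
              cts[i] = encrypt(blocks[i], 2 + (u64)i); // fixed r: demo only

          u64 sumCt = 1 % n2;                          // homomorphic sum
          for (u64 c : cts) sumCt = mulmod(sumCt, c, n2);
          std::printf("sum = %llu\n", (unsigned long long)decrypt(sumCt));
      }

    With these parameters the decrypted product recovers 396, the sum of the blocks; correctness requires the running plaintext sum to stay below n.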

  18. Scalable Methods for Uncertainty Quantification, Data Assimilation and Target Accuracy Assessment for Multi-Physics Advanced Simulation of Light Water Reactors

    NASA Astrophysics Data System (ADS)

    Khuwaileh, Bassam

    High fidelity simulation of nuclear reactors entails large scale applications characterized by high dimensionality and tremendous complexity, where various physics models are integrated in the form of coupled models (e.g. neutronics with thermal-hydraulic feedback). Each of the coupled modules represents a high fidelity formulation of the first principles governing the physics of interest. Therefore, new developments in high fidelity multi-physics simulation and the corresponding sensitivity/uncertainty quantification analysis are paramount to the development and competitiveness of reactors, achieved through an enhanced understanding of the design and safety margins. Accordingly, this dissertation introduces efficient and scalable algorithms for performing Uncertainty Quantification (UQ), Data Assimilation (DA) and Target Accuracy Assessment (TAA) for large scale, multi-physics reactor design and safety problems. This dissertation builds upon previous efforts in adaptive core simulation and reduced order modeling algorithms and extends them towards coupled multi-physics models with feedback. The core idea is to recast the reactor physics analysis in terms of reduced order models. This can be achieved by identifying the important/influential degrees of freedom (DoF) via subspace analysis, such that the required analysis can be recast in terms of the important DoF only. In this dissertation, efficient algorithms for lower dimensional subspace construction have been developed for single physics and multi-physics applications with feedback. The reduced subspace is then used to solve realistic, large scale forward (UQ) and inverse (DA and TAA) problems. Once the elite set of DoF is determined, the uncertainty/sensitivity/target accuracy assessment and data assimilation analysis can be performed accurately and efficiently for large scale, high dimensional multi-physics nuclear engineering applications. Hence, in this work a Karhunen-Loeve (KL) based algorithm previously developed to quantify the uncertainty of single physics models is extended to large scale multi-physics coupled problems with feedback effects. Moreover, a non-linear surrogate-based UQ approach is developed, used and compared against the KL approach and the brute force Monte Carlo (MC) approach. In addition, an efficient Data Assimilation (DA) algorithm is developed to assess information about the model's parameters: nuclear data cross-sections and thermal-hydraulics parameters. Two improvements are introduced to make DA feasible for high dimensional problems. First, a goal-oriented surrogate model can be used to replace the original models in the depletion sequence (MPACT - COBRA-TF - ORIGEN). Second, approximating the complex and high dimensional solution space with a lower dimensional subspace makes the sampling process necessary for DA possible for high dimensional problems. Moreover, safety analysis and design optimization depend on the accurate prediction of various reactor attributes, and predictions can be enhanced by reducing the uncertainty associated with the attributes of interest. Accordingly, an inverse problem can be defined and solved to assess the contributions from sources of uncertainty, and experimental effort can subsequently be directed to further improve the uncertainty associated with these sources.
    In this dissertation, a subspace-based, gradient-free and nonlinear algorithm for inverse uncertainty quantification, namely Target Accuracy Assessment (TAA), has been developed and tested. The ideas proposed in this dissertation were first validated using lattice physics applications simulated with the SCALE6.1 package (Pressurized Water Reactor (PWR) and Boiling Water Reactor (BWR) lattice models). Ultimately, the proposed algorithms were applied to perform UQ and DA for assembly level (CASL Progression Problem Number 6) and core wide problems representing Watts Bar Nuclear 1 (WBN1) for cycle 1 of depletion (CASL Progression Problem Number 9), modeled and simulated using VERA-CS, which consists of several coupled multi-physics models. The analysis and algorithms developed in this dissertation were implemented in a newly developed toolkit, the Reduced Order Modeling based Uncertainty/Sensitivity Estimator (ROMUSE).

  19. Fast data reconstructed method of Fourier transform imaging spectrometer based on multi-core CPU

    NASA Astrophysics Data System (ADS)

    Yu, Chunchao; Du, Debiao; Xia, Zongze; Song, Li; Zheng, Weijian; Yan, Min; Lei, Zhenggang

    2017-10-01

    An imaging spectrometer acquires a two-dimensional spatial image and a one-dimensional spectrum at the same time, which makes it highly useful for color and spectral measurement, true-color image synthesis, military reconnaissance and so on. In order to realize fast reconstruction of Fourier transform imaging spectrometer data, this paper designs an optimized reconstruction algorithm based on OpenMP parallel computing technology, which was further applied to data processing for the HyperSpectral Imager of the 'HJ-1' Chinese satellite. The results show that the method based on multi-core parallel computing technology can fully exploit the multi-core CPU hardware resources and significantly enhance the efficiency of spectrum reconstruction processing. If the technique is applied in parallel on workstations with more cores, real-time data processing for a Fourier transform imaging spectrometer will become possible on a single computer.
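
    The parallel structure being exploited is worth making explicit: every detector pixel carries its own interferogram, and pixels are mutually independent, so spectrum recovery is one OpenMP loop over pixels. In this hypothetical sketch a naive O(n^2) DFT stands in for the real reconstruction chain (apodization, FFT, phase correction).

      // Per-pixel spectrum recovery, parallel over pixels (-fopenmp).
      #include <cmath>
      #include <vector>

      // interferograms: nPixels rows of nSamples points each;
      // returns one magnitude spectrum per pixel.
      std::vector<std::vector<double>>
      reconstruct(const std::vector<std::vector<double>>& interferograms) {
          const double PI = 3.141592653589793;
          const int nPix = (int)interferograms.size();
          std::vector<std::vector<double>> spectra(nPix);
      #pragma omp parallel for schedule(dynamic)
          for (int p = 0; p < nPix; ++p) {
              const auto& ig = interferograms[p];
              const int n = (int)ig.size();
              std::vector<double> mag(n / 2 + 1);
              for (int k = 0; k <= n / 2; ++k) {   // naive DFT, O(n^2)
                  double re = 0.0, im = 0.0;
                  for (int t = 0; t < n; ++t) {
                      const double ang = -2.0 * PI * k * t / n;
                      re += ig[t] * std::cos(ang);
                      im += ig[t] * std::sin(ang);
                  }
                  mag[k] = std::hypot(re, im);
              }
              spectra[p] = std::move(mag);
          }
          return spectra;
      }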

  20. Efficient Scalable Median Filtering Using Histogram-Based Operations.

    PubMed

    Green, Oded

    2018-05-01

    Median filtering is a smoothing technique for noise removal in images. While there are various implementations of median filtering for a single-core CPU, there are few implementations for accelerators and multi-core systems. Many parallel implementations of median filtering use a sorting algorithm for rearranging the values within a filtering window and taking the median of the sorted values. While using sorting algorithms allows for simple parallel implementations, the cost of the sorting becomes prohibitive as the filtering windows grow, which makes such algorithms, sequential and parallel alike, inefficient. In this work, we introduce the first software parallel median filtering that is not sorting-based. The new algorithm uses efficient histogram-based operations, which reduce its computational requirements while also accessing the image fewer times. We show an implementation of our algorithm for both the CPU and NVIDIA's CUDA-supported graphics processing unit (GPU), and compare it with several other leading CPU and GPU implementations. The CPU implementation achieves near-perfect linear scaling on a quad-core system. The GPU implementation is several orders of magnitude faster than the other GPU implementations for mid-size median filters. For small kernels, comparison-based approaches are preferable as fewer operations are required. Lastly, the new algorithm is open-source and can be found in the OpenCV library.
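
    The histogram trick that replaces sorting is compact enough to show for a single row of an 8-bit image: maintain a 256-bin histogram of the current window, update it incrementally as the window slides, and read the median with a partial prefix count. This follows the classic Huang-style formulation; the paper's contribution is a parallel, scalable 2-D filter built on such histogram operations, which this sketch does not reproduce.

      // 1-D median filter over one image row; win must be odd.
      // Borders are handled by clamping (replication).
      #include <cstdint>
      #include <vector>

      std::vector<std::uint8_t> median_row(const std::vector<std::uint8_t>& row,
                                           int win) {
          const int n = (int)row.size(), r = win / 2;
          std::vector<std::uint8_t> out(n);
          int hist[256] = {0};
          auto clamp = [&](int i) { return i < 0 ? 0 : (i >= n ? n - 1 : i); };
          for (int i = -r; i <= r; ++i) ++hist[row[clamp(i)]]; // first window
          for (int x = 0; x < n; ++x) {
              int cnt = 0, m = 0;
              while (cnt + hist[m] <= r) cnt += hist[m++];     // rank-r bin
              out[x] = (std::uint8_t)m;
              --hist[row[clamp(x - r)]];                       // slide window
              ++hist[row[clamp(x + r + 1)]];
          }
          return out;
      }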

  1. 2 × 2 MIMO OFDM/OQAM radio signals over an elliptical core few-mode fiber.

    PubMed

    Mo, Qi; He, Jiale; Yu, Dawei; Deng, Lei; Fu, Songnian; Tang, Ming; Liu, Deming

    2016-10-01

    We experimentally demonstrate a 4.46 Gb/s 2×2 multi-input multi-output (MIMO) orthogonal frequency division multiplexing (OFDM)/OQAM radio signal over a 2 km elliptical-core 3-mode fiber, together with 0.4 m wireless transmission. Meanwhile, to cope with the differential channel delay (DCD) among the involved MIMO channels, we propose a time-offset crosstalk cancellation algorithm to extend the DCD tolerance from 10 to 60 ns without using a cyclic prefix (CP), leading to an 18.7% improvement in spectral efficiency. For the purpose of comparison, we also examine the transmission performance of CP-OFDM signals with different CP lengths under the same system configuration. The proposed algorithm is also effective for DCD compensation of a radio signal over a 2 km 7-core fiber. These results not only demonstrate the feasibility of space division multiplexing for RoF applications but also validate that the elliptical-core few-mode fiber can provide the same independent channels as a multicore fiber.

  2. An Efficient VLSI Architecture for Multi-Channel Spike Sorting Using a Generalized Hebbian Algorithm

    PubMed Central

    Chen, Ying-Lun; Hwang, Wen-Jyi; Ke, Chi-En

    2015-01-01

    A novel VLSI architecture for multi-channel online spike sorting is presented in this paper. In the architecture, the spike detection is based on nonlinear energy operator (NEO), and the feature extraction is carried out by the generalized Hebbian algorithm (GHA). To lower the power consumption and area costs of the circuits, all of the channels share the same core for spike detection and feature extraction operations. Each channel has dedicated buffers for storing the detected spikes and the principal components of that channel. The proposed circuit also contains a clock gating system supplying the clock to only the buffers of channels currently using the computation core to further reduce the power consumption. The architecture has been implemented by an application-specific integrated circuit (ASIC) with 90-nm technology. Comparisons to the existing works show that the proposed architecture has lower power consumption and hardware area costs for real-time multi-channel spike detection and feature extraction. PMID:26287193

  3. An Efficient VLSI Architecture for Multi-Channel Spike Sorting Using a Generalized Hebbian Algorithm.

    PubMed

    Chen, Ying-Lun; Hwang, Wen-Jyi; Ke, Chi-En

    2015-08-13

    A novel VLSI architecture for multi-channel online spike sorting is presented in this paper. In the architecture, the spike detection is based on nonlinear energy operator (NEO), and the feature extraction is carried out by the generalized Hebbian algorithm (GHA). To lower the power consumption and area costs of the circuits, all of the channels share the same core for spike detection and feature extraction operations. Each channel has dedicated buffers for storing the detected spikes and the principal components of that channel. The proposed circuit also contains a clock gating system supplying the clock to only the buffers of channels currently using the computation core to further reduce the power consumption. The architecture has been implemented by an application-specific integrated circuit (ASIC) with 90-nm technology. Comparisons to the existing works show that the proposed architecture has lower power consumption and hardware area costs for real-time multi-channel spike detection and feature extraction.
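
    The nonlinear energy operator used for spike detection above has a compact closed form, psi[n] = x[n]^2 - x[n-1]*x[n+1]. A hedged NumPy sketch of NEO-based detection follows; the multiplier k on the mean of psi is a common heuristic threshold choice, not a value taken from the paper, and no refractory logic is included.

    ```python
    import numpy as np

    def neo_detect(x: np.ndarray, k: float = 8.0) -> np.ndarray:
        """Return sample indices where the nonlinear energy operator
        psi[n] = x[n]^2 - x[n-1]*x[n+1] exceeds k times its mean.
        Consecutive suprathreshold samples are all flagged; a real
        detector would add a refractory period."""
        psi = x[1:-1] ** 2 - x[:-2] * x[2:]
        threshold = k * psi.mean()
        return np.nonzero(psi > threshold)[0] + 1   # +1: psi[0] maps to x[1]

    trace = np.random.randn(2000) * 0.1
    trace[500:505] += np.array([1.0, 3.0, 5.0, 3.0, 1.0])  # synthetic spike
    print(neo_detect(trace))                                # indices near 502
    ```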

  4. Options for Parallelizing a Planning and Scheduling Algorithm

    NASA Technical Reports Server (NTRS)

    Clement, Bradley J.; Estlin, Tara A.; Bornstein, Benjamin D.

    2011-01-01

    Space missions have a growing interest in putting multi-core processors onboard spacecraft. For many missions, limited processing power significantly slows operations. We investigate how continual planning and scheduling algorithms can exploit multi-core processing and outline different potential design decisions for a parallelized planning architecture. This organization of choices and challenges guides our initial design for parallelizing the CASPER planning system for a mesh multi-core processor. This work extends that presented at another workshop with some preliminary results.

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Luszczek, Piotr R; Tomov, Stanimire Z; Dongarra, Jack J

    We present an efficient and scalable programming model for the development of linear algebra in heterogeneous multi-coprocessor environments. The model incorporates some of the current best design and implementation practices for the heterogeneous acceleration of dense linear algebra (DLA). Examples are given for the basic linear-system solvers: the LU, QR, and Cholesky factorizations. To generate the extreme level of parallelism needed for the efficient use of coprocessors, the algorithms of interest are redesigned and then split into well-chosen computational tasks. Task execution is scheduled over the computational components of a hybrid system of multi-core CPUs and coprocessors using a light-weight runtime system. The use of a light-weight runtime keeps scheduling overhead low while enabling the expression of parallelism through otherwise sequential code. This simplifies the development effort and allows the exploration of the unique strengths of the various hardware components.

  6. Efficiency of static core turn-off in a system-on-a-chip with variation

    DOEpatents

    Cher, Chen-Yong; Coteus, Paul W; Gara, Alan; Kursun, Eren; Paulsen, David P; Schuelke, Brian A; Sheets, II, John E; Tian, Shurong

    2013-10-29

    A processor-implemented method for improving efficiency of a static core turn-off in a multi-core processor with variation, the method comprising: conducting via a simulation a turn-off analysis of the multi-core processor at the multi-core processor's design stage, wherein the turn-off analysis of the multi-core processor at the multi-core processor's design stage includes a first output corresponding to a first multi-core processor core to turn off; conducting a turn-off analysis of the multi-core processor at the multi-core processor's testing stage, wherein the turn-off analysis of the multi-core processor at the multi-core processor's testing stage includes a second output corresponding to a second multi-core processor core to turn off; comparing the first output and the second output to determine if the first output is referring to the same core to turn off as the second output; outputting a third output corresponding to the first multi-core processor core if the first output and the second output are both referring to the same core to turn off.
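
    Stripped of patent language, the claimed comparison step is a simple cross-check between the design-stage and test-stage analyses. A trivial sketch of that decision logic, with integer core ids as an illustrative assumption:

    ```python
    from typing import Optional

    def choose_core_to_turn_off(design_stage_core: int,
                                test_stage_core: int) -> Optional[int]:
        """Cross-check the two turn-off analyses; emit a core id only
        when the design-stage and test-stage outputs agree."""
        if design_stage_core == test_stage_core:
            return design_stage_core   # both analyses point at the same core
        return None                    # disagreement: no static turn-off

    print(choose_core_to_turn_off(3, 3))  # 3
    print(choose_core_to_turn_off(3, 5))  # None
    ```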

  7. Multicore and GPU algorithms for Nussinov RNA folding

    PubMed Central

    2014-01-01

    Background One segment of an RNA sequence might be paired with another segment of the same RNA sequence due to the force of hydrogen bonds. This two-dimensional structure is called the RNA sequence's secondary structure. Several algorithms have been proposed to predict an RNA sequence's secondary structure. These algorithms are referred to as RNA folding algorithms. Results We develop cache-efficient, multicore, and GPU algorithms for RNA folding using Nussinov's algorithm. Conclusions Our cache-efficient algorithm provides a speedup between 1.6 and 3.0 relative to a naive single-core code. The multicore version of the cache-efficient single-core algorithm provides a speedup, relative to the naive single-core algorithm, between 7.5 and 14.0 on a 6-core hyperthreaded CPU. Our GPU algorithm for the NVIDIA C2050 is up to 1582 times as fast as the naive single-core algorithm and between 5.1 and 11.2 times as fast as the fastest previously known GPU algorithm for Nussinov RNA folding. PMID:25082539
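
    For reference, the serial Nussinov recurrence that all three variants (cache-efficient, multicore, GPU) reorganize is compact enough to sketch directly. This version allows adjacent base pairs (no minimum loop length), which is a simplifying assumption.

    ```python
    def nussinov(seq: str) -> int:
        """Maximum number of non-crossing base pairs (Nussinov recurrence),
        serial O(n^3) version; no minimum hairpin loop length enforced."""
        pairs = {("A","U"),("U","A"),("G","C"),("C","G"),("G","U"),("U","G")}
        n = len(seq)
        dp = [[0] * n for _ in range(n)]
        for span in range(1, n):
            for i in range(n - span):
                j = i + span
                best = max(dp[i + 1][j], dp[i][j - 1])
                if (seq[i], seq[j]) in pairs:
                    best = max(best, dp[i + 1][j - 1] + 1)
                for k in range(i + 1, j):          # bifurcation split point
                    best = max(best, dp[i][k] + dp[k + 1][j])
                dp[i][j] = best
        return dp[0][n - 1]

    print(nussinov("GGGAAAUCC"))  # small sanity check
    ```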

  8. Multipoint to multipoint routing and wavelength assignment in multi-domain optical networks

    NASA Astrophysics Data System (ADS)

    Qin, Panke; Wu, Jingru; Li, Xudong; Tang, Yongli

    2018-01-01

    In multi-point to multi-point (MP2MP) routing and wavelength assignment (RWA) problems, researchers usually assume the optical network to be a single domain. In practice, however, optical networks are developing toward multi-domain, larger-scale deployments. In this context, multi-core shared tree (MST)-based MP2MP RWA introduces new problems, including optimal multicast domain sequence selection and deciding in which domains the core nodes should reside. In this letter, we focus on MST-based MP2MP RWA problems in multi-domain optical networks, and mixed integer linear programming (MILP) formulations to optimally construct MP2MP multicast trees are presented. A heuristic algorithm based on network virtualization and a weighted clustering algorithm (NV-WCA) is proposed. Simulation results show that, under different traffic patterns, the proposed algorithm achieves significant improvements in network resource occupation and multicast-tree setup latency compared with conventional algorithms designed for single-domain network environments.

  9. Parallel transformation of K-SVD solar image denoising algorithm

    NASA Astrophysics Data System (ADS)

    Liang, Youwen; Tian, Yu; Li, Mei

    2017-02-01

    The images obtained by observing the sun through a large telescope always suffer from noise due to the low SNR. The K-SVD denoising algorithm can effectively remove Gaussian white noise. However, training dictionaries for sparse representations is a time-consuming task, due to the large size of the data involved and the complexity of the training algorithms. In this paper, OpenMP parallel programming is used to transform the serial algorithm into a parallel version, following a data-parallelism model. The biggest change is that multiple atoms, rather than a single atom, are updated simultaneously. The denoising effect and acceleration performance were tested after completion of the parallel algorithm. The speedup of the program is 13.563 when using 16 cores. This parallel version can fully utilize multi-core CPU hardware resources, greatly reduces running time, and is easily portable to other multi-core platforms.

  10. Incremental k-core decomposition: Algorithms and evaluation

    DOE PAGES

    Sariyuce, Ahmet Erdem; Gedik, Bugra; Jacques-SIlva, Gabriela; ...

    2016-02-01

    A k-core of a graph is a maximal connected subgraph in which every vertex is connected to at least k vertices in the subgraph. k-core decomposition is often used in large-scale network analysis, such as community detection, protein function prediction, visualization, and solving NP-hard problems on real networks efficiently, like maximal clique finding. In many real-world applications, networks change over time. As a result, it is essential to develop efficient incremental algorithms for dynamic graph data. In this paper, we propose a suite of incremental k-core decomposition algorithms for dynamic graph data. These algorithms locate a small subgraph that is guaranteed to contain the list of vertices whose maximum k-core values have changed and efficiently process this subgraph to update the k-core decomposition. We present incremental algorithms for both insertion and deletion operations, and propose auxiliary vertex state maintenance techniques that can further accelerate these operations. Our results show a significant reduction in runtime compared to non-incremental alternatives. We illustrate the efficiency of our algorithms on different types of real and synthetic graphs, at varying scales. Furthermore, for a graph of 16 million vertices, we observe relative throughput gains reaching a million times over the non-incremental algorithms.
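
    For context, here is a sketch of the static (non-incremental) k-core decomposition baseline: peel vertices in order of current degree, so the core number of a vertex is the peel level at which it is removed. The paper's incremental algorithms avoid rerunning this peeling after every edge update.

    ```python
    import heapq

    def core_numbers(adj: dict) -> dict:
        """Static k-core decomposition by peeling, using a min-heap with
        lazy deletion. adj maps each vertex to a list of neighbors."""
        degree = {v: len(nbrs) for v, nbrs in adj.items()}
        heap = [(d, v) for v, d in degree.items()]
        heapq.heapify(heap)
        core, k = {}, 0
        while heap:
            d, v = heapq.heappop(heap)
            if v in core or d != degree[v]:
                continue                     # stale heap entry, skip
            k = max(k, d)                    # peel level never decreases
            core[v] = k
            for u in adj[v]:
                if u not in core and degree[u] > d:
                    degree[u] -= 1
                    heapq.heappush(heap, (degree[u], u))
        return core

    g = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
    print(core_numbers(g))  # {4: 1, 1: 2, 2: 2, 3: 2}
    ```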

  11. New algorithm for tensor contractions on multi-core CPUs, GPUs, and accelerators enables CCSD and EOM-CCSD calculations with over 1000 basis functions on a single compute node.

    PubMed

    Kaliman, Ilya A; Krylov, Anna I

    2017-04-30

    A new hardware-agnostic contraction algorithm for tensors of arbitrary symmetry and sparsity is presented. The algorithm is implemented as a stand-alone open-source code, libxm. This code is also integrated with the general tensor library libtensor and with the Q-Chem quantum-chemistry package. An overview of the algorithm, its implementation, and benchmarks are presented. Similarly to other tensor software, the algorithm exploits efficient matrix multiplication libraries and assumes that tensors are stored in a block-tensor form. The distinguishing features of the algorithm are: (i) efficient repackaging of the individual blocks into large matrices and back, which affords efficient graphics processing unit (GPU)-enabled calculations without modifications of higher-level codes; (ii) fully asynchronous data transfer between disk storage and fast memory. The algorithm enables canonical all-electron coupled-cluster and equation-of-motion coupled-cluster calculations with single and double substitutions (CCSD and EOM-CCSD) with over 1000 basis functions on a single quad-GPU machine. We show that the algorithm exhibits the predicted theoretical scaling for canonical CCSD calculations, O(N^6), irrespective of the data size on disk. © 2017 Wiley Periodicals, Inc.

  12. Directly data processing algorithm for multi-wavelength pyrometer (MWP).

    PubMed

    Xing, Jian; Peng, Bo; Ma, Zhao; Guo, Xin; Dai, Li; Gu, Weihong; Song, Wenlong

    2017-11-27

    Data processing for the multi-wavelength pyrometer (MWP) is a difficult problem because of unknown emissivity. Solutions developed so far generally assume particular mathematical relations for emissivity versus wavelength or emissivity versus temperature. Because of deviations between such hypotheses and the actual situation, the inversion results can be seriously affected. A direct MWP data-processing algorithm that does not need to assume a spectral emissivity model in advance is therefore the main aim of this study. Two new data-processing algorithms for MWP, a Gradient Projection (GP) algorithm and an Internal Penalty Function (IPF) algorithm, neither of which requires fixing an emissivity model in advance, are proposed. The core idea is that the MWP data-processing problem is transformed into a constrained optimization problem, which can then be solved by the GP or IPF algorithms. Comparing simulation results for several typical spectral emissivity models shows that the IPF algorithm is superior to the GP algorithm in both accuracy and efficiency. Rocket nozzle temperature experiments show that the true temperature inversion results from the IPF algorithm agree well with the theoretical design temperature. The combination of the IPF algorithm with MWP is thus expected to provide a direct data-processing algorithm that clears the unknown-emissivity obstacle for MWP.
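
    A toy sketch of the internal (interior) penalty reformulation on a generic constrained problem, assuming SciPy's Nelder-Mead for the inner minimization. The actual MWP objective over temperature and per-channel emissivities is omitted, so this only illustrates the constrained-optimization transformation the abstract describes, not the authors' solver.

    ```python
    import numpy as np
    from scipy.optimize import minimize

    def ipf_solve(f, g, x0, mu=1.0, shrink=0.25, rounds=10):
        """Internal penalty function method: minimize f subject to g(x) < 0
        by minimizing f(x) - mu * sum(1/g(x)) for a decreasing barrier
        weight mu, restarting each round from the previous optimum."""
        x = np.atleast_1d(np.asarray(x0, dtype=float))
        for _ in range(rounds):
            def barrier(z, mu=mu):
                gz = g(z)
                if np.any(gz >= 0):
                    return np.inf          # outside the feasible interior
                return f(z) - mu * np.sum(1.0 / gz)
            x = minimize(barrier, x, method="Nelder-Mead").x
            mu *= shrink
        return x

    # toy problem: minimize (x - 2)^2 subject to x <= 1 -> optimum at x = 1
    f = lambda z: (z[0] - 2.0) ** 2
    g = lambda z: np.array([z[0] - 1.0])
    print(ipf_solve(f, g, x0=[0.0]))       # ~[1.0]
    ```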

  13. Parallel and Preemptable Dynamically Dimensioned Search Algorithms for Single and Multi-objective Optimization in Water Resources

    NASA Astrophysics Data System (ADS)

    Tolson, B.; Matott, L. S.; Gaffoor, T. A.; Asadzadeh, M.; Shafii, M.; Pomorski, P.; Xu, X.; Jahanpour, M.; Razavi, S.; Haghnegahdar, A.; Craig, J. R.

    2015-12-01

    We introduce asynchronous parallel implementations of the Dynamically Dimensioned Search (DDS) family of algorithms including DDS, discrete DDS, PA-DDS and DDS-AU. These parallel algorithms are unique from most existing parallel optimization algorithms in the water resources field in that parallel DDS is asynchronous and does not require an entire population (set of candidate solutions) to be evaluated before generating and then sending a new candidate solution for evaluation. One key advance in this study is developing the first parallel PA-DDS multi-objective optimization algorithm. The other key advance is enhancing the computational efficiency of solving optimization problems (such as model calibration) by combining a parallel optimization algorithm with the deterministic model pre-emption concept. These two efficiency techniques can only be combined because of the asynchronous nature of parallel DDS. Model pre-emption functions to terminate simulation model runs early, prior to completely simulating the model calibration period for example, when intermediate results indicate the candidate solution is so poor that it will definitely have no influence on the generation of further candidate solutions. The computational savings of deterministic model pre-emption available in serial implementations of population-based algorithms (e.g., PSO) disappear in synchronous parallel implementations of these algorithms. In addition to the key advances above, we implement the algorithms across a range of computation platforms (Windows and Unix-based operating systems, from multi-core desktops to a supercomputer system) and package these for future modellers within a model-independent calibration software package called Ostrich, as well as in MATLAB versions. Results across multiple platforms and multiple case studies (from 4 to 64 processors) demonstrate the vast improvement over serial DDS-based algorithms and highlight the important role model pre-emption plays in the performance of parallel, pre-emptable DDS algorithms. Case studies include single- and multiple-objective optimization problems in water resources model calibration, and in many cases linear or near-linear speedups are observed.
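
    The serial heart of DDS is a single perturbation rule, which the asynchronous parallel variants above evaluate concurrently. A sketch under the standard DDS definitions (inclusion probability decaying as 1 - ln k / ln max_iter, Gaussian neighborhood of relative size r); the parallel machinery and pre-emption logic are not shown.

    ```python
    import numpy as np

    def dds_candidate(x_best, lo, hi, k, max_iter, r=0.2, rng=None):
        """One Dynamically Dimensioned Search move: perturb a shrinking
        random subset of dimensions of the current best solution with
        bound-reflecting Gaussian noise. Assumes 1 <= k <= max_iter
        and max_iter > 1."""
        if rng is None:
            rng = np.random.default_rng()
        n = len(x_best)
        p = 1.0 - np.log(k) / np.log(max_iter)   # inclusion probability decays
        mask = rng.random(n) < p
        if not mask.any():
            mask[rng.integers(n)] = True         # always perturb >= 1 dimension
        x = x_best.copy()
        sigma = r * (hi - lo)
        x[mask] += rng.normal(0.0, sigma[mask])
        # reflect at the bounds, then clamp any second violation
        x = np.where(x < lo, 2 * lo - x, x)
        x = np.where(x > hi, 2 * hi - x, x)
        return np.clip(x, lo, hi)

    lo, hi = np.zeros(5), np.ones(5)
    print(dds_candidate(np.full(5, 0.5), lo, hi, k=10, max_iter=100))
    ```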

  14. Efficiently Scheduling Multi-core Guest Virtual Machines on Multi-core Hosts in Network Simulation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yoginath, Srikanth B; Perumalla, Kalyan S

    2011-01-01

    Virtual machine (VM)-based simulation is a method used by network simulators to incorporate realistic application behaviors by executing actual VMs as high-fidelity surrogates for simulated end-hosts. A critical requirement in such a method is the simulation time-ordered scheduling and execution of the VMs. Prior approaches such as time dilation are less efficient due to the high degree of multiplexing possible when multiple multi-core VMs are simulated on multi-core host systems. We present a new simulation time-ordered scheduler to efficiently schedule multi-core VMs on multi-core real hosts, with a virtual clock realized on each virtual core. The distinguishing features of our approach are: (1) customizable granularity of the VM scheduling time unit on the simulation time axis, (2) the ability of VMs to take arbitrary leaps in virtual time, maximizing the utilization of host (real) cores when guest virtual cores idle, and (3) empirically determinable optimality in the tradeoff between total execution (real) time and time-ordering accuracy levels. Experiments show that it is possible to get nearly perfect time-ordered execution, with a slight cost in total run time, relative to optimized non-simulation VM schedulers. Interestingly, with our time-ordered scheduler, it is also possible to reduce the time-ordering error from over 50% with a non-simulation scheduler to less than 1%, with almost the same run-time efficiency as that of the highly efficient non-simulation VM schedulers.

  15. Experiments with a Parallel Multi-Objective Evolutionary Algorithm for Scheduling

    NASA Technical Reports Server (NTRS)

    Brown, Matthew; Johnston, Mark D.

    2013-01-01

    Evolutionary multi-objective algorithms have great potential for scheduling in those situations where tradeoffs among competing objectives represent a key requirement. One challenge, however, is runtime performance, as a consequence of evolving not just a single schedule, but an entire population, while attempting to sample the Pareto frontier as accurately and uniformly as possible. The growing availability of multi-core processors in end user workstations, and even laptops, has raised the question of the extent to which such hardware can be used to speed up evolutionary algorithms. In this paper we report on early experiments in parallelizing a Generalized Differential Evolution (GDE) algorithm for scheduling long-range activities on NASA's Deep Space Network. Initial results show that significant speedups can be achieved, but that performance does not necessarily improve as more cores are utilized. We describe our preliminary results and some initial suggestions from parallelizing the GDE algorithm. Directions for future work are outlined.

  16. Energy Efficient Real-Time Scheduling Using DPM on Mobile Sensors with a Uniform Multi-Cores

    PubMed Central

    Kim, Youngmin; Lee, Chan-Gun

    2017-01-01

    In wireless sensor networks (WSNs), sensor nodes are deployed for collecting and analyzing data. These nodes use limited-energy batteries for easy deployment and low cost, and the use of such batteries is closely tied to the lifetime of the sensor nodes. Efficient energy management is therefore important for extending node lifetime. Most efforts to improve power efficiency in tiny sensor nodes have focused mainly on reducing the power consumed during data transmission. However, the recent emergence of sensor nodes equipped with multi-cores strongly requires attention to the problem of reducing power consumption in the multi-cores themselves. In this paper, we propose an energy-efficient scheduling method for sensor nodes with a uniform multi-core processor. We extend the proposed T-Ler plane based scheduling for globally optimal scheduling of uniform multi-cores and multi-processors to enable power management using dynamic power management. In the proposed approach, a processor selection scheme for scheduling and a mapping method between tasks and processors are proposed to efficiently utilize dynamic power management. Experiments show the effectiveness of the proposed approach compared to other existing methods. PMID:29240695

  17. Multi-threaded Sparse Matrix Sparse Matrix Multiplication for Many-Core and GPU Architectures.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deveci, Mehmet; Trott, Christian Robert; Rajamanickam, Sivasankaran

    Sparse matrix-matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.

  18. Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Deveci, Mehmet; Rajamanickam, Sivasankaran; Trott, Christian Robert

    Sparse matrix-matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depends on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and data structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two-phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.
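
    The accumulator discussion above can be made concrete with a hash-map accumulator, one of the data-structure choices such SpGEMM algorithms compare (dense and sorted-list accumulators trade memory for lookup cost). A Python sketch for one row of C = A·B over dict-of-dicts sparse matrices; the formats and names are illustrative, not the Kokkos implementation:

    ```python
    def spgemm_row(a_row: dict, B: dict) -> dict:
        """Sparse row (dict: col -> value) times sparse matrix B
        (dict: row -> {col -> value}) using a hash-map accumulator."""
        acc = {}
        for k, a_val in a_row.items():            # nonzeros of the A row
            for j, b_val in B.get(k, {}).items(): # matching B row
                acc[j] = acc.get(j, 0.0) + a_val * b_val
        return acc

    A = {0: {0: 2.0, 2: 1.0}}
    B = {0: {1: 3.0}, 2: {1: 4.0, 3: 1.0}}
    print(spgemm_row(A[0], B))                    # {1: 10.0, 3: 1.0}
    ```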

  19. Maximal clique enumeration with data-parallel primitives

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lessley, Brenton; Perciano, Talita; Mathai, Manish

    The enumeration of all maximal cliques in an undirected graph is a fundamental problem arising in several research areas. We consider maximal clique enumeration on shared-memory, multi-core architectures and introduce an approach consisting entirely of data-parallel operations, in an effort to achieve efficient and portable performance across different architectures. We study the performance of the algorithm via experiments varying over benchmark graphs and architectures. Overall, we observe that our algorithm achieves up to a 33-fold speedup and a 9-fold speedup over state-of-the-art distributed and serial algorithms, respectively, for graphs with higher ratios of maximal cliques to total cliques. Further, we attain additional speedups on a GPU architecture, demonstrating the portable performance of our data-parallel design.

  20. Efficient provisioning for multi-core applications with LSF

    NASA Astrophysics Data System (ADS)

    Dal Pra, Stefano

    2015-12-01

    Tier-1 sites providing computing power for HEP experiments are usually tightly designed for high-throughput performance. This is pursued by reducing the variety of supported use cases and tuning for performance the most important ones, which historically have been single-core jobs. Moreover, the usual workload is saturation: each available core in the farm is in use and there are queued jobs waiting for their turn to run. Enabling multi-core jobs thus requires dedicating a number of hosts on which to run them, and waiting for those hosts to free the needed number of cores. This drain time introduces a loss of computing power driven by the number of unusable empty cores. As demand for multi-core capable resources has grown, a Task Force has been constituted in WLCG with the goal of defining a simple and efficient multi-core resource provisioning model. This paper details the work done at the INFN Tier-1 to enable multi-core support in the LSF batch system, with the intent of reducing the average number of unused cores to a minimum. The adopted strategy dedicates to multi-core jobs a dynamic set of nodes, whose size is driven mainly by the number of pending multi-core requests and the fair-share priority of the submitting user. The node status transition, from single-core to multi-core and vice versa, is driven by a finite state machine implemented in a custom multi-core director script running in the cluster. After describing and motivating both the implementation and the details specific to the LSF batch system, performance results are reported. Factors having positive and negative impacts on the overall efficiency are discussed, and solutions to minimize the negative ones are proposed.

  1. Progress Towards a Rad-Hydro Code for Modern Computing Architectures LA-UR-10-02825

    NASA Astrophysics Data System (ADS)

    Wohlbier, J. G.; Lowrie, R. B.; Bergen, B.; Calef, M.

    2010-11-01

    We are entering an era of high performance computing where data movement is the overwhelming bottleneck to scalable performance, as opposed to the speed of floating-point operations per processor. All multi-core hardware paradigms, whether heterogeneous or homogeneous, be it the Cell processor, GPGPU, or multi-core x86, share this common trait. In multi-physics applications such as inertial confinement fusion or astrophysics, one may be solving multi-material hydrodynamics with tabular equation of state data lookups, radiation transport, nuclear reactions, and charged particle transport in a single time cycle. The algorithms are intensely data dependent, e.g., EOS, opacity, nuclear data, and multi-core hardware memory restrictions are forcing code developers to rethink code and algorithm design. For the past two years LANL has been funding a small effort referred to as Multi-Physics on Multi-Core to explore ideas for code design as pertaining to inertial confinement fusion and astrophysics applications. The near term goals of this project are to have a multi-material radiation hydrodynamics capability, with tabular equation of state lookups, on cartesian and curvilinear block structured meshes. In the longer term we plan to add fully implicit multi-group radiation diffusion and material heat conduction, and block structured AMR. We will report on our progress to date.

  2. Comparative performance between compressed and uncompressed airborne imagery

    NASA Astrophysics Data System (ADS)

    Phan, Chung; Rupp, Ronald; Agarwal, Sanjeev; Trang, Anh; Nair, Sumesh

    2008-04-01

    The US Army's RDECOM CERDEC Night Vision and Electronic Sensors Directorate (NVESD), Countermine Division is evaluating the compressibility of airborne multi-spectral imagery for mine and minefield detection applications. Of particular interest is assessing the highest image data compression rate that can be afforded without loss of image quality for war fighters in the loop and without degrading the performance of near-real-time mine detection algorithms. The JPEG-2000 compression standard is used to perform data compression; both lossless and lossy compression are considered. A multi-spectral anomaly detector such as RX (Reed & Xiaoli), which is widely used as a core baseline algorithm in airborne mine and minefield detection across different mine types, minefields, and terrains to identify potential individual targets, is used to compare mine detection performance. This paper presents the compression scheme and compares detection performance between compressed and uncompressed imagery for various levels of compression. The compression efficiency is evaluated, and its dependence upon different backgrounds and other factors is documented and presented using multi-spectral data.
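
    A compact NumPy sketch of the global RX anomaly detector referenced above: each pixel's spectrum is scored by its Mahalanobis distance from the scene background statistics. The small covariance regularization term is a practical assumption, not part of the original detector definition.

    ```python
    import numpy as np

    def rx_scores(cube: np.ndarray) -> np.ndarray:
        """Global RX detector. cube has shape (rows, cols, bands);
        returns a (rows, cols) anomaly-score map."""
        h, w, b = cube.shape
        X = cube.reshape(-1, b).astype(float)
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        cov_inv = np.linalg.inv(cov + 1e-6 * np.eye(b))  # regularized inverse
        d = X - mu
        scores = np.einsum("ij,jk,ik->i", d, cov_inv, d) # quadratic form
        return scores.reshape(h, w)

    cube = np.random.rand(32, 32, 6)
    cube[10, 10] += 2.0                                  # implant an anomaly
    print(np.unravel_index(rx_scores(cube).argmax(), (32, 32)))  # (10, 10)
    ```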

  3. Interactive high-resolution isosurface ray casting on multicore processors.

    PubMed

    Wang, Qin; JaJa, Joseph

    2008-01-01

    We present a new method for the interactive rendering of isosurfaces using ray casting on multi-core processors. This method consists of a combination of an object-order traversal that coarsely identifies possible candidate 3D data blocks for each small set of contiguous pixels, and an isosurface ray casting strategy tailored for the resulting limited-size lists of candidate 3D data blocks. While static screen partitioning is widely used in the literature, our scheme performs dynamic allocation of groups of ray casting tasks to ensure almost equal loads among the different threads running on the multi-cores while maintaining spatial locality. We also make careful use of the memory-management environment commonly present in multi-core processors. We test our system on a two-processor Clovertown platform, each processor being a quad-core 1.86-GHz Intel Xeon, for a number of widely different benchmarks. The detailed experimental results show that our system is efficient and scalable, and achieves high cache performance and excellent load balancing, resulting in overall performance superior to any of the previous algorithms. In fact, we achieve interactive isosurface rendering on a 1024^2 screen for all the datasets tested, up to the maximum size that fits in the main memory of our platform.

  4. The contour-buildup algorithm to calculate the analytical molecular surface.

    PubMed

    Totrov, M; Abagyan, R

    1996-01-01

    A new algorithm is presented to calculate the analytical molecular surface, defined as a smooth envelope traced out by the surface of a probe sphere rolled over the molecule. The core of the algorithm is the sequential build-up of multi-arc contours on the van der Waals spheres. This algorithm yields a substantial reduction in both the memory and time requirements of surface calculations. Further, the contour-buildup principle is intrinsically "local", which makes calculations of partial molecular surfaces even more efficient. Additionally, the algorithm is equally applicable not only to convex patches but also to concave triangular patches, which may have complex multiple intersections. The algorithm permits the rigorous calculation of the full analytical molecular surface for a 100-residue protein in about 2 seconds on an SGI Indigo with an R4400 processor at 150 MHz, with performance scaling almost linearly with protein size. The contour-buildup algorithm is an order of magnitude faster than the original Connolly algorithm.

  5. Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets.

    PubMed

    Scharfe, Michael; Pielot, Rainer; Schreiber, Falk

    2010-01-11

    Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch-reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. The results demonstrate that the CBE processor in a PlayStation 3 accelerates computationally intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low-cost CBE-based platform offers an efficient alternative to conventional hardware for solving computational problems in image processing and bioinformatics.

  6. Performance analysis of distributed symmetric sparse matrix vector multiplication algorithm for multi-core architectures

    DOE PAGES

    Oryspayev, Dossay; Aktulga, Hasan Metin; Sosonkina, Masha; ...

    2015-07-14

    Sparse matrix-vector multiplication (SpMVM) is an important kernel that frequently arises in high performance computing applications. Due to its low arithmetic intensity, several approaches have been proposed in the literature to improve its scalability and efficiency in large-scale computations. In this paper, our target systems are high-end multi-core architectures, and we use a message passing interface + open multiprocessing (MPI + OpenMP) hybrid programming model for parallelism. We analyze the performance of a recently proposed implementation of distributed symmetric SpMVM, originally developed for large sparse symmetric matrices arising in ab initio nuclear structure calculations. We also study important features of this implementation and compare with previously reported implementations that do not exploit the underlying symmetry. Our SpMVM implementations leverage the hybrid paradigm to efficiently overlap expensive communications with computations. Our main comparison criterion is the "CPU core hours" metric, which is the main measure of resource usage on supercomputers. We analyze the effects of a topology-aware mapping heuristic using a simplified network load model. Furthermore, we have tested the different SpMVM implementations on two large clusters with 3D Torus and Dragonfly topology. Our results show that the distributed SpMVM implementation that exploits matrix symmetry and hides communication yields the best value for the "CPU core hours" metric and significantly reduces data movement overheads.
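
    The symmetry exploitation at the core of this work can be shown in a few lines: store only the upper triangle and let each off-diagonal nonzero contribute to both y[i] and y[j]. A serial COO sketch follows; the paper's distributed, communication-hiding version is of course far more involved.

    ```python
    import numpy as np

    def sym_spmv(n, rows, cols, vals, x):
        """y = A @ x for a symmetric sparse A stored as its upper
        triangle only (COO arrays); halves storage and data movement."""
        y = np.zeros(n)
        for i, j, v in zip(rows, cols, vals):
            y[i] += v * x[j]
            if i != j:
                y[j] += v * x[i]   # mirrored lower-triangle contribution
        return y

    # 2x2 symmetric example: [[2, 1], [1, 3]]
    print(sym_spmv(2, [0, 0, 1], [0, 1, 1], [2.0, 1.0, 3.0],
                   np.array([1.0, 1.0])))   # [3. 4.]
    ```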

  7. AthenaMT: upgrading the ATLAS software framework for the many-core world with multi-threading

    NASA Astrophysics Data System (ADS)

    Leggett, Charles; Baines, John; Bold, Tomasz; Calafiura, Paolo; Farrell, Steven; van Gemmeren, Peter; Malon, David; Ritsch, Elmar; Stewart, Graeme; Snyder, Scott; Tsulaia, Vakhtang; Wynne, Benjamin; ATLAS Collaboration

    2017-10-01

    ATLAS’s current software framework, Gaudi/Athena, has been very successful for the experiment in LHC Runs 1 and 2. However, its single threaded design has been recognized for some time to be increasingly problematic as CPUs have increased core counts and decreased available memory per core. Even the multi-process version of Athena, AthenaMP, will not scale to the range of architectures we expect to use beyond Run2. After concluding a rigorous requirements phase, where many design components were examined in detail, ATLAS has begun the migration to a new data-flow driven, multi-threaded framework, which enables the simultaneous processing of singleton, thread unsafe legacy Algorithms, cloned Algorithms that execute concurrently in their own threads with different Event contexts, and fully re-entrant, thread safe Algorithms. In this paper we report on the process of modifying the framework to safely process multiple concurrent events in different threads, which entails significant changes in the underlying handling of features such as event and time dependent data, asynchronous callbacks, metadata, integration with the online High Level Trigger for partial processing in certain regions of interest, concurrent I/O, as well as ensuring thread safety of core services. We also report on upgrading the framework to handle Algorithms that are fully re-entrant.

  8. Application of composite dictionary multi-atom matching in gear fault diagnosis.

    PubMed

    Cui, Lingli; Kang, Chenhui; Wang, Huaqing; Chen, Peng

    2011-01-01

    The sparse decomposition based on matching pursuit is an adaptive sparse expression method for signals. This paper proposes a composite dictionary multi-atom matching decomposition and reconstruction algorithm, and introduces threshold de-noising into the reconstruction algorithm. Based on the structural characteristics of gear fault signals, a composite dictionary combining the impulse time-frequency dictionary and the Fourier dictionary was constructed, and a genetic algorithm was applied to search for the best matching atom. The analysis of simulated gear fault signals indicated the effectiveness of the hard threshold, and the impulse or harmonic characteristic components could be extracted separately. Meanwhile, the robustness of the composite dictionary multi-atom matching algorithm at different noise levels was investigated. To address the effect of data length on the calculation efficiency of the algorithm, an improved segmented decomposition and reconstruction algorithm was proposed, and the calculation efficiency of the decomposition algorithm was significantly enhanced. It is also shown that the multi-atom matching algorithm is superior to the single-atom matching algorithm in both calculation efficiency and robustness. Finally, the above algorithm was applied to gear fault engineering signals and achieved good results.
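
    A minimal sketch of plain matching pursuit over a generic dictionary matrix; the paper's composite dictionary and genetic-algorithm atom search are replaced here by an exhaustive argmax, which is an intentional simplification.

    ```python
    import numpy as np

    def matching_pursuit(signal, dictionary, n_atoms):
        """Greedy matching pursuit: repeatedly pick the unit-norm atom
        most correlated with the residual and subtract its projection."""
        residual = signal.astype(float).copy()
        coeffs = np.zeros(dictionary.shape[1])
        for _ in range(n_atoms):
            corr = dictionary.T @ residual
            k = np.argmax(np.abs(corr))          # best matching atom
            coeffs[k] += corr[k]
            residual -= corr[k] * dictionary[:, k]
        return coeffs, residual

    rng = np.random.default_rng(0)
    D = rng.normal(size=(64, 128))
    D /= np.linalg.norm(D, axis=0)               # unit-norm atoms
    sig = 3.0 * D[:, 5] - 2.0 * D[:, 40]
    c, r = matching_pursuit(sig, D, n_atoms=10)
    print(np.argsort(np.abs(c))[-2:])            # atoms 5 and 40 dominate
    ```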

  9. HACC: Extreme Scaling and Performance Across Diverse Architectures

    NASA Astrophysics Data System (ADS)

    Habib, Salman; Morozov, Vitali; Frontiere, Nicholas; Finkel, Hal; Pope, Adrian; Heitmann, Katrin

    2013-11-01

    Supercomputing is evolving towards hybrid and accelerator-based architectures with millions of cores. The HACC (Hardware/Hybrid Accelerated Cosmology Code) framework exploits this diverse landscape at the largest scales of problem size, obtaining high scalability and sustained performance. Developed to satisfy the science requirements of cosmological surveys, HACC melds particle and grid methods using a novel algorithmic structure that flexibly maps across architectures, including CPU/GPU, multi/many-core, and Blue Gene systems. We demonstrate the success of HACC on two very different machines, the CPU/GPU system Titan and the BG/Q systems Sequoia and Mira, attaining unprecedented levels of scalable performance. We demonstrate strong and weak scaling on Titan, obtaining up to 99.2% parallel efficiency, evolving 1.1 trillion particles. On Sequoia, we reach 13.94 PFlops (69.2% of peak) and 90% parallel efficiency on 1,572,864 cores, with 3.6 trillion particles, the largest cosmological benchmark yet performed. HACC design concepts are applicable to several other supercomputer applications.

  10. Scaling deep learning on GPU and knights landing clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    You, Yang; Buluc, Aydin; Demmel, James

    Training neural networks has become a big bottleneck. For example, training the ImageNet dataset on one Nvidia K20 GPU takes 21 days. To speed up the training process, current deep learning systems rely heavily on hardware accelerators. However, these accelerators have limited on-chip memory compared with CPUs. We use both self-hosted Intel Knights Landing (KNL) clusters and multi-GPU clusters as our target platforms. On the algorithm side, we focus on Elastic Averaging SGD (EASGD) to design algorithms for HPC clusters. We redesign four efficient algorithms for HPC systems to improve EASGD's poor scaling on clusters. Async EASGD, Async MEASGD, and Hogwild EASGD are faster than their existing counterpart methods (Async SGD, Async MSGD, and Hogwild SGD) in all comparisons. Sync EASGD achieves a 5.3X speedup over the original EASGD on the same platform. We achieve 91.5% weak scaling efficiency on 4253 KNL cores, which is higher than the state-of-the-art implementation.
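
    The EASGD update that all four variants build on couples each worker to a center variable with an elastic term. A synchronous toy sketch follows; the step sizes and the quadratic test objective are assumptions for illustration, not values from the paper.

    ```python
    import numpy as np

    def sync_easgd_step(workers, center, grads, eta=0.05, rho=0.1):
        """One synchronous EASGD round: each worker takes a gradient step
        plus an elastic pull toward the center; the center moves toward
        the workers' average. grads[i] is the local gradient at workers[i]."""
        new_workers = [w - eta * (g + rho * (w - center))
                       for w, g in zip(workers, grads)]
        center = center + eta * rho * sum(w - center for w in workers)
        return new_workers, center

    # toy: all workers minimize f(x) = 0.5 * ||x||^2  (gradient = x)
    ws = [np.array([4.0]), np.array([-2.0]), np.array([1.0])]
    c = np.array([0.0])
    for _ in range(200):
        ws, c = sync_easgd_step(ws, c, grads=[w.copy() for w in ws])
    print(c)   # center converges toward the optimum at 0
    ```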

  11. Optimal Golomb Ruler Sequences Generation for Optical WDM Systems: A Novel Parallel Hybrid Multi-objective Bat Algorithm

    NASA Astrophysics Data System (ADS)

    Bansal, Shonak; Singh, Arun Kumar; Gupta, Neena

    2017-02-01

    Real-life multi-objective engineering design problems are tough and time-consuming optimization problems, owing to their high degree of nonlinearity, complexity, and inhomogeneity. Nature-inspired multi-objective optimization algorithms are becoming popular for solving such problems. This paper proposes an original multi-objective Bat algorithm (MOBA) and its extended form, a novel parallel hybrid multi-objective Bat algorithm (PHMOBA), to generate shortest-length Golomb rulers, called optimal Golomb ruler (OGR) sequences, in reasonable computation time. OGRs find application in optical wavelength division multiplexing (WDM) systems as a channel-allocation algorithm to reduce four-wave mixing (FWM) crosstalk. The performance of both proposed algorithms in generating OGRs for optical WDM channel allocation is compared with existing classical computing and nature-inspired algorithms, including extended quadratic congruence (EQC), the search algorithm (SA), genetic algorithms (GAs), biogeography based optimization (BBO), and big bang-big crunch (BB-BC) optimization. Simulations conclude that the proposed parallel hybrid multi-objective Bat algorithm works more efficiently than the original multi-objective Bat algorithm and the other existing algorithms for generating OGRs for optical WDM systems. PHMOBA has a higher convergence and success rate than the original MOBA. The efficiency improvement of the proposed PHMOBA in generating OGRs up to 20 marks, in terms of ruler length and total optical channel bandwidth (TBW), is 100%, versus 85% for the original MOBA. Finally, implications for further research are discussed.
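
    The feasibility test inside any OGR search is simple to state: a ruler is Golomb exactly when all pairwise mark differences are distinct, and OGR search minimizes the largest mark for a given number of marks. A short sketch of that check, which an optimizer such as MOBA/PHMOBA would call on candidate rulers:

    ```python
    from itertools import combinations

    def is_golomb(marks) -> bool:
        """True when all pairwise mark differences are distinct."""
        diffs = [b - a for a, b in combinations(sorted(marks), 2)]
        return len(diffs) == len(set(diffs))

    print(is_golomb([0, 1, 4, 9, 11]))   # True: a known optimal 5-mark ruler
    print(is_golomb([0, 1, 2, 4]))       # False: differences 1 and 2 repeat
    ```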

  12. C3: A Command-line Catalogue Cross-matching tool for modern astrophysical survey data

    NASA Astrophysics Data System (ADS)

    Riccio, Giuseppe; Brescia, Massimo; Cavuoti, Stefano; Mercurio, Amata; di Giorgio, Anna Maria; Molinari, Sergio

    2017-06-01

    In the current data-driven science era, data analysis techniques need to evolve quickly to cope with data whose size has increased up to the petabyte scale. In particular, since modern astrophysics is based on multi-wavelength data organized into large catalogues, it is crucial that astronomical catalogue cross-matching methods, whose cost depends strongly on catalogue size, ensure efficiency, reliability, and scalability. Furthermore, multi-band data are archived and reduced in different ways, so the resulting catalogues may differ from each other in format, resolution, data structure, etc., thus requiring highly general cross-matching features. We present C3 (Command-line Catalogue Cross-match), a multi-platform application designed to efficiently cross-match massive catalogues from modern surveys. Conceived as a stand-alone command-line process or a module within a generic data reduction/analysis pipeline, it provides maximum flexibility in terms of portability, configuration, coordinates, and cross-matching types, while ensuring high performance by using a multi-core parallel processing paradigm and a sky partitioning algorithm.

  13. Optimization of Selected Remote Sensing Algorithms for Embedded NVIDIA Kepler GPU Architecture

    NASA Technical Reports Server (NTRS)

    Riha, Lubomir; Le Moigne, Jacqueline; El-Ghazawi, Tarek

    2015-01-01

    This paper evaluates the potential of the embedded graphics processing unit in Nvidia's Tegra K1 for onboard processing. The performance is compared to a general-purpose multi-core CPU and a full-fledged GPU accelerator. This study uses two algorithms: Wavelet Spectral Dimension Reduction of Hyperspectral Imagery and the Automated Cloud-Cover Assessment (ACCA) Algorithm. The Tegra K1 achieved 51 for the ACCA algorithm and 20 for the dimension reduction algorithm, compared to the performance of a high-end 8-core server Intel Xeon CPU with 13.5 times higher power consumption.

  14. Fast multi-core based multimodal registration of 2D cross-sections and 3D datasets

    PubMed Central

    2010-01-01

    Background Solving bioinformatics tasks often requires extensive computational power. Recent trends in processor architecture combine multiple cores into a single chip to improve overall performance. The Cell Broadband Engine (CBE), a heterogeneous multi-core processor, provides power-efficient and cost-effective high-performance computing. One application area is image analysis and visualisation, in particular registration of 2D cross-sections into 3D image datasets. Such techniques can be used to put different image modalities into spatial correspondence, for example, 2D images of histological cuts into morphological 3D frameworks. Results We evaluate the CBE-driven PlayStation 3 as a high performance, cost-effective computing platform by adapting a multimodal alignment procedure to several characteristic hardware properties. The optimisations are based on partitioning, vectorisation, branch-reducing and loop unrolling techniques with special attention to 32-bit multiplies and limited local storage on the computing units. We show how a typical image analysis and visualisation problem, the multimodal registration of 2D cross-sections and 3D datasets, benefits from the multi-core based implementation of the alignment algorithm. We discuss several CBE-based optimisation methods and compare our results to standard solutions. More information and the source code are available from http://cbe.ipk-gatersleben.de. Conclusions The results demonstrate that the CBE processor in a PlayStation 3 accelerates computationally intensive multimodal registration, which is of great importance in biological/medical image processing. The PlayStation 3 as a low-cost CBE-based platform offers an efficient alternative to conventional hardware for solving computational problems in image processing and bioinformatics. PMID:20064262

  15. Visualization assisted by parallel processing

    NASA Astrophysics Data System (ADS)

    Lange, B.; Rey, H.; Vasques, X.; Puech, W.; Rodriguez, N.

    2011-01-01

    This paper discusses the experimental results of our visualization model for data extracted from sensors. The objective is to find a computationally efficient method to produce real-time rendering visualization for a large amount of data. We develop a visualization method to monitor the temperature variance of a data center. Sensors are placed on three layers and do not cover the whole room. We use the particle paradigm to interpolate sensor data; particles model the "space" of the room. In this work we partition the particle set using two mathematical methods, Delaunay triangulation and Voronoi cells, as presented by Avis and Bhattacharya. Particles provide information on the room temperature at different coordinates over time. To locate and update particle data we define a computational cost function. To solve this function efficiently, we use a client-server paradigm: the server computes the data and clients display it on different kinds of hardware. This paper is organized as follows. The first part presents related algorithms used to visualize large flows of data. The second part presents the different platforms and methods that were evaluated in order to determine the best solution for the proposed task. The benchmark uses the computational cost of our algorithm, which is based on locating particles relative to sensors and on updating particle values. The benchmark was done on a personal computer using a single CPU core, multi-core programming, GPU programming, and a hybrid GPU/CPU approach. GPU programming is a growing method in the research field; it allows real-time rendering instead of precomputed rendering. To improve our results, we also ran our algorithm on a High Performance Computing (HPC) system; this benchmark was used to improve the multi-core method. HPC is commonly used in data visualization (astronomy, physics, etc.) to improve rendering and achieve real-time performance.

  16. The parallel algorithm for the 2D discrete wavelet transform

    NASA Astrophysics Data System (ADS)

    Barina, David; Najman, Pavel; Kleparnik, Petr; Kula, Michal; Zemcik, Pavel

    2018-04-01

    The discrete wavelet transform can be found at the heart of many image-processing algorithms. Until now, the transform on general-purpose processors (CPUs) was mostly computed using a separable lifting scheme. As the lifting scheme consists of a small number of operations, it is preferred for processing on single-core CPUs. However, for parallel processing on multi-core processors, this scheme is inappropriate due to its large number of steps. On such architectures, the number of steps corresponds to the number of synchronization points at which data are exchanged, and these points often form a performance bottleneck. Our approach appropriately rearranges the calculations inside the transform, thereby reducing the number of steps. In other words, we propose a new scheme that is friendly to parallel environments. When evaluating on multi-core CPUs, we consistently outperform the original lifting scheme. The evaluation was performed on 61-core Intel Xeon Phi and 8-core Intel Xeon processors.
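
    To make the step structure concrete, here is an integer Haar transform written as two lifting steps (predict, then update); the CDF 5/3 scheme used in practice follows the same pattern with wider stencils, and it is these sequential steps whose synchronization points the rearranged scheme reduces. The even-length input requirement is a simplification.

    ```python
    import numpy as np

    def haar_lift_forward(x):
        """One level of an integer Haar wavelet via two lifting steps;
        x must have even length."""
        even, odd = x[0::2].copy(), x[1::2].copy()
        d = odd - even                 # predict step: detail coefficients
        s = even + (d >> 1)            # update step: approximation coeffs
        return s, d

    def haar_lift_inverse(s, d):
        even = s - (d >> 1)            # undo update
        odd = d + even                 # undo predict
        x = np.empty(s.size + d.size, dtype=s.dtype)
        x[0::2], x[1::2] = even, odd
        return x

    x = np.array([5, 7, 3, 1, 2, 8], dtype=np.int64)
    s, d = haar_lift_forward(x)
    assert (haar_lift_inverse(s, d) == x).all()   # perfectly invertible
    print(s, d)
    ```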

  17. F3D Image Processing and Analysis for Many - and Multi-core Platforms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    F3D is written in OpenCL, so it achieves platform-portable parallelism on modern multi-core CPUs and many-core GPUs. The interface and mechanisms to access the F3D core are written in Java as a plugin for Fiji/ImageJ, delivering several key image-processing algorithms necessary to remove artifacts from micro-tomography data. The algorithms consist of data-parallel-aware filters that efficiently utilize resources, can work on out-of-core datasets, and scale efficiently across multiple accelerators. Optimizing for data-parallel filters, streaming out-of-core datasets, and efficient resource, memory, and data management over complex execution sequences of filters greatly expedites any scientific workflow with image-processing requirements. F3D performs several different types of 3D image-processing operations, such as non-linear filtering using bilateral filtering and/or median filtering and/or morphological operators (MM). The F3D gray-level MM operators are one-pass, constant-time methods that can perform morphological transformations with a line structuring element oriented in discrete directions. Additionally, MM operators can be applied to gray-scale images, and consist of two parts: (a) a reference shape or structuring element, which is translated over the image, and (b) a mechanism, or operation, that defines the comparisons to be performed between the image and the structuring element. This tool provides a critical component within many complex pipelines, such as those performing automated segmentation of image stacks. F3D can also be considered a descendant of Quant-CT, another software package we developed in the past; the two modules are to be integrated in a future version. Further details were reported in: D.M. Ushizima, T. Perciano, H. Krishnan, B. Loring, H. Bale, D. Parkinson, and J. Sethian. Structure recognition from high-resolution images of ceramic composites. IEEE International Conference on Big Data, October 2014.

  18. Numerical parametric studies of spray combustion instability

    NASA Technical Reports Server (NTRS)

    Pindera, M. Z.

    1993-01-01

    A coupled numerical algorithm has been developed for studies of combustion instabilities in spray-driven liquid rocket engines. The model couples gas and liquid phase physics using the method of fractional steps. Also introduced is a novel, efficient methodology for accounting for spray formation through direct solution of liquid phase equations. Preliminary parametric studies show marked sensitivity of spray penetration and geometry to droplet diameter, considerations of liquid core, and acoustic interactions. Less sensitivity was shown to the combustion model type although more rigorous (multi-step) formulations may be needed for the differences to become apparent.

  19. Self-adaptive multi-objective harmony search for optimal design of water distribution networks

    NASA Astrophysics Data System (ADS)

    Choi, Young Hwan; Lee, Ho Min; Yoo, Do Guen; Kim, Joong Hoon

    2017-11-01

    In multi-objective optimization computing, it is important to assign suitable parameters to each optimization problem to obtain better solutions. In this study, a self-adaptive multi-objective harmony search (SaMOHS) algorithm is developed to apply the parameter-setting-free technique, which is an example of a self-adaptive methodology. The SaMOHS algorithm attempts to remove some of the inconvenience from parameter setting and selects the most adaptive parameters during the iterative solution search process. To verify the proposed algorithm, an optimal least cost water distribution network design problem is applied to three different target networks. The results are compared with other well-known algorithms such as multi-objective harmony search and the non-dominated sorting genetic algorithm-II. The efficiency of the proposed algorithm is quantified by suitable performance indices. The results indicate that SaMOHS can be efficiently applied to the search for Pareto-optimal solutions in a multi-objective solution space.

  20. Multi-fidelity stochastic collocation method for computation of statistical moments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhu, Xueyu, E-mail: xueyu-zhu@uiowa.edu; Linebarger, Erin M., E-mail: aerinline@sci.utah.edu; Xiu, Dongbin, E-mail: xiu.16@osu.edu

    We present an efficient numerical algorithm to approximate the statistical moments of stochastic problems in the presence of models with different fidelities. The method extends a previously developed multi-fidelity approximation method. By combining the efficiency of low-fidelity models and the accuracy of high-fidelity models, our method exhibits fast convergence with a limited number of high-fidelity simulations. We establish an error bound for the method and present several numerical examples to demonstrate the efficiency and applicability of the multi-fidelity algorithm.

  1. Multi-reference approach to the calculation of photoelectron spectra including spin-orbit coupling.

    PubMed

    Grell, Gilbert; Bokarev, Sergey I; Winter, Bernd; Seidel, Robert; Aziz, Emad F; Aziz, Saadullah G; Kühn, Oliver

    2015-08-21

    X-ray photoelectron spectra provide a wealth of information on the electronic structure. The extraction of molecular details requires adequate theoretical methods, which in the case of transition metal complexes have to account for effects due to the multi-configurational and spin-mixed nature of the many-electron wave function. Here, the restricted active space self-consistent field method including spin-orbit coupling is used to cope with this challenge and to calculate valence- and core-level photoelectron spectra. The intensities are estimated within the frameworks of the Dyson orbital formalism and the sudden approximation. Thereby, we utilize an efficient computational algorithm that is based on a biorthonormal basis transformation. The approach is applied to the valence photoionization of the gas-phase water molecule and to the core ionization spectrum of the [Fe(H2O)6](2+) complex. The results show good agreement with the experimental data obtained in this work, whereas the sudden approximation shows distinct deviations from experiment.

  2. Research on Intelligent Control System of DC SQUID Magnetometer Parameters for Multi-channel System

    NASA Astrophysics Data System (ADS)

    Chen, Hua; Yang, Kang; Lu, Li; Kong, Xiangyan; Wang, Hai; Wu, Jun; Wang, Yongliang

    2018-07-01

    In a multi-channel SQUID measurement system, adjusting the device parameters to the optimal condition for all channels is time-consuming. In this paper, an intelligent control system is presented to determine the optimal working point of the devices, which is automatic and more efficient compared to manual adjustment. An optimal working point searching algorithm is introduced as the core component of the control system. In this algorithm, the bias voltage V_bias is step-scanned to find the maximal value of the peak-to-peak current I_pp of the SQUID magnetometer modulation curve; this point is chosen as the optimal one. Using the above control system, more than 30 weakly damped SQUID magnetometers with areas of 5 × 5 mm^2 or 10 × 10 mm^2 were adjusted, and a 36-channel magnetocardiography system worked reliably in a magnetically shielded room. The average white flux noise is 15 μΦ_0/Hz^{1/2}.
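
    The working-point search reduces to a one-dimensional scan; a minimal sketch follows, where measure_ipp is a hypothetical hardware-readout callback returning I_pp at a given bias voltage (the real controller also handles flux bias and feedback, which are omitted here).

        import math

        def find_optimal_working_point(measure_ipp, v_min=0.0, v_max=100e-6, steps=200):
            # Step-scan V_bias and keep the point maximizing the peak-to-peak
            # modulation current I_pp, as described in the abstract.
            best_v, best_ipp = None, -float("inf")
            for i in range(steps + 1):
                v = v_min + i * (v_max - v_min) / steps
                ipp = measure_ipp(v)
                if ipp > best_ipp:
                    best_v, best_ipp = v, ipp
            return best_v, best_ipp

        # usage with a synthetic modulation-depth curve peaking near 40 uV
        v, ipp = find_optimal_working_point(
            lambda v: math.exp(-((v - 40e-6) / 15e-6) ** 2))
        print(v, ipp)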

  3. Research on Intelligent Control System of DC SQUID Magnetometer Parameters for Multi-channel System

    NASA Astrophysics Data System (ADS)

    Chen, Hua; Yang, Kang; Lu, Li; Kong, Xiangyan; Wang, Hai; Wu, Jun; Wang, Yongliang

    2018-03-01

    In a multi-channel SQUID measurement system, adjusting the device parameters to the optimal condition for all channels is time-consuming. In this paper, an intelligent control system is presented to determine the optimal working point of the devices, which is automatic and more efficient compared to manual adjustment. An optimal working point searching algorithm is introduced as the core component of the control system. In this algorithm, the bias voltage V_bias is step-scanned to find the maximal value of the peak-to-peak current I_pp of the SQUID magnetometer modulation curve; this point is chosen as the optimal one. Using the above control system, more than 30 weakly damped SQUID magnetometers with areas of 5 × 5 mm^2 or 10 × 10 mm^2 were adjusted, and a 36-channel magnetocardiography system worked reliably in a magnetically shielded room. The average white flux noise is 15 μΦ_0/Hz^{1/2}.

  4. A method of boundary equations for unsteady hyperbolic problems in 3D

    NASA Astrophysics Data System (ADS)

    Petropavlovsky, S.; Tsynkov, S.; Turkel, E.

    2018-07-01

    We consider interior and exterior initial boundary value problems for the three-dimensional wave (d'Alembert) equation. First, we reduce a given problem to an equivalent operator equation with respect to unknown sources defined only at the boundary of the original domain. In doing so, the Huygens' principle enables us to obtain the operator equation in a form that involves only finite and non-increasing pre-history of the solution in time. Next, we discretize the resulting boundary equation and solve it efficiently by the method of difference potentials (MDP). The overall numerical algorithm handles boundaries of general shape using regular structured grids with no deterioration of accuracy. For long simulation times it offers sub-linear complexity with respect to the grid dimension, i.e., is asymptotically cheaper than the cost of a typical explicit scheme. In addition, our algorithm allows one to share the computational cost between multiple similar problems. On multi-processor (multi-core) platforms, it benefits from what can be considered an effective parallelization in time.

  5. Towards energy-efficient nonoscillatory forward-in-time integrations on lat-lon grids

    NASA Astrophysics Data System (ADS)

    Polkowski, Marcin; Piotrowski, Zbigniew; Ryczkowski, Adam

    2017-04-01

    The design of next-generation weather prediction models calls for new algorithmic approaches allowing for robust integration of atmospheric flow over complex orography at sub-km resolutions. These need to be accompanied by efficient implementations exposing multi-level parallelism, capable of running on modern supercomputing architectures. Here we present recent advances in the energy-efficient implementation of the consistent soundproof/implicit compressible EULAG dynamical core of the COSMO weather prediction framework. Based on the experience of the atmospheric dwarfs developed within the H2020 ESCAPE project, we develop efficient, architecture-agnostic implementations of fully three-dimensional MPDATA advection schemes and a generalized diffusion operator in curvilinear coordinates and spherical geometry. We compare an optimized Fortran implementation with a preliminary C++ implementation employing the GridTools library, allowing for integrations on CPU and GPU while maintaining a single source code.

  6. Evaluation of Genetic Algorithm Concepts Using Model Problems. Part 2; Multi-Objective Optimization

    NASA Technical Reports Server (NTRS)

    Holst, Terry L.; Pulliam, Thomas H.

    2003-01-01

    A genetic algorithm approach suitable for solving multi-objective optimization problems is described and evaluated using a series of simple model problems. Several new features, including a binning selection algorithm and a gene-space transformation procedure, are included. The genetic algorithm is suitable for finding Pareto-optimal solutions in search spaces that are defined by any number of genes and that contain any number of local extrema. Results indicate that the genetic algorithm optimization approach is flexible in application and extremely reliable, providing optimal results for all optimization problems attempted. The binning algorithm generally provides Pareto front quality enhancements and moderate convergence efficiency improvements for most of the model problems. The gene-space transformation procedure provides a large convergence efficiency enhancement for problems with non-convoluted Pareto fronts and a degradation in efficiency for problems with convoluted Pareto fronts. The most difficult problems (multi-mode search spaces with a large number of genes and convoluted Pareto fronts) require a large number of function evaluations for GA convergence, but always converge.

  7. An efficient solution of real-time data processing for multi-GNSS network

    NASA Astrophysics Data System (ADS)

    Gong, Xiaopeng; Gu, Shengfeng; Lou, Yidong; Zheng, Fu; Ge, Maorong; Liu, Jingnan

    2017-12-01

    Global navigation satellite systems (GNSS) are an indispensable tool for geodetic research and global monitoring of the Earth, and they have developed rapidly over the past few years with abundant GNSS networks, modern constellations, and significant improvements in the mathematical models of data processing. However, due to the increasing number of satellites and stations, computational efficiency becomes a key issue that could hamper the further development of GNSS applications. In this contribution, this problem is addressed from the aspects of both dense linear algebra algorithms and GNSS processing strategy. First, in order to fully exploit the power of modern microprocessors, a square root information filter solution based on blocked QR factorization, employing as many matrix-matrix operations as possible, is introduced. In addition, the algorithmic complexity of GNSS data processing is further decreased by centralizing the carrier-phase observations and ambiguity parameters, as well as performing real-time ambiguity resolution and elimination. Based on the QR factorization of a simulated matrix, we conclude that the blocked QR factorization can improve processing efficiency by nearly two orders of magnitude over the unblocked factorization on a personal computer with four 3.30 GHz cores. Then, with 82 globally distributed stations, the processing efficiency is further validated in multi-GNSS (GPS/BDS/Galileo) satellite clock estimation. The results suggest that the unblocked method takes about 31.38 s per epoch, whereas, without any loss of accuracy, our new algorithm takes only 0.50 s and 0.31 s per epoch for float and fixed clock solutions, respectively.
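
    The performance gap the authors exploit can be reproduced in miniature: a column-by-column Householder QR performs rank-1 (matrix-vector style) updates, whereas LAPACK's blocked factorization (reached here through numpy.linalg.qr) casts most of the work as matrix-matrix products. A rough sketch for comparison; timings will vary by machine.

        import time
        import numpy as np

        def unblocked_qr(A):
            # Column-at-a-time Householder QR: each step is a rank-1 update.
            A = A.copy()
            m, n = A.shape
            for k in range(n):
                x = A[k:, k]
                v = x.copy()
                v[0] += np.copysign(np.linalg.norm(x), x[0])
                v /= np.linalg.norm(v)
                A[k:, k:] -= 2.0 * np.outer(v, v @ A[k:, k:])
            return np.triu(A[:n, :])

        A = np.random.default_rng(1).standard_normal((2000, 400))
        t0 = time.perf_counter(); R1 = unblocked_qr(A); t1 = time.perf_counter()
        R2 = np.linalg.qr(A, mode="r")           # blocked LAPACK path
        t2 = time.perf_counter()
        print(f"unblocked: {t1 - t0:.3f}s  blocked LAPACK: {t2 - t1:.3f}s")
        print("agree up to row signs:", np.allclose(np.abs(R1), np.abs(R2)))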

  8. Architecting the Finite Element Method Pipeline for the GPU.

    PubMed

    Fu, Zhisong; Lewis, T James; Kirby, Robert M; Whitaker, Ross T

    2014-02-01

    The finite element method (FEM) is a widely employed numerical technique for approximating the solution of partial differential equations (PDEs) in various science and engineering applications. Many of these applications benefit from fast execution of the FEM pipeline. One way to accelerate the FEM pipeline is by exploiting advances in modern computational hardware, such as many-core streaming processors like the graphical processing unit (GPU). In this paper, we present the algorithms and data structures necessary to move the entire FEM pipeline to the GPU. First we propose an efficient GPU-based algorithm to generate local element information and to assemble the global linear system associated with the FEM discretization of an elliptic PDE. To solve the corresponding linear system efficiently on the GPU, we implement a conjugate gradient method preconditioned with a geometry-informed algebraic multigrid (AMG) preconditioner. We propose a new fine-grained parallelism strategy, a corresponding multigrid cycling stage, and efficient data mapping to the many-core architecture of the GPU. Our on-GPU assembly achieves up to an 87× speedup over a traditional serial CPU implementation. Focusing on the linear system solver alone, we achieve a speedup of up to 51× versus a comparable state-of-the-art serial CPU linear system solver. Furthermore, the method compares favorably with other GPU-based, sparse, linear solvers.
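
    The solver stage is a preconditioned conjugate gradient loop whose kernels (sparse matrix-vector products, dot products, vector updates) are all data-parallel and map naturally onto a GPU. The sketch below uses a simple Jacobi (diagonal) preconditioner as a stand-in for the paper's geometry-informed AMG preconditioner.

        import numpy as np

        def pcg_jacobi(A, b, tol=1e-8, maxit=500):
            # Preconditioned CG; M^{-1} is the inverse diagonal of A here,
            # standing in for the AMG cycle used in the paper.
            Minv = 1.0 / A.diagonal()
            x = np.zeros_like(b)
            r = b - A @ x
            z = Minv * r
            p = z.copy()
            rz = r @ z
            for _ in range(maxit):
                Ap = A @ p
                alpha = rz / (p @ Ap)
                x += alpha * p
                r -= alpha * Ap
                if np.linalg.norm(r) < tol * np.linalg.norm(b):
                    break
                z = Minv * r
                rz_new = r @ z
                p = z + (rz_new / rz) * p
                rz = rz_new
            return x

        # usage on a small SPD system standing in for a stiffness matrix
        rng = np.random.default_rng(0)
        M = rng.standard_normal((100, 100))
        A = M @ M.T + 100 * np.eye(100)
        b = rng.standard_normal(100)
        print(np.linalg.norm(A @ pcg_jacobi(A, b) - b))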

  9. A Computationally Efficient Parallel Levenberg-Marquardt Algorithm for Large-Scale Big-Data Inversion

    NASA Astrophysics Data System (ADS)

    Lin, Y.; O'Malley, D.; Vesselinov, V. V.

    2015-12-01

    Inverse modeling seeks model parameters given a set of observed state variables. However, for many practical problems, because the observed data sets are often large and the model parameters numerous, conventional inverse modeling methods can be computationally expensive. We have developed a new, computationally efficient Levenberg-Marquardt method for large-scale inverse modeling. Levenberg-Marquardt methods require the solution of a dense linear system of equations, which can be prohibitively expensive to compute for large-scale inverse problems. Our novel method projects the original large-scale linear problem down to a Krylov subspace, such that the dimensionality of the measurements can be significantly reduced. Furthermore, instead of solving the linear system for every Levenberg-Marquardt damping parameter, we store the Krylov subspace computed when solving for the first damping parameter and recycle it for all subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved by these computational techniques. We apply the method to invert for a random transmissivity field. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) at each computational node in the model domain. The inversion is also aided by the use of regularization techniques. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). Julia is an advanced high-level scientific programming language that allows for efficient memory management and utilization of high-performance computational resources. Compared with a Levenberg-Marquardt method using standard linear inversion techniques, our method yields a speed-up ratio of 15 in a multi-core computational environment and a speed-up ratio of 45 in a single-core computational environment. Therefore, our new inverse modeling method is a powerful tool for large-scale applications.
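
    The recycling idea can be sketched as follows: for one Levenberg-Marquardt linearization, build a Krylov basis of the Gauss-Newton Hessian once, then reuse the small projected system for every damping parameter. This is an illustrative simplification of the approach, not the MADS implementation; all names are placeholders.

        import numpy as np

        def lm_steps_with_recycled_krylov(J, r, lambdas, k=20):
            # One LM linearization: project the normal equations onto a
            # k-dimensional Krylov subspace once, reuse it for every lambda.
            g = J.T @ r                                  # gradient J^T r
            n = J.shape[1]
            V = np.zeros((n, k))
            V[:, 0] = g / np.linalg.norm(g)
            for j in range(1, k):                        # Krylov basis build
                w = J.T @ (J @ V[:, j - 1])
                for i in range(j):                       # modified Gram-Schmidt
                    w -= (V[:, i] @ w) * V[:, i]
                V[:, j] = w / np.linalg.norm(w)
            H = V.T @ (J.T @ (J @ V))                    # small k x k projected Hessian
            gk = V.T @ g
            steps = {}
            for lam in lambdas:                          # cheap k x k solve per lambda
                y = np.linalg.solve(H + lam * np.eye(k), gk)
                steps[lam] = V @ y                       # candidate LM step
            return steps

        rng = np.random.default_rng(4)
        J = rng.standard_normal((500, 100))
        r = rng.standard_normal(500)
        steps = lm_steps_with_recycled_krylov(J, r, lambdas=[1e-3, 1e-2, 1e-1, 1.0])
        print({lam: round(float(np.linalg.norm(s)), 4) for lam, s in steps.items()})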

  10. Theory and practical application of out of sequence measurements with results for multi-static tracking

    NASA Astrophysics Data System (ADS)

    Iny, David

    2007-09-01

    This paper addresses the out-of-sequence measurement (OOSM) problem associated with multiple-platform tracking systems. The problem arises due to different transmission delays in communicating detection reports across platforms. Much of the literature focuses on the improvement to the state estimate from incorporating the OOSM; as the time lag increases, there is diminishing improvement to the state estimate. However, this paper shows that optimal processing of OOSMs may still be beneficial by improving data association as part of a multi-target tracker. This paper derives exact multi-lag algorithms with the property that the standard log-likelihood track scoring is independent of the order in which the measurements are processed. The orthogonality principle is applied to generalize the method of Bar-Shalom in deriving the exact A1 algorithm for 1-lag estimation. Theory is also developed for optimal filtering of time-averaged measurements and measurements correlated through periodic updates of a target aim-point. An alternative derivation of the multi-lag algorithms is also achieved using an efficient variant of the augmented state Kalman filter (AS-KF). This results in practical and reasonably efficient multi-lag algorithms. Results are compared to a well-known ad hoc algorithm for incorporating OOSMs. Finally, the paper presents some simulated multi-target multi-static scenarios where processing the data out of sequence improves pruning efficiency.

  11. The DANTE Boltzmann transport solver: An unstructured mesh, 3-D, spherical harmonics algorithm compatible with parallel computer architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McGhee, J.M.; Roberts, R.M.; Morel, J.E.

    1997-06-01

    A spherical harmonics research code (DANTE) has been developed which is compatible with parallel computer architectures. DANTE provides 3-D, multi-material, deterministic transport capabilities using an arbitrary finite element mesh. The linearized Boltzmann transport equation is solved in a second-order self-adjoint form utilizing a Galerkin finite element spatial differencing scheme. The core solver utilizes a preconditioned conjugate gradient algorithm. Other distinguishing features of the code include options for discrete-ordinates and simplified spherical harmonics angular differencing, an exact Marshak boundary treatment for arbitrarily oriented boundary faces, in-line matrix construction techniques to minimize memory consumption, and an effective diffusion-based preconditioner for scattering-dominated problems. Algorithm efficiency is demonstrated for a massively parallel SIMD architecture (CM-5), and compatibility with MPP multiprocessor platforms or workstation clusters is anticipated.

  12. Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.

    PubMed

    Kundeti, Vamsi K; Rajasekaran, Sanguthevar; Dinh, Hieu; Vaughn, Matthew; Thapar, Vishal

    2010-11-15

    Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories based on the data structures they employ: the first class uses an overlap/string graph and the second uses a de Bruijn graph. With the recent advances in short-read sequencing technology, de Bruijn graph based algorithms play a vital role in practice, and efficient algorithms for building these massive de Bruijn graphs are essential in large sequencing projects. In an earlier work, an O(n/p) time parallel algorithm was given for this problem, where n is the size of the input and p is the number of processors. That algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it easy to extend to the out-of-core model, where it has an optimal I/O complexity of Θ((n/B) log(n/B)/log(M/B)) (M being the main memory size and B being the size of a disk block). We demonstrate the scalability of our parallel algorithm on an SGI/Altix computer. A comparison with previous approaches reveals that our algorithm is faster, both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, clearly outperforming VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. The bi-directed de Bruijn graph is a fundamental data structure for any sequence assembly program based on the Eulerian approach. Our algorithms for constructing bi-directed de Bruijn graphs are efficient in parallel and out-of-core settings, can be used to build large-scale bi-directed de Bruijn graphs, do not employ any all-to-all communication in a parallel setting, and perform better than the prior algorithms. Finally, our out-of-core algorithm is extremely memory efficient and can replace the existing graph construction algorithm in VELVET.
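
    The key structural point, that edge records can be generated locally and then globally sorted with no all-to-all candidate enumeration, is easy to see in a sequential sketch. Canonical k-mers model the bi-directed node identity; the in-memory sort stands in for the parallel/out-of-core sort.

        def canonical(kmer):
            # A k-mer and its reverse complement map to one bi-directed node.
            rc = kmer[::-1].translate(str.maketrans("ACGT", "TGCA"))
            return min(kmer, rc)

        def bidirected_edges(reads, k):
            # Emit one edge record per (k+1)-mer occurrence, then sort; the
            # sort is the stand-in for the parallel / out-of-core phase.
            edges = []
            for read in reads:
                for i in range(len(read) - k):
                    u = canonical(read[i:i + k])
                    v = canonical(read[i + 1:i + 1 + k])
                    edges.append((u, v))
            edges.sort()
            return edges

        print(bidirected_edges(["ACGTAC"], 3))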

  13. Non-proximity resonant tunneling in multi-core photonic band gap fibers: An efficient mechanism for engineering highly-selective ultra-narrow band pass splitters

    NASA Astrophysics Data System (ADS)

    Florous, Nikolaos J.; Saitoh, Kunimasa; Murao, Tadashi; Koshiba, Masanori; Skorobogatiy, Maksim

    2006-05-01

    The objective of the present investigation is to demonstrate the possibility of designing compact ultra-narrow band-pass filters based on the phenomenon of non-proximity resonant tunneling in multi-core photonic band gap fibers (PBGFs). The proposed PBGF consists of three identical air-cores separated by two defected air-holes which act as highly-selective resonators. With a fine adjustment of the design parameters associated with the resonant-air-holes, phase matching at two distinct wavelengths can be achieved, thus enabling very narrow-band resonant directional coupling between the input and the two output cores. The validation of the proposed design is ensured with an accurate PBGF analysis based on finite element modal and beam propagation algorithms. Typical characteristics of the proposed device over a single polarization are: reasonable short coupling length of 2.7 mm, dual bandpass transmission response at wavelengths of 1.339 and 1.357 μm, with corresponding full width at half maximum bandwidths of 1.2 nm and 1.1 nm respectively, and a relatively high transmission of 95% at the exact resonance wavelengths. The proposed ultra-narrow band-pass filter can be employed in various applications such as all-fiber bandpass/bandstop filtering and resonant sensors.

  14. Non-proximity resonant tunneling in multi-core photonic band gap fibers: An efficient mechanism for engineering highly-selective ultra-narrow band pass splitters.

    PubMed

    Florous, Nikolaos J; Saitoh, Kunimasa; Murao, Tadashi; Koshiba, Masanori; Skorobogatiy, Maksim

    2006-05-29

    The objective of the present investigation is to demonstrate the possibility of designing compact ultra-narrow band-pass filters based on the phenomenon of non-proximity resonant tunneling in multi-core photonic band gap fibers (PBGFs). The proposed PBGF consists of three identical air-cores separated by two defected air-holes which act as highly-selective resonators. With a fine adjustment of the design parameters associated with the resonant-air-holes, phase matching at two distinct wavelengths can be achieved, thus enabling very narrow-band resonant directional coupling between the input and the two output cores. The validation of the proposed design is ensured with an accurate PBGF analysis based on finite element modal and beam propagation algorithms. Typical characteristics of the proposed device over a single polarization are: reasonable short coupling length of 2.7 mm, dual bandpass transmission response at wavelengths of 1.339 and 1.357 μm, with corresponding full width at half maximum bandwidths of 1.2 nm and 1.1 nm respectively, and a relatively high transmission of 95% at the exact resonance wavelengths. The proposed ultra-narrow band-pass filter can be employed in various applications such as all-fiber bandpass/bandstop filtering and resonant sensors.

  15. Multi-agent cooperation rescue algorithm based on influence degree and state prediction

    NASA Astrophysics Data System (ADS)

    Zheng, Yanbin; Ma, Guangfu; Wang, Linlin; Xi, Pengxue

    2018-04-01

    Aiming at multi-agent cooperative rescue in disasters, a multi-agent cooperative rescue algorithm based on influence degree and state prediction is proposed. Firstly, based on the influence of the information in the scene on the collaborative task, an influence degree function is used to filter the information. Secondly, the selected information is used to predict the state of the system and Agent behavior. Finally, according to the prediction results, the cooperative behavior of the Agents is guided, improving the efficiency of individual collaboration. The simulation results show that this algorithm can effectively solve the cooperative rescue problem of multi-agent systems and ensure the efficient completion of the task.

  16. Hybrid Reduced Order Modeling Algorithms for Reactor Physics Calculations

    NASA Astrophysics Data System (ADS)

    Bang, Youngsuk

    Reduced order modeling (ROM) has been recognized as an indispensable approach when the engineering analysis requires many executions of high fidelity simulation codes. Examples of such engineering analyses in nuclear reactor core calculations, representing the focus of this dissertation, include the functionalization of the homogenized few-group cross-sections in terms of the various core conditions, e.g. burn-up, fuel enrichment, temperature, etc. This is done via assembly calculations which are executed many times to generate the required functionalization for use in the downstream core calculations. Other examples are sensitivity analysis used to determine important core attribute variations due to input parameter variations, and uncertainty quantification employed to estimate core attribute uncertainties originating from input parameter uncertainties. ROM constructs a surrogate model with quantifiable accuracy which can replace the original code for subsequent engineering analysis calculations. This is achieved by reducing the effective dimensionality of the input parameter, the state variable, or the output response spaces, by projection onto the so-called active subspaces. Confining the variations to the active subspace allows one to construct an ROM model of reduced complexity which can be solved more efficiently. This dissertation introduces a new algorithm to render reduction with the reduction errors bounded based on a user-defined error tolerance which represents the main challenge of existing ROM techniques. Bounding the error is the key to ensuring that the constructed ROM models are robust for all possible applications. Providing such error bounds represents one of the algorithmic contributions of this dissertation to the ROM state-of-the-art. Recognizing that ROM techniques have been developed to render reduction at different levels, e.g. the input parameter space, the state space, and the response space, this dissertation offers a set of novel hybrid ROM algorithms which can be readily integrated into existing methods and offer higher computational efficiency and defendable accuracy of the reduced models. For example, the snapshots ROM algorithm is hybridized with the range finding algorithm to render reduction in the state space, e.g. the flux in reactor calculations. In another implementation, the perturbation theory used to calculate first order derivatives of responses with respect to parameters is hybridized with a forward sensitivity analysis approach to render reduction in the parameter space. Reduction at the state and parameter spaces can be combined to render further reduction at the interface between different physics codes in a multi-physics model with the accuracy quantified in a similar manner to the single physics case. Although the proposed algorithms are generic in nature, we focus here on radiation transport models used in support of the design and analysis of nuclear reactor cores. In particular, we focus on replacing the traditional assembly calculations by ROM models to facilitate the generation of homogenized cross-sections for downstream core calculations. The implication is that assembly calculations could be done instantaneously therefore precluding the need for the expensive evaluation of the few-group cross-sections for all possible core conditions. Given the generic natures of the algorithms, we make an effort to introduce the material in a general form to allow non-nuclear engineers to benefit from this work.

  17. Learning Behavior Characterization with Multi-Feature, Hierarchical Activity Sequences

    ERIC Educational Resources Information Center

    Ye, Cheng; Segedy, James R.; Kinnebrew, John S.; Biswas, Gautam

    2015-01-01

    This paper discusses Multi-Feature Hierarchical Sequential Pattern Mining, MFH-SPAM, a novel algorithm that efficiently extracts patterns from students' learning activity sequences. This algorithm extends an existing sequential pattern mining algorithm by dynamically selecting the level of specificity for hierarchically-defined features…

  18. Multi-level slug tests in highly permeable formations: 2. Hydraulic conductivity identification, method verification, and field applications

    USGS Publications Warehouse

    Zlotnik, V.A.; McGuire, V.L.

    1998-01-01

    Using the developed theory and modified Springer-Gelhar (SG) model, an identification method is proposed for estimating hydraulic conductivity from multi-level slug tests. The computerized algorithm calculates hydraulic conductivity from both monotonic and oscillatory well responses obtained using a double-packer system. Field verification of the method was performed at a specially designed fully penetrating well of 0.1-m diameter with a 10-m screen in a sand and gravel alluvial aquifer (MSEA site, Shelton, Nebraska). During well installation, disturbed core samples were collected every 0.6 m using a split-spoon sampler. Vertical profiles of hydraulic conductivity were produced on the basis of grain-size analysis of the disturbed core samples. These results closely correlate with the vertical profile of horizontal hydraulic conductivity obtained by interpreting multi-level slug test responses using the modified SG model. The identification method was applied to interpret the response from 474 slug tests in 156 locations at the MSEA site. More than 60% of responses were oscillatory. The method produced a good match to experimental data for both oscillatory and monotonic responses using an automated curve matching procedure. The proposed method allowed us to drastically increase the efficiency of each well used for aquifer characterization and to process massive arrays of field data. Recommendations generalizing this experience to massive application of the proposed method are developed.

  19. Online track detection in triggerless mode for INO

    NASA Astrophysics Data System (ADS)

    Jain, A.; Padmini, S.; Joseph, A. N.; Mahesh, P.; Preetha, N.; Behere, A.; Sikder, S. S.; Majumder, G.; Behera, S. P.

    2018-03-01

    The India-based Neutrino Observatory (INO) is a proposed particle physics research project to study atmospheric neutrinos. The INO Iron Calorimeter (ICAL) will consist of 28,800 detectors with 3.6 million electronic channels, each expected to fire at a 100 Hz singles rate, producing data at a rate of 3 GBps. The collected data contain a few real hits generated by muon tracks and the remaining noise-induced spurious hits; the estimated reduction factor after filtering out the data of interest is of the order of 10^3. This makes trigger generation critical for efficient data collection and storage. A trigger is generated by detecting coincidences across multiple channels satisfying the trigger criteria within a small window of 200 ns in the trigger region. As the probability of neutrino interaction is very low, the track detection algorithm has to be efficient and fast enough to process 5 × 10^6 event candidates per second without introducing significant dead time, so that not a single neutrino event is missed. A hardware-based trigger system is presently proposed for online track detection given the stringent timing requirements. Though such a trigger system can be designed with scalability, the many hardware devices and interconnections make it a complex and expensive solution with limited flexibility. A software-based track detection approach working on the hit information offers an elegant solution, with the possibility of varying trigger criteria for selecting various potentially interesting physics events. An event selection approach for an alternative triggerless readout scheme has been developed. The algorithm is mathematically simple, robust, and parallelizable. It has been validated by detecting simulated muon events with energies in the range of 1-10 GeV with 100% efficiency at a processing rate of 60 μs/event on a 16-core machine. The algorithm and the result of a proof-of-concept for its faster implementation over multiple cores are presented. The paper also discusses harnessing the computing capabilities of a multi-core computing farm, thereby optimizing the number of nodes required for the proposed system.

  20. Delayed Slater determinant update algorithms for high efficiency quantum Monte Carlo

    DOE PAGES

    McDaniel, Tyler; D’Azevedo, Ed F.; Li, Ying Wai; ...

    2017-11-07

    Within ab initio Quantum Monte Carlo simulations, the leading numerical cost for large systems is the computation of the values of the Slater determinants in the trial wavefunction. Each Monte Carlo step requires finding the determinant of a dense matrix. This is most commonly iteratively evaluated using a rank-1 Sherman-Morrison updating scheme to avoid repeated explicit calculation of the inverse. The overall computational cost is therefore formally cubic in the number of electrons or matrix size. To improve the numerical efficiency of this procedure, we propose a novel multiple rank delayed update scheme. This strategy enables probability evaluation with application of accepted moves to the matrices delayed until after a predetermined number of moves, K. The accepted events are then applied to the matrices en bloc with enhanced arithmetic intensity and computational efficiency via matrix-matrix operations instead of matrix-vector operations. Here this procedure does not change the underlying Monte Carlo sampling or its statistical efficiency. For calculations on large systems and algorithms such as diffusion Monte Carlo where the acceptance ratio is high, order of magnitude improvements in the update time can be obtained on both multi-core CPUs and GPUs.
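
    Numerically, applying K accumulated rank-1 updates en bloc is the Woodbury identity; the two large matrix-matrix products are where the arithmetic-intensity gain comes from. A minimal check of that equivalence (not the QMC code itself) follows.

        import numpy as np

        def woodbury_block_update(Ainv, U, V):
            # (A + U V^T)^{-1} from a stored A^{-1}: two GEMMs plus a small
            # K x K solve, replacing K separate rank-1 updates.
            AinvU = Ainv @ U                                   # n x K (GEMM)
            S = np.eye(U.shape[1]) + V.T @ AinvU               # K x K capacitance matrix
            return Ainv - AinvU @ np.linalg.solve(S, V.T @ Ainv)

        rng = np.random.default_rng(2)
        n, K = 200, 16
        A = rng.standard_normal((n, n)) + n * np.eye(n)
        U = rng.standard_normal((n, K))
        V = rng.standard_normal((n, K))
        Ainv = np.linalg.inv(A)

        ref = Ainv.copy()                  # K sequential Sherman-Morrison updates
        for j in range(K):
            u, v = U[:, j:j + 1], V[:, j:j + 1]
            ref -= (ref @ u) @ (v.T @ ref) / (1.0 + (v.T @ ref @ u).item())

        print(np.allclose(ref, woodbury_block_update(Ainv, U, V)))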

  1. Delayed Slater determinant update algorithms for high efficiency quantum Monte Carlo

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McDaniel, Tyler; D’Azevedo, Ed F.; Li, Ying Wai

    Within ab initio Quantum Monte Carlo simulations, the leading numerical cost for large systems is the computation of the values of the Slater determinants in the trial wavefunction. Each Monte Carlo step requires finding the determinant of a dense matrix. This is most commonly iteratively evaluated using a rank-1 Sherman-Morrison updating scheme to avoid repeated explicit calculation of the inverse. The overall computational cost is therefore formally cubic in the number of electrons or matrix size. To improve the numerical efficiency of this procedure, we propose a novel multiple rank delayed update scheme. This strategy enables probability evaluation with application of accepted moves to the matrices delayed until after a predetermined number of moves, K. The accepted events are then applied to the matrices en bloc with enhanced arithmetic intensity and computational efficiency via matrix-matrix operations instead of matrix-vector operations. Here this procedure does not change the underlying Monte Carlo sampling or its statistical efficiency. For calculations on large systems and algorithms such as diffusion Monte Carlo where the acceptance ratio is high, order of magnitude improvements in the update time can be obtained on both multi-core CPUs and GPUs.

  2. Delayed Slater determinant update algorithms for high efficiency quantum Monte Carlo.

    PubMed

    McDaniel, T; D'Azevedo, E F; Li, Y W; Wong, K; Kent, P R C

    2017-11-07

    Within ab initio Quantum Monte Carlo simulations, the leading numerical cost for large systems is the computation of the values of the Slater determinants in the trial wavefunction. Each Monte Carlo step requires finding the determinant of a dense matrix. This is most commonly iteratively evaluated using a rank-1 Sherman-Morrison updating scheme to avoid repeated explicit calculation of the inverse. The overall computational cost is, therefore, formally cubic in the number of electrons or matrix size. To improve the numerical efficiency of this procedure, we propose a novel multiple rank delayed update scheme. This strategy enables probability evaluation with an application of accepted moves to the matrices delayed until after a predetermined number of moves, K. The accepted events are then applied to the matrices en bloc with enhanced arithmetic intensity and computational efficiency via matrix-matrix operations instead of matrix-vector operations. This procedure does not change the underlying Monte Carlo sampling or its statistical efficiency. For calculations on large systems and algorithms such as diffusion Monte Carlo, where the acceptance ratio is high, order of magnitude improvements in the update time can be obtained on both multi-core central processing units and graphical processing units.

  3. Delayed Slater determinant update algorithms for high efficiency quantum Monte Carlo

    NASA Astrophysics Data System (ADS)

    McDaniel, T.; D'Azevedo, E. F.; Li, Y. W.; Wong, K.; Kent, P. R. C.

    2017-11-01

    Within ab initio Quantum Monte Carlo simulations, the leading numerical cost for large systems is the computation of the values of the Slater determinants in the trial wavefunction. Each Monte Carlo step requires finding the determinant of a dense matrix. This is most commonly iteratively evaluated using a rank-1 Sherman-Morrison updating scheme to avoid repeated explicit calculation of the inverse. The overall computational cost is, therefore, formally cubic in the number of electrons or matrix size. To improve the numerical efficiency of this procedure, we propose a novel multiple rank delayed update scheme. This strategy enables probability evaluation with an application of accepted moves to the matrices delayed until after a predetermined number of moves, K. The accepted events are then applied to the matrices en bloc with enhanced arithmetic intensity and computational efficiency via matrix-matrix operations instead of matrix-vector operations. This procedure does not change the underlying Monte Carlo sampling or its statistical efficiency. For calculations on large systems and algorithms such as diffusion Monte Carlo, where the acceptance ratio is high, order of magnitude improvements in the update time can be obtained on both multi-core central processing units and graphical processing units.

  4. User-Assisted Store Recycling for Dynamic Task Graph Schedulers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kurt, Mehmet Can; Krishnamoorthy, Sriram; Agrawal, Gagan

    The emergence of the multi-core era has led to increased interest in designing effective yet practical parallel programming models. Models based on task graphs that operate on single-assignment data are attractive in several ways: they can support dynamic applications and precisely represent the available concurrency. However, they also require nuanced algorithms for scheduling and memory management for efficient execution. In this paper, we consider memory-efficient dynamic scheduling of task graphs. Specifically, we present a novel approach for dynamically recycling the memory locations assigned to data items as they are produced by tasks. We develop algorithms to identify memory-efficient store recycling functions by systematically evaluating the validity of a set of (user-provided or automatically generated) alternatives. Because a recycling function can be input-data-dependent, we have also developed support for continued correct execution of a task graph in the presence of a potentially incorrect store recycling function. Experimental evaluation demonstrates that our approach to automatic store recycling incurs little to no overhead, achieves memory usage comparable to the best manually derived solutions, often produces recycling functions valid across problem sizes and input parameters, and efficiently recovers from an incorrect choice of store recycling functions.
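
    Validity of a candidate recycling function can be checked by replaying the task graph in execution order and verifying that no storage slot is overwritten while the item it holds still has pending readers. The sketch below illustrates that check under assumed, simplified data structures; it is not the paper's algorithm verbatim.

        def recycling_valid(tasks, recycle):
            # tasks: list of (produced_item, consumed_items) in execution order
            # recycle: map item -> storage slot
            pending = {}                           # item -> remaining reader count
            for _, consumed in tasks:
                for it in consumed:
                    pending[it] = pending.get(it, 0) + 1
            slot_owner = {}                        # slot -> live item stored there
            for produced, consumed in tasks:
                for it in consumed:                # a task reads its inputs first
                    pending[it] -= 1
                slot = recycle[produced]
                live = slot_owner.get(slot)
                if live is not None and pending.get(live, 0) > 0:
                    return False                   # would clobber a still-needed item
                slot_owner[slot] = produced
            return True

        tasks = [("a", []), ("b", ["a"]), ("c", ["a", "b"])]
        print(recycling_valid(tasks, {"a": 0, "b": 1, "c": 0}))   # True: a is dead by then
        print(recycling_valid(tasks, {"a": 0, "b": 0, "c": 1}))   # False: b clobbers live a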

  5. Efficient sequential and parallel algorithms for finding edit distance based motifs.

    PubMed

    Pal, Soumitra; Xiao, Peng; Rajasekaran, Sanguthevar

    2016-08-18

    Motif search is an important step in extracting meaningful patterns from biological data. The general problem of motif search is intractable, and there is a pressing need to develop efficient, exact and approximation algorithms to solve this problem. In this paper, we present several novel, exact, sequential and parallel algorithms for solving the (l,d) Edit-distance-based Motif Search (EMS) problem: given two integers l, d and n biological strings, find all strings of length l that appear in each input string with at most d errors of types substitution, insertion and deletion. One popular technique is to explore, for each input string, the set of all possible l-mers that belong to the d-neighborhood of any substring of the input string and output those common to all input strings. We introduce a novel and provably efficient neighborhood exploration technique. We show that it is enough to consider the candidates in the neighborhood which are at a distance exactly d. We compactly represent these candidate motifs using wildcard characters and efficiently explore them with very few repetitions. Our sequential algorithm uses a trie-based data structure to efficiently store and sort the candidate motifs. Our parallel algorithm, in a multi-core shared memory setting, uses arrays for storing and a novel modification of radix sort for sorting the candidate motifs. Algorithms for EMS are customarily evaluated on several challenging instances such as (8,1), (12,2), (16,3), (20,4), and so on. The best previously known algorithm, EMS1, is sequential and solves instances up to (16,3) in an estimated 3 days. Our sequential algorithms are more than 20 times faster on (16,3); on other hard instances such as (9,2), (11,3), (13,4), our algorithms are much faster still. Our parallel algorithm achieves more than 600% scaling performance when using 16 threads. Our algorithms have pushed up the state of the art of EMS solvers, and we believe that the techniques introduced in this paper are also applicable to other motif search problems such as Planted Motif Search (PMS) and Simple Motif Search (SMS).
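
    For the d = 1 case, the exact-distance neighborhood the abstract refers to can be enumerated directly. The paper's contribution is a compact wildcard representation that avoids materializing these strings; the sketch below is the naive explicit version for illustration only.

        def neighbors_at_distance_one(s, alphabet="ACGT"):
            # All strings at edit distance exactly 1 from s: every single
            # deletion, substitution, or insertion (explicit, d = 1 only).
            out = set()
            for i in range(len(s)):                    # deletions
                out.add(s[:i] + s[i + 1:])
            for i in range(len(s)):                    # substitutions
                for c in alphabet:
                    if c != s[i]:
                        out.add(s[:i] + c + s[i + 1:])
            for i in range(len(s) + 1):                # insertions
                for c in alphabet:
                    out.add(s[:i] + c + s[i:])
            return out

        print(len(neighbors_at_distance_one("ACG")))   # 25 distinct neighbors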

  6. Multi-GPU Accelerated Admittance Method for High-Resolution Human Exposure Evaluation.

    PubMed

    Xiong, Zubiao; Feng, Shi; Kautz, Richard; Chandra, Sandeep; Altunyurt, Nevin; Chen, Ji

    2015-12-01

    A multi-graphics processing unit (GPU) accelerated admittance method solver is presented for solving the induced electric field in high-resolution anatomical models of the human body exposed to external low-frequency magnetic fields. In the solver, the anatomical model is discretized as a three-dimensional network of admittances. The conjugate orthogonal conjugate gradient (COCG) iterative algorithm is employed to take advantage of the symmetric property of the complex-valued linear system of equations. Compared against the widely used biconjugate gradient stabilized method, the COCG algorithm reduces the solving time by a factor of 3.5 and the storage requirement by about 40%. The iterative algorithm is then accelerated further by using multiple NVIDIA GPUs. The computations and data transfers between GPUs are overlapped in time using an asynchronous concurrent execution design. The communication overhead is well hidden, so that the acceleration is nearly linear in the number of GPU cards. Numerical examples show that our GPU implementation running on four NVIDIA Tesla K20c cards runs up to 90 times faster than the CPU implementation running on eight CPU cores (two Intel Xeon E5-2603 processors). The implemented solver is able to solve large problems efficiently: a whole adult body discretized at 1-mm resolution can be solved in just several minutes. The high efficiency achieved makes it practical to investigate human exposure for a large number of cases at a resolution that meets the requirements of international dosimetry guidelines.
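
    COCG is ordinary CG with the Hermitian inner product replaced by the unconjugated bilinear form, which is what exploits complex symmetry (A = A^T rather than A = A^H). A minimal dense sketch on a synthetic system:

        import numpy as np

        def cocg(A, b, tol=1e-10, maxit=1000):
            # Conjugate Orthogonal CG for complex *symmetric* systems:
            # identical to CG except inner products are unconjugated (r @ r).
            x = np.zeros_like(b)
            r = b - A @ x
            p = r.copy()
            rr = r @ r                      # np.dot does not conjugate
            for _ in range(maxit):
                Ap = A @ p
                alpha = rr / (p @ Ap)
                x += alpha * p
                r -= alpha * Ap
                if np.linalg.norm(r) < tol * np.linalg.norm(b):
                    break
                rr_new = r @ r
                p = r + (rr_new / rr) * p
                rr = rr_new
            return x

        # usage on a random complex symmetric, diagonally dominant system
        rng = np.random.default_rng(3)
        M = rng.standard_normal((100, 100)) + 1j * rng.standard_normal((100, 100))
        A = (M + M.T) / 2 + 200 * np.eye(100)
        b = rng.standard_normal(100) + 0j
        print(np.linalg.norm(A @ cocg(A, b) - b))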

  7. The research of automatic speed control algorithm based on Green CBTC

    NASA Astrophysics Data System (ADS)

    Lin, Ying; Xiong, Hui; Wang, Xiaoliang; Wu, Youyou; Zhang, Chuanqi

    2017-06-01

    The automatic speed control algorithm is one of the core technologies of train operation control systems. It is a typical multi-objective optimization control algorithm, which achieves train speed control for timing, comfort, energy saving, and precise parking. At present, automatic train speed control technology is widely used in metro and inter-city railways. It has been found that automatic speed control can effectively reduce the driver's workload and improve operation quality. However, the algorithms currently in use are poor at energy saving, sometimes worse than manual driving. To address this problem, this paper proposes an automatic speed control algorithm based on the Green CBTC system. The algorithm adjusts the operation status of the train to improve the utilization rate of regenerative braking feedback energy while ensuring the timing, comfort, and precise parking targets. For this reason, the energy consumption of the Green CBTC system is lower than that of a traditional CBTC system. The simulation results show that the algorithm based on the Green CBTC system can effectively reduce energy consumption by improving the utilization rate of regenerative braking feedback energy.

  8. Parallel Multi-Step/Multi-Rate Integration of Two-Time Scale Dynamic Systems

    NASA Technical Reports Server (NTRS)

    Chang, Johnny T.; Ploen, Scott R.; Sohl, Garett. A,; Martin, Bryan J.

    2004-01-01

    Increasing fidelity demands for real-time and high-fidelity simulations are stressing the capacity of modern processors. New integration techniques are required that provide maximum efficiency for systems that are parallelizable; however, many current techniques make assumptions that are at odds with non-cascadable systems. A new serial multi-step/multi-rate integration algorithm for dual-timescale continuous state systems is presented which applies to these systems, and it is extended to a parallel multi-step/multi-rate algorithm. The superior performance of both algorithms is demonstrated through a representative example.
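
    A minimal forward-Euler sketch of the multi-rate idea: the slow subsystem advances with macro-step H while the fast subsystem takes m substeps of h = H/m against a frozen slow state. This illustrates the two-time-scale structure only; the paper's multi-step scheme and its parallel variant are more elaborate.

        import numpy as np

        def multirate_euler(f_slow, f_fast, y_slow, y_fast, t, H, m):
            # One macro-step: slow state stepped once with H, fast state
            # stepped m times with h = H/m while the slow state is frozen.
            h = H / m
            y_slow_new = y_slow + H * f_slow(t, y_slow, y_fast)
            for i in range(m):
                y_fast = y_fast + h * f_fast(t + i * h, y_slow, y_fast)
            return y_slow_new, y_fast

        # usage: slow relaxation coupled to a fast oscillator
        f_slow = lambda t, ys, yf: -0.1 * ys + 0.01 * yf[0]
        f_fast = lambda t, ys, yf: np.array([yf[1], -100.0 * yf[0] + ys])
        ys, yf = 1.0, np.array([0.0, 1.0])
        for k in range(100):
            ys, yf = multirate_euler(f_slow, f_fast, ys, yf, k * 0.01, 0.01, 10)
        print(ys, yf)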

  9. Development of a Next Generation Concurrent Framework for the ATLAS Experiment

    NASA Astrophysics Data System (ADS)

    Calafiura, P.; Lampl, W.; Leggett, C.; Malon, D.; Stewart, G.; Wynne, B.

    2015-12-01

    The ATLAS experiment has successfully used its Gaudi/Athena software framework for data taking and analysis during the first LHC run, with billions of events successfully processed. However, the design of Gaudi/Athena dates from early 2000 and the software and the physics code has been written using a single threaded, serial design. This programming model has increasing difficulty in exploiting the potential of current CPUs, which offer their best performance only through taking full advantage of multiple cores and wide vector registers. Future CPU evolution will intensify this trend, with core counts increasing and memory per core falling. With current memory consumption for 64 bit ATLAS reconstruction in a high luminosity environment approaching 4GB, it will become impossible to fully occupy all cores in a machine without exhausting available memory. However, since maximizing performance per watt will be a key metric, a mechanism must be found to use all cores as efficiently as possible. In this paper we report on our progress with a practical demonstration of the use of multithreading in the ATLAS reconstruction software, using the GaudiHive framework. We have expanded support to Calorimeter, Inner Detector, and Tracking code, discussing what changes were necessary in order to allow the serially designed ATLAS code to run, both to the framework and to the tools and algorithms used. We report on both the performance gains, and what general lessons were learned about the code patterns that had been employed in the software and which patterns were identified as particularly problematic for multi-threading. We also present our findings on implementing a hybrid multi-threaded / multi-process framework, to take advantage of the strengths of each type of concurrency, while avoiding some of their corresponding limitations.

  10. High-performance sparse matrix-matrix products on Intel KNL and multicore architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nagasaka, Y; Matsuoka, S; Azad, A

    Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in areas ranging from traditional numerical applications to recent big data analysis and machine learning. Although many SpGEMM algorithms have been proposed, hardware-specific optimizations for multi- and many-core processors are lacking, and a detailed analysis of their performance under various use cases and matrices is not available. We first identify and mitigate multiple bottlenecks with memory management and thread scheduling on Intel Xeon Phi (Knights Landing or KNL). Specifically targeting multi- and many-core processors, we develop a hash-table-based algorithm and optimize a heap-based shared-memory SpGEMM algorithm. We examine their performance together with other publicly available codes. Unlike prior work, our evaluation also includes use cases that are representative of real graph algorithms, such as multi-source breadth-first search and triangle counting. Our hash-table and heap-based algorithms show significant speedups over existing libraries in the majority of cases, while different algorithms dominate the remaining scenarios depending on matrix size, sparsity, compression factor, and operation type. We wrap up the in-depth evaluation results into a recipe for choosing the best SpGEMM algorithm for a target scenario. A critical finding is that hash-table-based SpGEMM gets a significant performance boost if the nonzeros are not required to be sorted within each row of the output matrix.
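
    The structure of a hash-table-based SpGEMM is Gustavson's row-wise formulation with a per-row hash accumulator; skipping the per-row sort is exactly the boost noted in the abstract. A plain-Python sketch with dict-of-dicts matrices:

        def spgemm_hash(A, B):
            # Row-wise (Gustavson) sparse product with a hash accumulator per
            # output row. Matrices are dicts: row -> {col: value}.
            C = {}
            for i, row in A.items():
                acc = {}                               # hash table for row i of C
                for k, a_ik in row.items():            # each nonzero A[i,k]
                    for j, b_kj in B.get(k, {}).items():
                        acc[j] = acc.get(j, 0.0) + a_ik * b_kj
                # sorting acc by column is optional; leaving it unsorted is
                # the performance win highlighted above
                C[i] = acc
            return C

        A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
        B = {0: {1: 4.0}, 2: {1: 5.0}}
        print(spgemm_hash(A, B))    # {0: {1: 14.0}, 1: {}}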

  11. Fast and Accurate Support Vector Machines on Large Scale Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vishnu, Abhinav; Narasimhan, Jayenthi; Holder, Larry

    Support Vector Machines (SVM) is a supervised Machine Learning and Data Mining (MLDM) algorithm which has become ubiquitous largely due to its high accuracy and obliviousness to dimensionality. The objective of SVM is to find an optimal boundary, also known as a hyperplane, which separates the samples (examples in a dataset) of different classes by a maximum margin. Usually, very few samples contribute to the definition of the boundary. However, existing parallel algorithms use the entire dataset for finding the boundary, which is sub-optimal for performance reasons. In this paper, we propose a novel distributed memory algorithm to eliminate the samples which do not contribute to the boundary definition in SVM. We propose several heuristics, which range from early (aggressive) to late (conservative) elimination of the samples, such that the overall time for generating the boundary is reduced considerably. In a few cases, a sample may be eliminated (shrunk) pre-emptively, potentially resulting in an incorrect boundary. We propose a scalable approach to synchronize the necessary data structures such that the proposed algorithm maintains its accuracy, and we consider the necessary trade-offs of single/multiple synchronization using in-depth time-space complexity analysis. We implement the proposed algorithm using MPI and compare it with libsvm, the de facto sequential SVM software, which we enhance with OpenMP for multi-core/many-core parallelism. Our proposed approach shows excellent efficiency using up to 4096 processes on several large datasets such as the UCI HIGGS Boson dataset and the Offending URL dataset.

  12. Electronic Structure Calculations and Adaptation Scheme in Multi-core Computing Environments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Seshagiri, Lakshminarasimhan; Sosonkina, Masha; Zhang, Zhao

    2009-05-20

    Multi-core processing environments have become the norm in generic computing and are being considered for adding an extra dimension to the execution of any application. The T2 Niagara processor is a unique environment consisting of eight cores, each capable of running eight threads simultaneously. Applications like the General Atomic and Molecular Electronic Structure System (GAMESS), used for ab initio molecular quantum chemistry calculations, can be good indicators of the performance of such machines and a guideline for both hardware designers and application programmers. In this paper we benchmark GAMESS performance on a T2 Niagara processor for a couple of molecules. We also show the suitability of using a middleware-based adaptation algorithm with GAMESS in such a multi-core environment.

  13. Printed freeform lens arrays on multi-core fibers for highly efficient coupling in astrophotonic systems.

    PubMed

    Dietrich, Philipp-Immanuel; Harris, Robert J; Blaicher, Matthias; Corrigan, Mark K; Morris, Tim M; Freude, Wolfgang; Quirrenbach, Andreas; Koos, Christian

    2017-07-24

    Coupling of light into multi-core fibers (MCF) for spatially resolved spectroscopy is of great importance to astronomical instrumentation. To achieve high coupling efficiencies along with fill-fractions close to unity, micro-optical elements are required to concentrate the incoming light to the individual cores of the MCF. In this paper we demonstrate facet-attached lens arrays (LA) fabricated by two-photon polymerization. The LA provide close to 100% fill-fraction along with efficiencies of up to 73% (down to 1.4 dB loss) for coupling of light from free space into an MCF core. We show the viability of the concept for astrophotonic applications by integrating an MCF-LA assembly in an adaptive-optics test bed and by assessing its performance as a tip/tilt sensor.

  14. Exact diagonalization of quantum lattice models on coprocessors

    NASA Astrophysics Data System (ADS)

    Siro, T.; Harju, A.

    2016-10-01

    We implement the Lanczos algorithm on an Intel Xeon Phi coprocessor and compare its performance to a multi-core Intel Xeon CPU and an NVIDIA graphics processor. The Xeon and the Xeon Phi are parallelized with OpenMP and the graphics processor is programmed with CUDA. The performance is evaluated by measuring the execution time of a single step in the Lanczos algorithm. We study two quantum lattice models with different particle numbers, and conclude that for small systems, the multi-core CPU is the fastest platform, while for large systems, the graphics processor is the clear winner, reaching speedups of up to 7.6 compared to the CPU. The Xeon Phi outperforms the CPU with sufficiently large particle number, reaching a speedup of 2.5.
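
    The timed kernel is one Lanczos step, dominated by a (sparse) matrix-vector product; this is the operation whose throughput differs across CPU, Xeon Phi, and GPU. A dense numpy sketch of the three-term recurrence:

        import numpy as np

        def lanczos_step(H, v_prev, v_curr, beta):
            # One step of the three-term Lanczos recurrence; the H @ v
            # product dominates the cost. Returns (alpha, beta_next, v_next).
            w = H @ v_curr - beta * v_prev
            alpha = v_curr @ w
            w -= alpha * v_curr
            beta_next = np.linalg.norm(w)
            return alpha, beta_next, w / beta_next

        rng = np.random.default_rng(5)
        H = rng.standard_normal((300, 300)); H = (H + H.T) / 2   # symmetric stand-in
        v0 = np.zeros(300)
        v1 = rng.standard_normal(300); v1 /= np.linalg.norm(v1)
        print(lanczos_step(H, v0, v1, 0.0)[0])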

  15. Efficient computation of the phylogenetic likelihood function on multi-gene alignments and multi-core architectures.

    PubMed

    Stamatakis, Alexandros; Ott, Michael

    2008-12-27

    The continuous accumulation of sequence data, for example, due to novel wet-laboratory techniques such as pyrosequencing, coupled with the increasing popularity of multi-gene phylogenies and emerging multi-core processor architectures that face problems of cache congestion, poses new challenges with respect to the efficient computation of the phylogenetic maximum-likelihood (ML) function. Here, we propose two approaches that can significantly speed up likelihood computations, which typically represent over 95 per cent of the computational effort conducted by current ML or Bayesian inference programs. Initially, we present a method and an appropriate data structure to efficiently compute the likelihood score on 'gappy' multi-gene alignments. By 'gappy' we denote sampling-induced gaps owing to missing sequences in individual genes (partitions), i.e. not real alignment gaps. A first proof-of-concept implementation in RAxML indicates that this approach can accelerate inferences on large and gappy alignments by approximately one order of magnitude. Moreover, we present insights and initial performance results on multi-core architectures obtained during the transition from an OpenMP-based to a Pthreads-based fine-grained parallelization of the ML function.

  16. Multi-Level Sequential Pattern Mining Based on Prime Encoding

    NASA Astrophysics Data System (ADS)

    Lianglei, Sun; Yun, Li; Jiang, Yin

    In multi-level sequential pattern mining, the encoding scheme must not only express the hierarchical relationship but also make it easy to identify relationships between different levels, which directly affects the efficiency of the mining algorithm. In this paper, we prove that with prime encoding a single division operation can decide the parent-child relationship between different levels, and we present the PMSM and CROSS-PMSM algorithms, both based on prime encoding, for mining multi-level and cross-level sequential patterns, respectively. Experimental results show that the algorithms can effectively extract multi-level and cross-level sequential patterns from a sequence database.
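
    The divisibility property behind prime encoding is easy to demonstrate: give every tree edge a fresh prime and encode each node as the product of the primes on its root path; then one modulo/division decides ancestry. A sketch (the paper's exact encoding may differ in detail):

        from itertools import count

        def primes():
            # Simple trial-division prime generator for assigning edge codes.
            found = []
            for n in count(2):
                if all(n % p for p in found):
                    found.append(n)
                    yield n

        def encode_tree(tree, root, code=1, out=None, gen=None):
            # Each node's code is the product of the primes on its root path,
            # so `b % a == 0` decides in one division whether a is an ancestor of b.
            if out is None:
                out, gen = {}, primes()
            out[root] = code
            for child in tree.get(root, []):
                encode_tree(tree, child, code * next(gen), out, gen)
            return out

        taxonomy = {"food": ["fruit", "bread"], "fruit": ["apple", "banana"]}
        codes = encode_tree(taxonomy, "food")
        print(codes)
        print(codes["apple"] % codes["fruit"] == 0)   # True: fruit is an ancestor
        print(codes["apple"] % codes["bread"] == 0)   # False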

  17. High-efficiency wavefunction updates for large scale Quantum Monte Carlo

    NASA Astrophysics Data System (ADS)

    Kent, Paul; McDaniel, Tyler; Li, Ying Wai; D'Azevedo, Ed

    Within ab initio Quantum Monte Carlo (QMC) simulations, the leading numerical cost for large systems is the computation of the values of the Slater determinants in the trial wavefunctions. The evaluation of each Monte Carlo move requires finding the determinant of a dense matrix, which is traditionally iteratively evaluated using a rank-1 Sherman-Morrison updating scheme to avoid repeated explicit calculation of the inverse. For calculations with thousands of electrons, this operation dominates the execution profile. We propose a novel rank-k delayed update scheme. This strategy enables probability evaluation for multiple successive Monte Carlo moves, with application of accepted moves to the matrices delayed until after a predetermined number of moves, k. Accepted events grouped in this manner are then applied to the matrices en bloc with enhanced arithmetic intensity and computational efficiency. This procedure does not change the underlying Monte Carlo sampling or the sampling efficiency. For large systems and algorithms such as diffusion Monte Carlo where the acceptance ratio is high, order of magnitude speedups can be obtained on both multi-core CPUs and GPUs, making this algorithm highly advantageous for current petascale and future exascale computations.
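
    For context, the rank-1 kernel that the proposed rank-k scheme delays and batches is the classic Sherman-Morrison row-replacement update of the inverse. A minimal NumPy sketch is shown below, with a small dense matrix standing in for the Slater matrix (an assumption for illustration only).

```python
import numpy as np

def ratio(Ainv, r, v):
    """Determinant ratio det(A')/det(A) when row r of A is replaced by v;
    this is exactly the Monte Carlo acceptance ratio for a single-particle move."""
    return v @ Ainv[:, r]

def sherman_morrison_update(Ainv, r, v):
    """Rank-1 Sherman-Morrison update of the inverse after an accepted move.
    The paper's rank-k scheme delays k of these and applies them en bloc."""
    R = ratio(Ainv, r, v)
    w = v @ Ainv
    w[r] -= 1.0  # w = (v - old_row_r) @ Ainv, since old_row_r @ Ainv = e_r
    return Ainv - np.outer(Ainv[:, r], w) / R

# Quick check against an explicit inverse.
rng = np.random.default_rng(1)
A = rng.normal(size=(6, 6))
Ainv = np.linalg.inv(A)
v = rng.normal(size=6)
A2 = A.copy()
A2[2] = v
assert np.allclose(sherman_morrison_update(Ainv, 2, v), np.linalg.inv(A2))
```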

  18. An Efficient Next Hop Selection Algorithm for Multi-Hop Body Area Networks

    PubMed Central

    Ayatollahitafti, Vahid; Ngadi, Md Asri; Mohamad Sharif, Johan bin; Abdullahi, Mohammed

    2016-01-01

    Body Area Networks (BANs) consist of various sensors which gather patient’s vital signs and deliver them to doctors. One of the most significant challenges faced, is the design of an energy-efficient next hop selection algorithm to satisfy Quality of Service (QoS) requirements for different healthcare applications. In this paper, a novel efficient next hop selection algorithm is proposed in multi-hop BANs. This algorithm uses the minimum hop count and a link cost function jointly in each node to choose the best next hop node. The link cost function includes the residual energy, free buffer size, and the link reliability of the neighboring nodes, which is used to balance the energy consumption and to satisfy QoS requirements in terms of end to end delay and reliability. Extensive simulation experiments were performed to evaluate the efficiency of the proposed algorithm using the NS-2 simulator. Simulation results show that our proposed algorithm provides significant improvement in terms of energy consumption, number of packets forwarded, end to end delay and packet delivery ratio compared to the existing routing protocol. PMID:26771586
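
    The exact weighting used in the paper is not reproduced in this record; the sketch below only illustrates the general shape of such a next-hop rule: restrict candidates to neighbors on a minimum-hop-count path, then rank them by a link cost that penalizes low residual energy, a small free buffer, and an unreliable link. The weights and field names are illustrative assumptions.

```python
def link_cost(n, w_e=0.4, w_b=0.3, w_r=0.3):
    """Lower is better: penalize low residual energy, small free buffer, and
    unreliable links. Weights are illustrative, not taken from the paper."""
    return (w_e / max(n["energy"], 1e-9)
            + w_b / max(n["free_buffer"], 1e-9)
            + w_r / max(n["reliability"], 1e-9))

def next_hop(neighbors):
    """Restrict to neighbors on a minimum-hop-count path, then pick the one
    with the lowest link cost."""
    min_hops = min(n["hops_to_sink"] for n in neighbors)
    candidates = [n for n in neighbors if n["hops_to_sink"] == min_hops]
    return min(candidates, key=link_cost)

neighbors = [
    {"id": "A", "hops_to_sink": 2, "energy": 0.8, "free_buffer": 0.5, "reliability": 0.90},
    {"id": "B", "hops_to_sink": 2, "energy": 0.4, "free_buffer": 0.9, "reliability": 0.95},
    {"id": "C", "hops_to_sink": 3, "energy": 0.9, "free_buffer": 0.9, "reliability": 0.99},
]
print(next_hop(neighbors)["id"])  # -> "A"; C is excluded by hop count despite good metrics
```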

  19. Fair and efficient network congestion control based on minority game

    NASA Astrophysics Data System (ADS)

    Wang, Zuxi; Wang, Wen; Hu, Hanping; Deng, Zhaozhang

    2011-12-01

    Low link utilization, RTT unfairness, and unfairness in multi-bottleneck networks are problems common to most existing network congestion control algorithms. By drawing an analogy between network congestion control and the "El Farol Bar" problem, we establish a congestion control model based on the minority game (MG) and then present a novel network congestion control algorithm based on this model. Simulation results indicate that the proposed algorithm achieves link utilization close to 100%, zero packet loss, and small queue sizes. Moreover, it resolves both RTT unfairness and the unfairness of multi-bottleneck networks, achieving max-min fairness in multi-bottleneck networks while efficiently damping the "ping-pong" oscillation caused by global synchronization.

  20. Single-step reinitialization and extending algorithms for level-set based multi-phase flow simulations

    NASA Astrophysics Data System (ADS)

    Fu, Lin; Hu, Xiangyu Y.; Adams, Nikolaus A.

    2017-12-01

    We propose efficient single-step formulations for reinitialization and extending algorithms, which are critical components of level-set based interface-tracking methods. The level-set field is reinitialized with a single-step (non-iterative) "forward tracing" algorithm. A minimum set of cells is defined that describes the interface, and reinitialization employs only data from these cells. Fluid states are extrapolated or extended across the interface by a single-step "backward tracing" algorithm. Both algorithms, which are motivated by analogy to ray-tracing, avoid multiple block-boundary data exchanges that are inevitable for iterative reinitialization and extending approaches within a parallel-computing environment. The single-step algorithms are combined with a multi-resolution conservative sharp-interface method and validated by a wide range of benchmark test cases. We demonstrate that the proposed reinitialization method achieves second-order accuracy in conserving the volume of each phase. The interface location is invariant to reapplication of the single-step reinitialization. Generally, we observe smaller absolute errors than for standard iterative reinitialization on the same grid. The computational efficiency is higher than for the standard and typical high-order iterative reinitialization methods. We observe a 2- to 6-times efficiency improvement over the standard method for serial execution. The proposed single-step extending algorithm, which is commonly employed for assigning data to ghost cells with ghost-fluid or conservative interface interaction methods, shows about a 10-times efficiency improvement over the standard method while maintaining the same accuracy. Despite their simplicity, the proposed algorithms offer an efficient and robust alternative to iterative reinitialization and extending methods for level-set based multi-phase simulations.

  1. Node Resource Manager: A Distributed Computing Software Framework Used for Solving Geophysical Problems

    NASA Astrophysics Data System (ADS)

    Lawry, B. J.; Encarnacao, A.; Hipp, J. R.; Chang, M.; Young, C. J.

    2011-12-01

    With the rapid growth of multi-core computing hardware, it is now possible for scientific researchers to run complex, computationally intensive software on affordable, in-house commodity hardware. Multi-core CPUs (Central Processing Units) and GPUs (Graphics Processing Units) are now commonplace in desktops and servers. Developers today have access to extremely powerful hardware that enables the execution of software that could previously only be run on expensive, massively-parallel systems. It is no longer cost-prohibitive for an institution to build a parallel computing cluster consisting of commodity multi-core servers. In recent years, our research team has developed a distributed, multi-core computing system and used it to construct global 3D earth models using seismic tomography. Traditionally, computational limitations forced certain assumptions and shortcuts in the calculation of tomographic models; however, with the recent rapid growth in computational hardware including faster CPUs, increased RAM, and the development of multi-core computers, we are now able to perform seismic tomography, 3D ray tracing and seismic event location using distributed parallel algorithms running on commodity hardware, thereby eliminating the need for many of these shortcuts. We describe Node Resource Manager (NRM), a system we developed that leverages the capabilities of a parallel computing cluster. NRM is a software-based parallel computing management framework that works in tandem with the Java Parallel Processing Framework (JPPF, http://www.jppf.org/), a third party library that provides a flexible and innovative way to take advantage of modern multi-core hardware. NRM enables multiple applications to use and share a common set of networked computers, regardless of their hardware platform or operating system. Using NRM, algorithms can be parallelized to run on multiple processing cores of a distributed computing cluster of servers and desktops, which results in a dramatic speedup in execution time. NRM is sufficiently generic to support applications in any domain, as long as the application is parallelizable (i.e., can be subdivided into multiple individual processing tasks). At present, NRM has been effective in decreasing the overall runtime of several algorithms: 1) the generation of a global 3D model of the compressional velocity distribution in the Earth using tomographic inversion, 2) the calculation of the model resolution matrix, model covariance matrix, and travel time uncertainty for the aforementioned velocity model, and 3) the correlation of waveforms with archival data on a massive scale for seismic event detection. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

  2. Tiled architecture of a CNN-mostly IP system

    NASA Astrophysics Data System (ADS)

    Spaanenburg, Lambert; Malki, Suleyman

    2009-05-01

    Multi-core architectures have been popularized with the advent of the IBM CELL. On a finer grain, the problems in scheduling multi-cores have already existed in tiled architectures such as the EPIC and Da Vinci. It is not easy to evaluate the performance of a schedule on such architectures, as historical data are not available. One solution is to compile algorithms for which an optimal schedule is known by analysis. A typical example is an algorithm that is already defined in terms of many collaborating simple nodes, such as a Cellular Neural Network (CNN). A simple node with a local register stack together with a 'rotating wheel' internal communication mechanism has been proposed. Though the basic CNN allows for a tiled implementation of a tiled algorithm on a tiled structure, a practical CNN system has to disturb this regularity through the additional need for arithmetic and logical operations. Arithmetic operations are needed, for instance, to accommodate low-level image processing, while logical operations are needed to fork and merge different data streams without use of the external memory. It is found that the 'rotating wheel' internal communication mechanism still handles such operations without the need for global control. Overall, the CNN system provides for a practical network size as implemented on an FPGA, can easily be used as embedded IP, and provides a clear benchmark for a multi-core compiler.

  3. Multi-core processing and scheduling performance in CMS

    NASA Astrophysics Data System (ADS)

    Hernández, J. M.; Evans, D.; Foulkes, S.

    2012-12-01

    Commodity hardware is going many-core. We might soon not be able to satisfy the job memory needs per core in the current single-core processing model in High Energy Physics. In addition, an ever increasing number of independent and incoherent jobs running on the same physical hardware without sharing resources might significantly affect processing performance. It will be essential to utilize the multi-core architecture effectively. CMS has incorporated support for multi-core processing in the event processing framework and the workload management system. Multi-core processing jobs share common data in memory, such as the code libraries, detector geometry and conditions data, resulting in much lower memory usage than standard single-core independent jobs. Exploiting this new processing model requires a new model of computing resource allocation, departing from the standard single-core allocation for a job. The experiment job management system needs to have control over a larger quantum of resources, since multi-core aware jobs require the scheduling of multiple cores simultaneously. CMS is exploring the approach of using whole nodes as the unit in the workload management system, where all cores of a node are allocated to a multi-core job. Whole-node scheduling allows for optimization of the data/workflow management (e.g. I/O caching, local merging), but efficient utilization of all scheduled cores is challenging. Dedicated whole-node queues have been set up at all Tier-1 centers for exploring multi-core processing workflows in CMS. We present an evaluation of the performance of scheduling and executing multi-core workflows in whole-node queues compared to the standard single-core processing workflows.

  4. An Improved Scheduling Algorithm for Data Transmission in Ultrasonic Phased Arrays with Multi-Group Ultrasonic Sensors

    PubMed Central

    Tang, Wenming; Liu, Guixiong; Li, Yuzhong; Tan, Daji

    2017-01-01

    High data transmission efficiency is a key requirement for an ultrasonic phased array with multi-group ultrasonic sensors. Here, a novel FIFO scheduling algorithm is proposed that improves data transmission efficiency in hardware. The algorithm uses FIFOs as caches for the ultrasonic scanning data obtained from the sensors and outputs the data in a bandwidth-sharing manner; on this basis, an optimal length ratio among all the FIFOs is derived, allowing read operations to switch among the FIFOs without waiting for time slots. The algorithm therefore raises the utilization of the read bandwidth and achieves higher efficiency than traditional scheduling algorithms. Its reliability and validity are substantiated by an implementation in field programmable gate array (FPGA) technology, which enhances both the bandwidth utilization and the real-time performance of the ultrasonic phased array. PMID:29035345
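
    As a rough illustration of the length-ratio idea (the paper's derivation is hardware-specific and not reproduced here), FIFO depths can be chosen in proportion to the per-group fill rates, so that a reader sharing its bandwidth across the FIFOs can cycle through them without idle slots:

```python
from functools import reduce
from math import gcd

def fifo_length_ratio(write_rates):
    """Size the FIFOs in proportion to their fill (write) rates so a
    bandwidth-sharing reader never waits on an empty FIFO. Illustrative
    only; the paper derives the optimal ratio for its specific hardware."""
    scaled = [round(r * 1000) for r in write_rates]  # assumes ~3 decimal places
    g = reduce(gcd, scaled)
    return [s // g for s in scaled]

print(fifo_length_ratio([0.2, 0.4, 0.8]))  # -> [1, 2, 4]
```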

  5. Fiber-based three-dimensional multi-mode interference device as efficient power divider and vector curvature sensor

    NASA Astrophysics Data System (ADS)

    Zhang, Ziyang; Fiebrandt, Julia; Haynes, Dionne; Sun, Kai; Madhav, Kalaga; Stoll, Andreas; Makan, Kirill; Makan, Vadim; Roth, Martin

    2018-03-01

    Three-dimensional multi-mode interference devices are demonstrated using a single-mode fiber (SMF) center-spliced to a section of polygon-shaped core multimode fiber (MMF). This simple structure can effectively generate well-localized self-focusing spots that match to the layout of a chosen multi-core fiber (MCF) as a launcher device. An optimized hexagon-core MMF can provide efficient coupling from a SMF to a 7-core MCF with an insertion loss of 0.6 dB and a power imbalance of 0.5 dB, while a square-core MMF can form a self-imaging pattern with symmetrically distributed 2 × 2, 3 × 3 or 4 × 4 spots. These spots can be directly received by a two-dimensional detector array. The device can work as a vector curvature sensor by comparing the relative power among the spots with a resolution of ∼0.1° over a 1.8 mm-long MMF.

  6. Parallel multi-join query optimization algorithm for distributed sensor network in the internet of things

    NASA Astrophysics Data System (ADS)

    Zheng, Yan

    2015-03-01

    The Internet of things (IoT), which focuses on providing users with information exchange and intelligent control, has attracted much attention from researchers all over the world since the beginning of this century. The IoT consists of a large number of sensor nodes and data processing units, and its most important characteristics are constrained energy, the need for efficient communication, and high redundancy. As the number of sensor nodes grows, communication efficiency and the available communication bandwidth become bottlenecks. Much existing research addresses only queries with few joins, which is not adequate for the growing number of multi-join queries across the whole internet of things. To improve the communication efficiency between parallel units in the distributed sensor network, this paper proposes a parallel query optimization algorithm based on a distribution-attribute cost graph. The algorithm takes the stored information relations and the network communication cost into account and establishes an optimized information exchange rule. The experimental results show that the algorithm performs well and makes effective use of the resources of each node in the distributed sensor network, so that the execution efficiency of multi-join queries across different nodes can be improved.

  7. A heterogeneous computing accelerated SCE-UA global optimization method using OpenMP, OpenCL, CUDA, and OpenACC.

    PubMed

    Kan, Guangyuan; He, Xiaoyan; Ding, Liuqian; Li, Jiren; Liang, Ke; Hong, Yang

    2017-10-01

    The shuffled complex evolution optimization developed at the University of Arizona (SCE-UA) has been successfully applied in various kinds of scientific and engineering optimization applications, such as hydrological model parameter calibration, for many years. The algorithm possesses good global optimality, convergence stability and robustness. However, benchmark and real-world applications reveal the poor computational efficiency of the SCE-UA. This research aims at the parallelization and acceleration of the SCE-UA method based on powerful heterogeneous computing technology. The parallel SCE-UA is implemented on an Intel Xeon multi-core CPU (using OpenMP and OpenCL) and an NVIDIA Tesla many-core GPU (using OpenCL, CUDA, and OpenACC). The serial and parallel SCE-UA were tested on the Griewank benchmark function. Comparison results indicate that the parallel SCE-UA significantly improves computational efficiency compared to the original serial version. The OpenCL implementation obtains the best overall acceleration results, albeit with the most complex source code. The parallel SCE-UA has bright prospects for application to real-world problems.
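
    The paper parallelizes SCE-UA with OpenMP/OpenCL on the CPU and OpenCL/CUDA/OpenACC on the GPU; as a language-neutral illustration of the dominant parallel workload, the sketch below evaluates a population of candidate solutions of the Griewank benchmark concurrently on a multi-core CPU using Python's multiprocessing (an illustrative stand-in, not the paper's implementation).

```python
import numpy as np
from multiprocessing import Pool

def griewank(x):
    """Griewank benchmark function used to test the serial and parallel SCE-UA."""
    i = np.arange(1, len(x) + 1)
    return 1.0 + np.sum(x**2) / 4000.0 - np.prod(np.cos(x / np.sqrt(i)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    population = [rng.uniform(-600, 600, size=30) for _ in range(1024)]
    with Pool() as pool:                       # one worker per CPU core by default
        fitness = pool.map(griewank, population)
    print(min(fitness))
```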

  8. Matrix Algebra for GPU and Multicore Architectures (MAGMA) for Large Petascale Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dongarra, Jack J.; Tomov, Stanimire

    2014-03-24

    The goal of the MAGMA project is to create a new generation of linear algebra libraries that achieve the fastest possible time to an accurate solution on hybrid Multicore+GPU-based systems, using all the processing power that future high-end systems can make available within given energy constraints. Our efforts at the University of Tennessee achieved the goals set in all of the five areas identified in the proposal: 1. Communication optimal algorithms; 2. Autotuning for GPU and hybrid processors; 3. Scheduling and memory management techniques for heterogeneity and scale; 4. Fault tolerance and robustness for large scale systems; 5. Building energy efficiency into software foundations. The University of Tennessee’s main contributions, as proposed, were the research and software development of new algorithms for hybrid multi/many-core CPUs and GPUs, as related to two-sided factorizations and complete eigenproblem solvers, hybrid BLAS, and energy efficiency for dense, as well as sparse, operations. Furthermore, as proposed, we investigated and experimented with various techniques targeting the five main areas outlined.

  9. Efficient sequential and parallel algorithms for record linkage.

    PubMed

    Mamun, Abdullah-Al; Mi, Tian; Aseltine, Robert; Rajasekaran, Sanguthevar

    2014-01-01

    Integrating data from multiple sources is a crucial and challenging problem. Even though there exist numerous algorithms for record linkage or deduplication, they suffer from either large time needs or restrictions on the number of datasets that they can integrate. In this paper we report efficient sequential and parallel algorithms for record linkage which handle any number of datasets and outperform previous algorithms. Our algorithms employ hierarchical clustering algorithms as the basis. A key idea that we use is radix sorting on certain attributes to eliminate identical records before any further processing. Another novel idea is to form a graph that links similar records and find the connected components. Our sequential and parallel algorithms have been tested on a real dataset of 1,083,878 records and synthetic datasets ranging in size from 50,000 to 9,000,000 records. Our sequential algorithm runs at least two times faster, for any dataset, than the previous best-known algorithm, the two-phase algorithm using faster computation of the edit distance (TPA (FCED)). The speedups obtained by our parallel algorithm are almost linear. For example, we get a speedup of 7.5 with 8 cores (residing in a single node), 14.1 with 16 cores (residing in two nodes), and 26.4 with 32 cores (residing in four nodes). We have compared the performance of our sequential algorithm with TPA (FCED) and found that our algorithm outperforms the previous one. The accuracy is the same as that of this previous best-known algorithm.
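
    A compact sketch of the two ideas highlighted above, under simplifying assumptions (exact-key grouping in place of the attribute-wise radix sort, and a plain Levenshtein distance in place of the faster variant used in the paper):

```python
from collections import defaultdict

class DisjointSet:
    """Union-find, used to read off connected components of the similarity graph."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

def edit_distance(a, b):
    """Plain dynamic-programming Levenshtein distance; the paper uses a faster
    variant, but this version keeps the sketch short."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]

records = ["jon smith 1980", "john smith 1980", "jon smith 1980", "mary jones 1975"]

# Step 1 (cf. the radix sort on selected attributes): group exact duplicates
# so identical records are eliminated before any expensive comparison.
groups = defaultdict(list)
for i, rec in enumerate(records):
    groups[rec].append(i)
uniques = list(groups)

# Step 2: link similar unique records and extract connected components.
ds = DisjointSet(len(uniques))
for i in range(len(uniques)):
    for j in range(i + 1, len(uniques)):
        if edit_distance(uniques[i], uniques[j]) <= 2:
            ds.union(i, j)

clusters = defaultdict(list)
for i, u in enumerate(uniques):
    clusters[ds.find(i)].append(u)
print(list(clusters.values()))  # [['jon smith 1980', 'john smith 1980'], ['mary jones 1975']]
```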

  10. An adaptive multi-level simulation algorithm for stochastic biological systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lester, C., E-mail: lesterc@maths.ox.ac.uk; Giles, M. B.; Baker, R. E.

    2015-01-14

    Discrete-state, continuous-time Markov models are widely used in the modeling of biochemical reaction networks. Their complexity often precludes analytic solution, and we rely on stochastic simulation algorithms (SSA) to estimate system statistics. The Gillespie algorithm is exact, but computationally costly as it simulates every single reaction. As such, approximate stochastic simulation algorithms such as the tau-leap algorithm are often used. Potentially computationally more efficient, the system statistics generated suffer from significant bias unless tau is relatively small, in which case the computational time can be comparable to that of the Gillespie algorithm. The multi-level method [Anderson and Higham, “Multi-level Monte Carlo for continuous time Markov chains, with applications in biochemical kinetics,” SIAM Multiscale Model. Simul. 10(1), 146–179 (2012)] tackles this problem. A base estimator is computed using many (cheap) sample paths at low accuracy. The bias inherent in this estimator is then reduced using a number of corrections. Each correction term is estimated using a collection of paired sample paths where one path of each pair is generated at a higher accuracy compared to the other (and so more expensive). By sharing random variables between these paired paths, the variance of each correction estimator can be reduced. This renders the multi-level method very efficient as only a relatively small number of paired paths are required to calculate each correction term. In the original multi-level method, each sample path is simulated using the tau-leap algorithm with a fixed value of τ. This approach can result in poor performance when the reaction activity of a system changes substantially over the timescale of interest. By introducing a novel adaptive time-stepping approach where τ is chosen according to the stochastic behaviour of each sample path, we extend the applicability of the multi-level method to such cases. We demonstrate the efficiency of our method using a number of examples.

  11. SAND: an automated VLBI imaging and analysing pipeline - I. Stripping component trajectories

    NASA Astrophysics Data System (ADS)

    Zhang, M.; Collioud, A.; Charlot, P.

    2018-02-01

    We present our implementation of an automated very long baseline interferometry (VLBI) data-reduction pipeline that is dedicated to interferometric data imaging and analysis. The pipeline can handle massive VLBI data efficiently, which makes it an appropriate tool to investigate multi-epoch multiband VLBI data. Compared to traditional manual data reduction, our pipeline provides more objective results as less human interference is involved. The source extraction is carried out in the image plane, while deconvolution and model fitting are performed in both the image plane and the uv plane for parallel comparison. The output from the pipeline includes catalogues of CLEANed images and reconstructed models, polarization maps, proper motion estimates, core light curves and multiband spectra. We have developed a regression STRIP algorithm to automatically detect linear or non-linear patterns in the jet component trajectories. This algorithm offers an objective method to match jet components at different epochs and to determine their proper motions.

  12. Adaptive reference update (ARU) algorithm. A stochastic search algorithm for efficient optimization of multi-drug cocktails

    PubMed Central

    2012-01-01

    Background Multi-target therapeutics has been shown to be effective for treating complex diseases, and currently, it is a common practice to combine multiple drugs to treat such diseases to optimize the therapeutic outcomes. However, considering the huge number of possible ways to mix multiple drugs at different concentrations, it is practically difficult to identify the optimal drug combination through exhaustive testing. Results In this paper, we propose a novel stochastic search algorithm, called the adaptive reference update (ARU) algorithm, that can provide an efficient and systematic way for optimizing multi-drug cocktails. The ARU algorithm iteratively updates the drug combination to improve its response, where the update is made by comparing the response of the current combination with that of a reference combination, based on which the beneficial update direction is predicted. The reference combination is continuously updated based on the drug response values observed in the past, thereby adapting to the underlying drug response function. To demonstrate the effectiveness of the proposed algorithm, we evaluated its performance based on various multi-dimensional drug functions and compared it with existing algorithms. Conclusions Simulation results show that the ARU algorithm significantly outperforms existing stochastic search algorithms, including the Gur Game algorithm. In fact, the ARU algorithm can more effectively identify potent drug combinations and it typically spends fewer iterations for finding effective combinations. Furthermore, the ARU algorithm is robust to random fluctuations and noise in the measured drug response, which makes the algorithm well-suited for practical drug optimization applications. PMID:23134742

  13. Towards Symbolic Model Checking for Multi-Agent Systems via OBDDs

    NASA Technical Reports Server (NTRS)

    Raimondi, Franco; Lomuscio, Alessio

    2004-01-01

    We present an algorithm for model checking temporal-epistemic properties of multi-agent systems, expressed in the formalism of interpreted systems. We first introduce a technique for the translation of interpreted systems into boolean formulae, and then present a model-checking algorithm based on this translation. The algorithm is based on OBDDs, as they offer a compact and efficient representation for boolean formulae.

  14. Parallel heterogeneous architectures for efficient OMP compressive sensing reconstruction

    NASA Astrophysics Data System (ADS)

    Kulkarni, Amey; Stanislaus, Jerome L.; Mohsenin, Tinoosh

    2014-05-01

    Compressive Sensing (CS) is a novel scheme in which a signal that is sparse in a known transform domain can be reconstructed using fewer samples. The signal reconstruction techniques are computationally intensive and have sluggish performance, which makes them impractical for real-time processing applications. This paper presents novel architectures for the Orthogonal Matching Pursuit (OMP) algorithm, one of the popular CS reconstruction algorithms. We show the implementation results of the proposed architectures on FPGA, ASIC and on a custom many-core platform. For the FPGA and ASIC implementations, a novel thresholding method is used to reduce the processing time for the optimization problem by at least 25%. For the custom many-core platform, efficient parallelization techniques are applied to reconstruct signals with varying signal lengths N and sparsities m. The algorithm is divided into three kernels. Each kernel is parallelized to reduce execution time, whereas efficient reuse of the matrix operators allows us to reduce area. Matrix operations are efficiently parallelized by taking advantage of blocked algorithms. For demonstration purposes, all architectures reconstruct a 256-length signal with maximum sparsity of 8 using 64 measurements. Implementation on a Xilinx Virtex-5 FPGA requires 27.14 μs to reconstruct the signal using basic OMP, whereas with the thresholding method it requires 18 μs. The ASIC implementation reconstructs the signal in 13 μs, while our custom many-core, operating at 1.18 GHz, takes 18.28 μs to complete. Our results show that, compared to previously published work on the same algorithm and matrix size, the proposed FPGA and ASIC architectures perform 1.3x and 1.8x faster, respectively. The proposed many-core implementation also performs 3000x faster than the CPU and 2000x faster than the GPU.
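
    For reference, the basic OMP loop that these architectures accelerate fits in a few lines of NumPy; the paper's demonstration setting (N = 256, sparsity 8, 64 measurements) is reused below, with a random Gaussian measurement matrix as an illustrative assumption.

```python
import numpy as np

def omp(Phi, y, m):
    """Basic Orthogonal Matching Pursuit: greedily pick the column of Phi most
    correlated with the residual, then re-fit the support by least squares."""
    residual, support = y.copy(), []
    for _ in range(m):
        support.append(int(np.argmax(np.abs(Phi.T @ residual))))
        x_s, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ x_s
    x = np.zeros(Phi.shape[1])
    x[support] = x_s
    return x

# The paper's demonstration setting: N = 256, sparsity m = 8, M = 64 measurements.
rng = np.random.default_rng(0)
N, m, M = 256, 8, 64
Phi = rng.normal(size=(M, N)) / np.sqrt(M)   # illustrative Gaussian sensing matrix
x_true = np.zeros(N)
x_true[rng.choice(N, m, replace=False)] = rng.normal(size=m)
x_hat = omp(Phi, Phi @ x_true, m)            # noiseless measurements for simplicity
print(np.max(np.abs(x_hat - x_true)))        # near zero: exact recovery in this easy regime
```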

  15. The efficiency of average linkage hierarchical clustering algorithm associated multi-scale bootstrap resampling in identifying homogeneous precipitation catchments

    NASA Astrophysics Data System (ADS)

    Chuan, Zun Liang; Ismail, Noriszura; Shinyie, Wendy Ling; Lit Ken, Tan; Fam, Soo-Fen; Senawi, Azlyna; Yusoff, Wan Nur Syahidah Wan

    2018-04-01

    Due to limited historical precipitation records, agglomerative hierarchical clustering algorithms are widely used to extrapolate information from gauged to ungauged precipitation catchments, yielding a more reliable projection of extreme hydro-meteorological events such as extreme precipitation events. However, accurately identifying the optimum number of homogeneous precipitation catchments from the dendrograms produced by agglomerative hierarchical algorithms is very subjective. The main objective of this study is to propose an efficient regionalized algorithm to identify homogeneous precipitation catchments for non-stationary precipitation time series. The homogeneous precipitation catchments are identified using the average linkage hierarchical clustering algorithm associated with multi-scale bootstrap resampling, with the uncentered correlation coefficient as the similarity measure. The regionalized homogeneous precipitation catchments are validated using the K-sample Anderson-Darling non-parametric test. The analysis shows that the proposed regionalized algorithm performs better than the agglomerative hierarchical clustering algorithms proposed in previous studies.
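
    A minimal SciPy sketch of the clustering core is given below, with synthetic gauge series as stand-in data; the multi-scale bootstrap resampling used to assess cluster stability is omitted. Note that the cosine distance equals one minus the uncentered correlation coefficient used as the similarity measure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy stand-in data: rows are precipitation gauges, columns a time series;
# two synthetic "families" of catchments are built from two base signals.
rng = np.random.default_rng(0)
base_a, base_b = rng.random(120), rng.random(120)
X = np.vstack([base_a + 0.05 * rng.random(120) for _ in range(4)]
              + [base_b + 0.05 * rng.random(120) for _ in range(3)])

# Average linkage with the cosine distance, i.e. one minus the uncentered
# correlation coefficient.
Z = linkage(X, method="average", metric="cosine")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # the two synthetic families separate into two clusters
```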

  16. Multi-core processing and scheduling performance in CMS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hernandez, J. M.; Evans, D.; Foulkes, S.

    2012-01-01

    Commodity hardware is going many-core. We might soon not be able to satisfy the job memory needs per core in the current single-core processing model in High Energy Physics. In addition, an ever increasing number of independent and incoherent jobs running on the same physical hardware without sharing resources might significantly affect processing performance. It will be essential to utilize the multi-core architecture effectively. CMS has incorporated support for multi-core processing in the event processing framework and the workload management system. Multi-core processing jobs share common data in memory, such as the code libraries, detector geometry and conditions data, resulting in much lower memory usage than standard single-core independent jobs. Exploiting this new processing model requires a new model of computing resource allocation, departing from the standard single-core allocation for a job. The experiment job management system needs to have control over a larger quantum of resources, since multi-core aware jobs require the scheduling of multiple cores simultaneously. CMS is exploring the approach of using whole nodes as the unit in the workload management system, where all cores of a node are allocated to a multi-core job. Whole-node scheduling allows for optimization of the data/workflow management (e.g. I/O caching, local merging), but efficient utilization of all scheduled cores is challenging. Dedicated whole-node queues have been set up at all Tier-1 centers for exploring multi-core processing workflows in CMS. We present an evaluation of the performance of scheduling and executing multi-core workflows in whole-node queues compared to the standard single-core processing workflows.

  17. PCTO-SIM: Multiple-point geostatistical modeling using parallel conditional texture optimization

    NASA Astrophysics Data System (ADS)

    Pourfard, Mohammadreza; Abdollahifard, Mohammad J.; Faez, Karim; Motamedi, Sayed Ahmad; Hosseinian, Tahmineh

    2017-05-01

    Multiple-point geostatistics is a well-known general statistical framework by which complex geological phenomena have been modeled efficiently. Pixel-based and patch-based methods are its two major categories. In this paper, the optimization-based category is used, which has a dual concept in texture synthesis known as texture optimization. Our extended version of texture optimization uses an energy concept to model geological phenomena. While honoring hard data points, the minimization of our proposed cost function forces simulation grid pixels to be as similar as possible to the training images. Our algorithm has a self-enrichment capability and creates a richer training database from a sparser one by mixing the information of all patches surrounding the simulation nodes. It therefore preserves pattern continuity in both continuous and categorical variables very well. Each of its realizations also shows a fuzzy result similar to the expected result of multiple realizations of other statistical models. While the main core of most previous multiple-point geostatistics methods is sequential, the parallel main core of our algorithm enables it to use the GPU efficiently to reduce CPU time. A new validation method for MPS is also proposed in this paper.

  18. Multi channel thermal hydraulic analysis of gas cooled fast reactor using genetic algorithm

    NASA Astrophysics Data System (ADS)

    Drajat, R. Z.; Su'ud, Z.; Soewono, E.; Gunawan, A. Y.

    2012-05-01

    Three analyses must be carried out in the design process of a nuclear reactor: neutronic analysis, thermal-hydraulic analysis, and thermodynamic analysis. The focus of this article is the thermal-hydraulic analysis, which plays a very important role in system efficiency and in the selection of the optimal design. The analysis is performed for a gas-cooled fast reactor (GFR) that uses helium (He) as coolant. Heat from the nuclear fission reactions is transported by conduction through the fuel elements and then delivered by convection to the coolant flowing in the cooling channels. Temperature changes in the coolant channels cause a pressure drop at the top of the reactor core. The governing equations in each channel consist of the mass balance, momentum balance, energy balance, mass conservation, and the ideal gas equation. The problem reduces to finding flow rates in each channel such that the pressure drops at the top of the reactor core are all equal. It is solved numerically with the genetic algorithm method, which yields the flow rates and the temperature distribution in each channel.
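
    A toy sketch of this formulation is given below, with an assumed quadratic pressure-drop law standing in for the full channel balances: a real-coded genetic algorithm searches for the flow-rate split, under a fixed total flow, that equalizes the per-channel pressure drops. All constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
k = np.array([1.0, 1.3, 0.8, 1.1])  # assumed per-channel flow resistances
W_TOTAL = 4.0                       # total coolant flow to distribute

def drops(w):
    """Assumed pressure-drop law dp_i = k_i * w_i**2; the paper's model
    couples mass, momentum, and energy balances along each channel."""
    return k * w**2

def normalize(w):
    w = np.clip(w, 1e-6, None)
    return w * (W_TOTAL / w.sum(axis=-1, keepdims=True))  # enforce fixed total flow

def fitness(w):
    return -np.var(drops(w))  # equal drops across channels <=> zero variance

pop = normalize(rng.random((60, len(k))))
for _ in range(200):
    scores = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(scores)[-30:]]           # truncation selection
    children = parents[rng.integers(0, 30, 60)] \
             + rng.normal(0.0, 0.02, (60, len(k)))    # clone + Gaussian mutation
    pop = normalize(children)
best = pop[np.argmax([fitness(w) for w in pop])]
print(drops(best))  # nearly equal pressure drops across the four channels
```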

  19. Adapting wave-front algorithms to efficiently utilize systems with deep communication hierarchies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kerbyson, Darren J; Lang, Michael; Pakin, Scott

    2009-01-01

    Large-scale systems increasingly exhibit a differential between intra-chip and inter-chip communication performance. Processor-cores on the same socket are able to communicate at lower latencies, and with higher bandwidths, than cores on different sockets either within the same node or between nodes. A key challenge is to efficiently use this communication hierarchy and hence optimize performance. We consider here the class of applications that contain wave-front processing. In these applications data can only be processed after their upstream neighbors have been processed. Similar dependencies result between processors in which communication is required to pass boundary data downstream and whose cost is typically impacted by the slowest communication channel in use. In this work we develop a novel hierarchical wave-front approach that reduces the use of slower communications in the hierarchy but at the cost of additional computation and higher use of on-chip communications. This tradeoff is explored using a performance model and an implementation on the Petascale Roadrunner system demonstrates a 27% performance improvement at full system-scale on a kernel application. The approach is generally applicable to large-scale multi-core and accelerated systems where a differential in system communication performance exists.
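
    For readers unfamiliar with the pattern, the sketch below shows the basic (non-hierarchical) wave-front dependency structure on a 2D grid: cells on the same anti-diagonal are mutually independent and could be dispatched to cores sharing fast on-chip communication, which is the regularity the hierarchical scheme reorganizes around the communication hierarchy.

```python
import numpy as np

def wavefront_sweep(cost):
    """Process a grid in which each cell depends on its upstream (left and
    upper) neighbors, in anti-diagonal order: all cells on one diagonal are
    independent of each other and can be computed in parallel."""
    n, m = cost.shape
    acc = np.zeros_like(cost)
    for d in range(n + m - 1):                       # one anti-diagonal per step
        for i in range(max(0, d - m + 1), min(n, d + 1)):
            j = d - i
            up = acc[i - 1, j] if i else 0.0
            left = acc[i, j - 1] if j else 0.0
            acc[i, j] = cost[i, j] + max(up, left)   # toy upstream-dependent update
    return acc

print(wavefront_sweep(np.ones((4, 5)))[-1, -1])  # 8.0: length of the longest dependency chain
```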

  20. Scalable, High-performance 3D Imaging Software Platform: System Architecture and Application to Virtual Colonoscopy

    PubMed Central

    Yoshida, Hiroyuki; Wu, Yin; Cai, Wenli; Brett, Bevin

    2013-01-01

    One of the key challenges in three-dimensional (3D) medical imaging is to enable the fast turn-around time, which is often required for interactive or real-time response. This inevitably requires not only high computational power but also high memory bandwidth due to the massive amount of data that need to be processed. In this work, we have developed a software platform that is designed to support high-performance 3D medical image processing for a wide range of applications using increasingly available and affordable commodity computing systems: multi-core, clusters, and cloud computing systems. To achieve scalable, high-performance computing, our platform (1) employs size-adaptive, distributable block volumes as a core data structure for efficient parallelization of a wide range of 3D image processing algorithms; (2) supports task scheduling for efficient load distribution and balancing; and (3) consists of layered parallel software libraries that allow a wide range of medical applications to share the same functionalities. We evaluated the performance of our platform by applying it to an electronic cleansing system in virtual colonoscopy, with initial experimental results showing a 10 times performance improvement on an 8-core workstation over the original sequential implementation of the system. PMID:23366803

  1. Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures

    PubMed Central

    Manolakos, Elias S.

    2015-01-01

    Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub. PMID:26605332

  2. Efficient Multicriteria Protein Structure Comparison on Modern Processor Architectures.

    PubMed

    Sharma, Anuj; Manolakos, Elias S

    2015-01-01

    Fast increasing computational demand for all-to-all protein structures comparison (PSC) is a result of three confounding factors: rapidly expanding structural proteomics databases, high computational complexity of pairwise protein comparison algorithms, and the trend in the domain towards using multiple criteria for protein structures comparison (MCPSC) and combining results. We have developed a software framework that exploits many-core and multicore CPUs to implement efficient parallel MCPSC in modern processors based on three popular PSC methods, namely, TMalign, CE, and USM. We evaluate and compare the performance and efficiency of the two parallel MCPSC implementations using Intel's experimental many-core Single-Chip Cloud Computer (SCC) as well as Intel's Core i7 multicore processor. We show that the 48-core SCC is more efficient than the latest generation Core i7, achieving a speedup factor of 42 (efficiency of 0.9), making many-core processors an exciting emerging technology for large-scale structural proteomics. We compare and contrast the performance of the two processors on several datasets and also show that MCPSC outperforms its component methods in grouping related domains, achieving a high F-measure of 0.91 on the benchmark CK34 dataset. The software implementation for protein structure comparison using the three methods and combined MCPSC, along with the developed underlying rckskel algorithmic skeletons library, is available via GitHub.

  3. Spectral efficiency in crosstalk-impaired multi-core fiber links

    NASA Astrophysics Data System (ADS)

    Luís, Ruben S.; Puttnam, Benjamin J.; Rademacher, Georg; Klaus, Werner; Agrell, Erik; Awaji, Yoshinari; Wada, Naoya

    2018-02-01

    We review the latest advances in ultra-high-throughput transmission using crosstalk-limited single-mode multicore fibers and compare these with the theoretical spectral efficiency of such systems. We relate the crosstalk-imposed spectral efficiency limits to fiber parameters, such as core diameter, core pitch, and trench design. Furthermore, we investigate the potential of techniques such as direction interleaving and high-order MIMO to improve the throughput or reach of these systems when using various modulation formats.

  4. Optimization of cladding parameters for resisting corrosion on low carbon steels using simulated annealing algorithm

    NASA Astrophysics Data System (ADS)

    Balan, A. V.; Shivasankaran, N.; Magibalan, S.

    2018-04-01

    Low carbon steels used in chemical industries are frequently affected by corrosion. Cladding is a surfacing process used to deposit a thick layer of filler metal on materials exposed to highly corrosive environments in order to achieve corrosion resistance. Flux cored arc welding (FCAW) is preferred for cladding due to its improved efficiency and higher deposition rate. In this cladding process, the effect of corrosion can be minimized by controlling the output responses, i.e., minimizing dilution and penetration while maximizing bead width, reinforcement and ferrite number. This paper deals with the multi-objective optimization of flux cored arc welding responses by controlling process parameters such as wire feed rate, welding speed, nozzle-to-plate distance and welding gun angle for super duplex stainless steel material using a simulated annealing technique. A regression equation has been developed and validated using the ANOVA technique. The multi-objective optimization of the weld bead parameters was carried out using simulated annealing to obtain an optimum bead geometry for reducing corrosion. The potentiodynamic polarization test reveals a balanced formation of fine ferrite and austenite particles and a desensitized microstructure in the optimized clad bead.
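
    A schematic sketch of the optimization step follows; the regression models themselves are not reproduced in this record, so an assumed toy response surface stands in for them. Simulated annealing searches over the four scaled process parameters, minimizing a weighted scalarization of the bead responses (weights are illustrative assumptions).

```python
import math
import random

random.seed(0)

# Decision variables: wire feed rate, welding speed, nozzle-to-plate distance,
# gun angle, each scaled to [0, 1]. The quadratic responses below are toy
# stand-ins for the paper's ANOVA-validated regression equations.
def scalarized_cost(x):
    dilution    = (x[0] - 0.2)**2 + 0.5 * (x[1] - 0.6)**2   # minimize
    penetration = (x[2] - 0.3)**2                           # minimize
    bead_width  = 1.0 - (x[3] - 0.5)**2                     # maximize
    return dilution + penetration - 0.5 * bead_width

def anneal(cost, x, T=1.0, alpha=0.95, steps=2000):
    """Plain simulated annealing with a geometric cooling schedule."""
    best, best_c = x[:], cost(x)
    c = best_c
    for _ in range(steps):
        cand = [min(1.0, max(0.0, xi + random.gauss(0, 0.1))) for xi in x]
        cc = cost(cand)
        # Accept improvements always; accept worse moves with Boltzmann probability.
        if cc < c or random.random() < math.exp(-(cc - c) / T):
            x, c = cand, cc
            if c < best_c:
                best, best_c = x[:], c
        T *= alpha
    return best, best_c

print(anneal(scalarized_cost, [random.random() for _ in range(4)]))
```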

  5. Fire behavior simulation in Mediterranean forests using the minimum travel time algorithm

    Treesearch

    Kostas Kalabokidis; Palaiologos Palaiologou; Mark A. Finney

    2014-01-01

    Recent large wildfires in Greece exemplify the need for pre-fire burn probability assessment and possible landscape fire flow estimation to enhance fire planning and resource allocation. The Minimum Travel Time (MTT) algorithm, incorporated as FlamMap's version five module, provides valuable fire behavior functions, while enabling multi-core utilization for the...

  6. Efficient sequential and parallel algorithms for record linkage

    PubMed Central

    Mamun, Abdullah-Al; Mi, Tian; Aseltine, Robert; Rajasekaran, Sanguthevar

    2014-01-01

    Background and objective Integrating data from multiple sources is a crucial and challenging problem. Even though there exist numerous algorithms for record linkage or deduplication, they suffer from either large time needs or restrictions on the number of datasets that they can integrate. In this paper we report efficient sequential and parallel algorithms for record linkage which handle any number of datasets and outperform previous algorithms. Methods Our algorithms employ hierarchical clustering algorithms as the basis. A key idea that we use is radix sorting on certain attributes to eliminate identical records before any further processing. Another novel idea is to form a graph that links similar records and find the connected components. Results Our sequential and parallel algorithms have been tested on a real dataset of 1 083 878 records and synthetic datasets ranging in size from 50 000 to 9 000 000 records. Our sequential algorithm runs at least two times faster, for any dataset, than the previous best-known algorithm, the two-phase algorithm using faster computation of the edit distance (TPA (FCED)). The speedups obtained by our parallel algorithm are almost linear. For example, we get a speedup of 7.5 with 8 cores (residing in a single node), 14.1 with 16 cores (residing in two nodes), and 26.4 with 32 cores (residing in four nodes). Conclusions We have compared the performance of our sequential algorithm with TPA (FCED) and found that our algorithm outperforms the previous one. The accuracy is the same as that of this previous best-known algorithm. PMID:24154837

  7. An efficient sampling algorithm for uncertain abnormal data detection in biomedical image processing and disease prediction.

    PubMed

    Liu, Fei; Zhang, Xi; Jia, Yan

    2015-01-01

    In this paper, we propose a computer information processing algorithm that can be used for biomedical image processing and disease prediction. A biomedical image is considered a data object in a multi-dimensional space. Each dimension is a feature that can be used for disease diagnosis. We introduce a new concept of the top (k1,k2) outlier. It can be used to detect abnormal data objects in the multi-dimensional space. This technique focuses on uncertain space, where each data object has several possible instances with distinct probabilities. We design an efficient sampling algorithm for the top (k1,k2) outlier in uncertain space. Some improvement techniques are used for acceleration. Experiments show our methods' high accuracy and high efficiency.

  8. System Framework for a Multi-Band, Multi-Mode Software Defined Radio

    DTIC Science & Technology

    2014-06-01

    detection, while the VITA Radio Transport (VRT) protocol over Gigabit Ethernet (GIGE) is implemented for the data interface. In addition to the SoC... [Remainder of the record is block-diagram residue naming ARM cores, a memory map, a generic interrupt controller, and Viterbi algorithm / VRT interface blocks.]

  9. Design synthesis and optimization of permanent magnet synchronous machines based on computationally-efficient finite element analysis

    NASA Astrophysics Data System (ADS)

    Sizov, Gennadi Y.

    In this dissertation, a model-based multi-objective optimal design of permanent magnet ac machines, supplied by sine-wave current regulated drives, is developed and implemented. The design procedure uses an efficient electromagnetic finite element-based solver to accurately model nonlinear material properties and complex geometric shapes associated with magnetic circuit design. Application of an electromagnetic finite element-based solver allows for accurate computation of intricate performance parameters and characteristics. The first contribution of this dissertation is the development of a rapid computational method that allows accurate and efficient exploration of large multi-dimensional design spaces in search of optimum design(s). The computationally efficient finite element-based approach developed in this work provides a framework of tools that allow rapid analysis of synchronous electric machines operating under steady-state conditions. In the developed modeling approach, major steady-state performance parameters such as, winding flux linkages and voltages, average, cogging and ripple torques, stator core flux densities, core losses, efficiencies and saturated machine winding inductances, are calculated with minimum computational effort. In addition, the method includes means for rapid estimation of distributed stator forces and three-dimensional effects of stator and/or rotor skew on the performance of the machine. The second contribution of this dissertation is the development of the design synthesis and optimization method based on a differential evolution algorithm. The approach relies on the developed finite element-based modeling method for electromagnetic analysis and is able to tackle large-scale multi-objective design problems using modest computational resources. Overall, computational time savings of up to two orders of magnitude are achievable, when compared to current and prevalent state-of-the-art methods. These computational savings allow one to expand the optimization problem to achieve more complex and comprehensive design objectives. The method is used in the design process of several interior permanent magnet industrial motors. The presented case studies demonstrate that the developed finite element-based approach practically eliminates the need for using less accurate analytical and lumped parameter equivalent circuit models for electric machine design optimization. The design process and experimental validation of the case-study machines are detailed in the dissertation.
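
    The differential evolution loop at the heart of the synthesis method is standard; a compact DE/rand/1/bin sketch is given below, with a sphere function standing in for the expensive finite-element evaluation of a candidate machine design (an illustrative assumption, not the dissertation's objective).

```python
import numpy as np

def differential_evolution(cost, bounds, pop_size=40, F=0.7, CR=0.9, gens=150, seed=0):
    """DE/rand/1/bin: mutate with a scaled difference of two other members,
    cross binomially with the target vector, and keep the better of the two."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds[:, 0], bounds[:, 1]
    pop = rng.uniform(lo, hi, size=(pop_size, len(lo)))
    f = np.array([cost(x) for x in pop])
    for _ in range(gens):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            cross = rng.random(len(lo)) < CR
            cross[rng.integers(len(lo))] = True      # guarantee one mutant gene
            trial = np.where(cross, mutant, pop[i])
            f_trial = cost(trial)
            if f_trial < f[i]:                       # greedy one-to-one replacement
                pop[i], f[i] = trial, f_trial
    return pop[np.argmin(f)], f.min()

# A sphere function stands in for the finite-element evaluation of a design.
sphere = lambda x: float(np.sum(x**2))
x_best, f_best = differential_evolution(sphere, np.array([[-5.0, 5.0]] * 6))
print(x_best, f_best)
```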

  10. Performance improvement of multi-class detection using greedy algorithm for Viola-Jones cascade selection

    NASA Astrophysics Data System (ADS)

    Tereshin, Alexander A.; Usilin, Sergey A.; Arlazarov, Vladimir V.

    2018-04-01

    This paper studies the problem of multi-class object detection in a video stream with Viola-Jones cascades. An adaptive algorithm for selecting a Viola-Jones cascade, based on a greedy choice strategy for the N-armed bandit problem, is proposed. The efficiency of the algorithm is demonstrated on the problem of detecting and recognizing bank card logos in a video stream. The proposed algorithm can be effectively used in document localization and identification, recognition of road scene elements, localization and tracking of lengthy objects, and for solving other problems of rigid object detection in heterogeneous data flows. The computational efficiency of the algorithm makes it possible to use it both on personal computers and on mobile devices based on processors with low power consumption.
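
    This record does not give the algorithm's exact update rule; the sketch below shows one plausible reading of a greedy N-armed-bandit cascade selector: treat each per-class cascade as an arm, favour the cascade with the best recent detection rate, and explore occasionally. All parameters are illustrative assumptions.

```python
import random

random.seed(1)

class CascadeSelector:
    """Epsilon-greedy N-armed-bandit choice among per-class Viola-Jones
    cascades: favour the cascade confirming its class most often in the
    recent stream, but keep exploring with probability eps."""
    def __init__(self, n_classes, eps=0.1):
        self.hits = [1] * n_classes   # optimistic initial counts
        self.runs = [1] * n_classes
        self.eps = eps
    def pick(self):
        if random.random() < self.eps:
            return random.randrange(len(self.runs))      # explore
        rates = [h / r for h, r in zip(self.hits, self.runs)]
        return rates.index(max(rates))                   # exploit (greedy)
    def update(self, arm, detected):
        self.runs[arm] += 1
        self.hits[arm] += int(detected)

# Toy stream in which class 2's logo is actually present in 60% of frames.
sel, true_rates = CascadeSelector(4), [0.05, 0.1, 0.6, 0.2]
for _ in range(2000):
    arm = sel.pick()
    sel.update(arm, random.random() < true_rates[arm])
print(max(range(4), key=lambda a: sel.hits[a] / sel.runs[a]))  # almost surely 2, the best arm
```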

  11. Superscattering of light optimized by a genetic algorithm

    NASA Astrophysics Data System (ADS)

    Mirzaei, Ali; Miroshnichenko, Andrey E.; Shadrivov, Ilya V.; Kivshar, Yuri S.

    2014-07-01

    We analyse scattering of light from multi-layer plasmonic nanowires and employ a genetic algorithm for optimizing the scattering cross section. We apply the mode-expansion method using experimental data for material parameters to demonstrate that our genetic algorithm allows designing realistic core-shell nanostructures with the superscattering effect achieved at any desired wavelength. This approach can be employed for optimizing both superscattering and cloaking at different wavelengths in the visible spectral range.

  12. Behavior-aware cache hierarchy optimization for low-power multi-core embedded systems

    NASA Astrophysics Data System (ADS)

    Zhao, Huatao; Luo, Xiao; Zhu, Chen; Watanabe, Takahiro; Zhu, Tianbo

    2017-07-01

    In modern embedded systems, the increasing number of cores requires efficient cache hierarchies to sustain data throughput, but such hierarchies are constrained by their swollen size and by interfering accesses, which lead to both performance degradation and wasted energy. In this paper, we first propose a behavior-aware cache hierarchy (BACH) which optimally allocates the multi-level cache resources to many cores, greatly improving the efficiency of the cache hierarchy and resulting in low energy consumption. BACH takes full advantage of the explored application behaviors and runtime cache resource demands as the basis for cache allocation, so that the cache hierarchy can be optimally configured to meet the runtime demand. BACH was implemented on the GEM5 simulator. The experimental results show that the energy consumption of a three-level cache hierarchy can be reduced by between 5.29% and 27.94% compared with other key approaches, while the performance of the multi-core system even improves slightly once hardware overhead is taken into account.

  13. Nipype: a flexible, lightweight and extensible neuroimaging data processing framework in python.

    PubMed

    Gorgolewski, Krzysztof; Burns, Christopher D; Madison, Cindee; Clark, Dav; Halchenko, Yaroslav O; Waskom, Michael L; Ghosh, Satrajit S

    2011-01-01

    Current neuroimaging software offer users an incredible opportunity to analyze their data in different ways, with different underlying assumptions. Several sophisticated software packages (e.g., AFNI, BrainVoyager, FSL, FreeSurfer, Nipy, R, SPM) are used to process and analyze large and often diverse (highly multi-dimensional) data. However, this heterogeneous collection of specialized applications creates several issues that hinder replicable, efficient, and optimal use of neuroimaging analysis approaches: (1) No uniform access to neuroimaging analysis software and usage information; (2) No framework for comparative algorithm development and dissemination; (3) Personnel turnover in laboratories often limits methodological continuity and training new personnel takes time; (4) Neuroimaging software packages do not address computational efficiency; and (5) Methods sections in journal articles are inadequate for reproducing results. To address these issues, we present Nipype (Neuroimaging in Python: Pipelines and Interfaces; http://nipy.org/nipype), an open-source, community-developed, software package, and scriptable library. Nipype solves the issues by providing Interfaces to existing neuroimaging software with uniform usage semantics and by facilitating interaction between these packages using Workflows. Nipype provides an environment that encourages interactive exploration of algorithms, eases the design of Workflows within and between packages, allows rapid comparative development of algorithms and reduces the learning curve necessary to use different packages. Nipype supports both local and remote execution on multi-core machines and clusters, without additional scripting. Nipype is Berkeley Software Distribution licensed, allowing anyone unrestricted usage. An open, community-driven development philosophy allows the software to quickly adapt and address the varied needs of the evolving neuroimaging community, especially in the context of increasing demand for reproducible research.

  14. Nipype: A Flexible, Lightweight and Extensible Neuroimaging Data Processing Framework in Python

    PubMed Central

    Gorgolewski, Krzysztof; Burns, Christopher D.; Madison, Cindee; Clark, Dav; Halchenko, Yaroslav O.; Waskom, Michael L.; Ghosh, Satrajit S.

    2011-01-01

    Current neuroimaging software offer users an incredible opportunity to analyze their data in different ways, with different underlying assumptions. Several sophisticated software packages (e.g., AFNI, BrainVoyager, FSL, FreeSurfer, Nipy, R, SPM) are used to process and analyze large and often diverse (highly multi-dimensional) data. However, this heterogeneous collection of specialized applications creates several issues that hinder replicable, efficient, and optimal use of neuroimaging analysis approaches: (1) No uniform access to neuroimaging analysis software and usage information; (2) No framework for comparative algorithm development and dissemination; (3) Personnel turnover in laboratories often limits methodological continuity and training new personnel takes time; (4) Neuroimaging software packages do not address computational efficiency; and (5) Methods sections in journal articles are inadequate for reproducing results. To address these issues, we present Nipype (Neuroimaging in Python: Pipelines and Interfaces; http://nipy.org/nipype), an open-source, community-developed, software package, and scriptable library. Nipype solves the issues by providing Interfaces to existing neuroimaging software with uniform usage semantics and by facilitating interaction between these packages using Workflows. Nipype provides an environment that encourages interactive exploration of algorithms, eases the design of Workflows within and between packages, allows rapid comparative development of algorithms and reduces the learning curve necessary to use different packages. Nipype supports both local and remote execution on multi-core machines and clusters, without additional scripting. Nipype is Berkeley Software Distribution licensed, allowing anyone unrestricted usage. An open, community-driven development philosophy allows the software to quickly adapt and address the varied needs of the evolving neuroimaging community, especially in the context of increasing demand for reproducible research. PMID:21897815

  15. Software-Defined Architectures for Spectrally Efficient Cognitive Networking in Extreme Environments

    NASA Astrophysics Data System (ADS)

    Sklivanitis, Georgios

    The objective of this dissertation is the design, development, and experimental evaluation of novel algorithms and reconfigurable radio architectures for spectrally efficient cognitive networking in terrestrial, airborne, and underwater environments. Next-generation wireless communication architectures and networking protocols that maximize spectrum utilization efficiency in congested/contested or low-spectral availability (extreme) communication environments can enable a rich body of applications with unprecedented societal impact. In recent years, underwater wireless networks have attracted significant attention for military and commercial applications including oceanographic data collection, disaster prevention, tactical surveillance, offshore exploration, and pollution monitoring. Unmanned aerial systems that are autonomously networked and fully mobile can assist humans in extreme or difficult-to-reach environments and provide cost-effective wireless connectivity for devices without infrastructure coverage. Cognitive radio (CR) has emerged as a promising technology to maximize spectral efficiency in dynamically changing communication environments by adaptively reconfiguring radio communication parameters. At the same time, the fast developing technology of software-defined radio (SDR) platforms has enabled hardware realization of cognitive radio algorithms for opportunistic spectrum access. However, existing algorithmic designs and protocols for shared spectrum access do not effectively capture the interdependencies between radio parameters at the physical (PHY), medium-access control (MAC), and network (NET) layers of the network protocol stack. In addition, existing off-the-shelf radio platforms and SDR programmable architectures are far from fulfilling runtime adaptation and reconfiguration across PHY, MAC, and NET layers. Spectrum allocation in cognitive networks with multi-hop communication requirements depends on the location, network traffic load, and interference profile at each network node. As a result, the development and implementation of algorithms and cross-layer reconfigurable radio platforms that can jointly treat space, time, and frequency as a unified resource to be dynamically optimized according to inter- and intra-network interference constraints is of fundamental importance. In the next chapters, we present novel algorithmic and software/hardware implementation developments toward the deployment of spectrally efficient terrestrial, airborne, and underwater wireless networks. In Chapter 1 we review the state-of-the-art in commercially available SDR platforms, describe their software and hardware capabilities, and classify them based on their ability to enable rapid prototyping and advance experimental research in wireless networks. Chapter 2 discusses system design and implementation details toward real-time evaluation of a software-radio platform for all-spectrum cognitive channelization in the presence of narrowband or wideband primary stations. All-spectrum channelization is achieved by designing maximum signal-to-interference-plus-noise ratio (SINR) waveforms that span the whole continuum of the device-accessible spectrum, while satisfying peak power and interference temperature (IT) constraints for the secondary and primary users, respectively. In Chapter 3, we introduce the concept of all-spectrum channelization based on max-SINR-optimized sparse-binary waveforms, propose optimal and suboptimal waveform design algorithms, and evaluate their SINR and bit-error-rate (BER) performance in an SDR testbed. Chapter 4 considers the problem of channel estimation with minimal pilot signaling in multi-cell multi-user multi-input multi-output (MIMO) systems with very large antenna arrays at the base station, and proposes a least-squares (LS)-type algorithm that iteratively extracts channel and data estimates from a short record of data measurements. Our algorithmic developments toward spectrally-efficient cognitive networking through joint optimization of channel access code-waveforms and routes in a multi-hop network are described in Chapter 5. Algorithmic designs are software optimized on heterogeneous multi-core general-purpose processor (GPP)-based SDR architectures by leveraging a novel software-radio framework that offers self-optimization and real-time adaptation capabilities at the PHY, MAC, and NET layers of the network protocol stack. Our system design approach is experimentally validated under realistic conditions in a large-scale hybrid ground-air testbed deployment. Chapter 6 reviews the state-of-the-art in software and hardware platforms for underwater wireless networking and proposes a software-defined acoustic modem prototype that enables (i) cognitive reconfiguration of PHY/MAC parameters, and (ii) cross-technology communication adaptation. The proposed modem design is evaluated in terms of effective communication data rate in both water tank and lake testbed setups. In Chapter 7, we present a novel receiver configuration for code-waveform-based multiple-access underwater communications. The proposed receiver is fully reconfigurable and executes (i) all-spectrum cognitive channelization, and (ii) combined synchronization, channel estimation, and demodulation. Experimental evaluation in terms of SINR and BER shows that all-spectrum channelization is a powerful proposition for underwater communications. At the same time, the proposed receiver design can significantly enhance bandwidth utilization. Finally, in Chapter 8, we focus on challenging practical issues that arise in underwater acoustic sensor network setups where co-located multi-antenna sensor deployment is not feasible due to power, computation, and hardware limitations, and design, implement, and evaluate an underwater receiver structure that accounts for multiple carrier frequency and timing offsets in virtual (distributed) MIMO underwater systems.

  16. MIC-SVM: Designing A Highly Efficient Support Vector Machine For Advanced Modern Multi-Core and Many-Core Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    You, Yang; Song, Shuaiwen; Fu, Haohuan

    2014-08-16

    Support Vector Machine (SVM) has been widely used in data-mining and Big Data applications as modern commercial databases start to attach increasing importance to analytic capabilities. In recent years, SVM has been adapted to the field of High Performance Computing for power/performance prediction, auto-tuning, and runtime scheduling. However, even at the risk of losing prediction accuracy due to insufficient runtime information, researchers can only afford to apply offline model training to avoid significant runtime training overhead. To address the challenges above, we designed and implemented MIC-SVM, a highly efficient parallel SVM for x86-based multi-core and many-core architectures, such as Intel Ivy Bridge CPUs and the Intel Xeon Phi coprocessor (MIC).

  17. Optimization of view weighting in tilted-plane-based reconstruction algorithms to minimize helical artifacts in multi-slice helical CT

    NASA Astrophysics Data System (ADS)

    Tang, Xiangyang

    2003-05-01

    In multi-slice helical CT, the single-tilted-plane-based reconstruction algorithm has been proposed to combat helical and cone beam artifacts by tilting a reconstruction plane to fit a helical source trajectory optimally. Furthermore, to improve the noise characteristics or dose efficiency of the single-tilted-plane-based reconstruction algorithm, the multi-tilted-plane-based reconstruction algorithm has been proposed, in which the reconstruction plane deviates from the globally optimized pose due to an extra rotation about the third axis. As a result, the capability of the multi-tilted-plane-based reconstruction algorithm to suppress helical and cone beam artifacts is compromised. An optimized tilted-plane-based reconstruction algorithm is proposed in this paper, in which a matched view weighting strategy optimizes both the suppression of helical and cone beam artifacts and the noise characteristics. A helical body phantom is employed to quantitatively evaluate the imaging performance of the matched view weighting approach by tabulating artifact index and noise characteristics, showing that matched view weighting significantly improves both helical artifact suppression and noise characteristics or dose efficiency in comparison to the case in which non-matched view weighting is applied. Finally, it is believed that the matched view weighting approach is of practical importance in the development of multi-slice helical CT, because it maintains the computational structure of fan beam filtered backprojection and demands no extra computational cost.

  18. Nebo: An efficient, parallel, and portable domain-specific language for numerically solving partial differential equations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Earl, Christopher; Might, Matthew; Bagusetty, Abhishek

    This study presents Nebo, a declarative domain-specific language embedded in C++ for discretizing partial differential equations for transport phenomena on multiple architectures. Application programmers use Nebo to write code that appears sequential but can be run in parallel, without editing the code. Currently Nebo supports single-thread execution, multi-thread execution, and many-core (GPU-based) execution. With single-thread execution, Nebo performs on par with code written by domain experts. With multi-thread execution, Nebo can linearly scale (with roughly 90% efficiency) up to 12 cores, compared to its single-thread execution. Moreover, Nebo’s many-core execution can be over 140x faster than its single-thread execution.

  19. Nebo: An efficient, parallel, and portable domain-specific language for numerically solving partial differential equations

    DOE PAGES

    Earl, Christopher; Might, Matthew; Bagusetty, Abhishek; ...

    2016-01-26

    This study presents Nebo, a declarative domain-specific language embedded in C++ for discretizing partial differential equations for transport phenomena on multiple architectures. Application programmers use Nebo to write code that appears sequential but can be run in parallel, without editing the code. Currently Nebo supports single-thread execution, multi-thread execution, and many-core (GPU-based) execution. With single-thread execution, Nebo performs on par with code written by domain experts. With multi-thread execution, Nebo can linearly scale (with roughly 90% efficiency) up to 12 cores, compared to its single-thread execution. Moreover, Nebo’s many-core execution can be over 140x faster than its single-thread execution.

  20. Energy-Efficient Deadline-Aware Data-Gathering Scheme Using Multiple Mobile Data Collectors.

    PubMed

    Dasgupta, Rumpa; Yoon, Seokhoon

    2017-04-01

    In wireless sensor networks, the data collected by sensors are usually forwarded to the sink through multi-hop forwarding. However, multi-hop forwarding can be inefficient due to the energy hole problem and high communications overhead. Moreover, when the monitored area is large and the number of sensors is small, sensors cannot send the data via multi-hop forwarding due to the lack of network connectivity. In order to address those problems of multi-hop forwarding, in this paper, we consider a data collection scheme that uses mobile data collectors (MDCs), which visit sensors and collect data from them. Due to the recent breakthroughs in wireless power transfer technology, MDCs can also be used to recharge the sensors to keep them from draining their energy. In MDC-based data-gathering schemes, a big challenge is how to find the MDCs' traveling paths in a balanced way, such that their energy consumption is minimized and the packet-delay constraint is satisfied. Therefore, in this paper, we aim at finding the MDCs' paths, taking energy efficiency and delay constraints into account. We first define an optimization problem, named the delay-constrained energy minimization (DCEM) problem, to find the paths for MDCs. An integer linear programming problem is formulated to find the optimal solution. We also propose a two-phase path-selection algorithm to efficiently solve the DCEM problem. Simulations are performed to compare the performance of the proposed algorithms with two heuristic algorithms for the vehicle routing problem under various scenarios. The simulation results show that the proposed algorithms can outperform existing algorithms in terms of energy efficiency and packet delay.
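
    The abstract does not give the algorithm's details; the sketch below illustrates a generic two-phase flavor of the problem under stated assumptions: phase one partitions sensors among MDCs by k-means, phase two builds each MDC's tour with a nearest-neighbor heuristic, with tour length standing in for the energy cost (the paper's ILP formulation and delay constraints are omitted).

      # A minimal two-phase sketch (not the paper's algorithm).
      import math, random

      def dist(a, b):
          return math.hypot(a[0] - b[0], a[1] - b[1])

      def kmeans(points, k, iters=20):
          centers = random.sample(points, k)
          for _ in range(iters):
              groups = [[] for _ in range(k)]
              for p in points:
                  groups[min(range(k), key=lambda i: dist(p, centers[i]))].append(p)
              centers = [(sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g))
                         if g else centers[i]
                         for i, g in enumerate(groups)]
          return groups

      def nn_tour(depot, stops):
          tour, rest, cur = [depot], list(stops), depot
          while rest:                      # greedily visit the nearest sensor
              nxt = min(rest, key=lambda p: dist(cur, p))
              rest.remove(nxt)
              tour.append(nxt)
              cur = nxt
          return tour + [depot]

      random.seed(7)
      sensors = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(30)]
      for group in kmeans(sensors, k=3):   # one cluster per MDC
          tour = nn_tour((50.0, 50.0), group)
          length = sum(dist(a, b) for a, b in zip(tour, tour[1:]))
          print(len(group), 'sensors, tour length (energy proxy):', round(length, 1))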

  1. Energy-Efficient Deadline-Aware Data-Gathering Scheme Using Multiple Mobile Data Collectors

    PubMed Central

    Dasgupta, Rumpa; Yoon, Seokhoon

    2017-01-01

    In wireless sensor networks, the data collected by sensors are usually forwarded to the sink through multi-hop forwarding. However, multi-hop forwarding can be inefficient due to the energy hole problem and high communications overhead. Moreover, when the monitored area is large and the number of sensors is small, sensors cannot send the data via multi-hop forwarding due to the lack of network connectivity. In order to address those problems of multi-hop forwarding, in this paper, we consider a data collection scheme that uses mobile data collectors (MDCs), which visit sensors and collect data from them. Due to the recent breakthroughs in wireless power transfer technology, MDCs can also be used to recharge the sensors to keep them from draining their energy. In MDC-based data-gathering schemes, a big challenge is how to find the MDCs’ traveling paths in a balanced way, such that their energy consumption is minimized and the packet-delay constraint is satisfied. Therefore, in this paper, we aim at finding the MDCs’ paths, taking energy efficiency and delay constraints into account. We first define an optimization problem, named the delay-constrained energy minimization (DCEM) problem, to find the paths for MDCs. An integer linear programming problem is formulated to find the optimal solution. We also propose a two-phase path-selection algorithm to efficiently solve the DCEM problem. Simulations are performed to compare the performance of the proposed algorithms with two heuristic algorithms for the vehicle routing problem under various scenarios. The simulation results show that the proposed algorithms can outperform existing algorithms in terms of energy efficiency and packet delay. PMID:28368300

  2. MultiNest: Efficient and Robust Bayesian Inference

    NASA Astrophysics Data System (ADS)

    Feroz, F.; Hobson, M. P.; Bridges, M.

    2011-09-01

    We present further development and the first public release of our multimodal nested sampling algorithm, called MultiNest. This Bayesian inference tool calculates the evidence, with an associated error estimate, and produces posterior samples from distributions that may contain multiple modes and pronounced (curving) degeneracies in high dimensions. The developments presented here lead to further substantial improvements in sampling efficiency and robustness, as compared to the original algorithm presented in Feroz & Hobson (2008), which itself significantly outperformed existing MCMC techniques in a wide range of astrophysical inference problems. The accuracy and economy of the MultiNest algorithm are demonstrated by application to two toy problems and to a cosmological inference problem focusing on the extension of the vanilla LambdaCDM model to include spatial curvature and a varying equation of state for dark energy. The MultiNest software is fully parallelized using MPI and includes an interface to CosmoMC. It will also be released as part of the SuperBayeS package, for the analysis of supersymmetric theories of particle physics, at this http URL.
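
    For readers unfamiliar with nested sampling, here is a toy single-mode sketch of the underlying evidence calculation (not MultiNest itself, which adds multimodal ellipsoidal sampling and MPI parallelism); the likelihood and prior below are illustrative assumptions.

      # Toy nested sampling: keep N live points, repeatedly replace the worst-
      # likelihood point with a prior draw above that likelihood, and
      # accumulate the evidence Z = sum L_i * dX_i over shrinking prior volume.
      import math, random

      def loglike(x):                    # toy likelihood: Normal(0.5, 0.1) density
          return -0.5 * ((x - 0.5) / 0.1) ** 2 - math.log(0.1 * math.sqrt(2 * math.pi))

      def logaddexp(a, b):
          hi, lo = max(a, b), min(a, b)
          return hi if lo == -math.inf else hi + math.log1p(math.exp(lo - hi))

      random.seed(0)
      N = 100
      live = [random.random() for _ in range(N)]   # uniform prior on [0, 1]
      logZ, logX = -math.inf, 0.0
      for i in range(1000):
          worst = min(live, key=loglike)
          logL = loglike(worst)
          logX_new = -(i + 1) / N                  # expected log prior-volume shrink
          logZ = logaddexp(logZ, logL + math.log(math.exp(logX) - math.exp(logX_new)))
          logX = logX_new
          while True:                              # prior draw above the L bound
              cand = random.random()
              if loglike(cand) > logL:
                  live[live.index(worst)] = cand
                  break
      print('log-evidence ~', round(logZ, 3))      # analytic value is ~0 here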

  3. The Speech multi features fusion perceptual hash algorithm based on tensor decomposition

    NASA Astrophysics Data System (ADS)

    Huang, Y. B.; Fan, M. H.; Zhang, Q. Y.

    2018-03-01

    With constant progress in modern speech communication technologies, speech data are prone to attack by noise or malicious tampering. In order to give the speech perceptual hash algorithm strong robustness and high efficiency, this paper puts forward a speech perceptual hash algorithm based on tensor decomposition and multiple features. The algorithm analyses the perceptual features of speech by applying wavelet packet decomposition to each speech component. The LPCC, LSP and ISP features of each speech component are extracted to constitute the speech feature tensor. Speech authentication is done by generating hash values through feature-matrix quantification using the mid-value. Experimental results show that the proposed algorithm is robust to content-preserving operations compared with similar algorithms, and is able to resist attack by common background noise. The algorithm is also computationally efficient, and is able to meet the real-time requirements of speech communication and complete speech authentication quickly.
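
    A minimal sketch of the mid-value quantification step described above, assuming the LPCC/LSP/ISP feature matrix has already been extracted upstream; hashes are compared by normalized Hamming distance, so content-preserving noise should yield small distances and tampering large ones.

      # Derive a binary hash by thresholding each feature against its
      # per-frame median, then compare hashes by normalized Hamming distance.
      import numpy as np

      def perceptual_hash(features):
          """features: (n_frames, n_features) matrix of speech features."""
          med = np.median(features, axis=1, keepdims=True)
          return (features > med).astype(np.uint8).ravel()

      def hamming(h1, h2):
          return np.mean(h1 != h2)

      rng = np.random.default_rng(0)
      feats = rng.normal(size=(40, 12))
      noisy = feats + 0.05 * rng.normal(size=feats.shape)  # content-preserving
      tampered = rng.normal(size=feats.shape)              # unrelated content
      print('noisy   :', hamming(perceptual_hash(feats), perceptual_hash(noisy)))
      print('tampered:', hamming(perceptual_hash(feats), perceptual_hash(tampered)))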

  4. Multigroup Monte Carlo on GPUs: Comparison of history- and event-based algorithms

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hamilton, Steven P.; Slattery, Stuart R.; Evans, Thomas M.

    This article presents an investigation of the performance of different multigroup Monte Carlo transport algorithms on GPUs with a discussion of both history-based and event-based approaches. Several algorithmic improvements are introduced for both approaches. By modifying the history-based algorithm that is traditionally favored in CPU-based MC codes to occasionally filter out dead particles to reduce thread divergence, performance exceeds that of either the pure history-based or event-based approaches. The impacts of several algorithmic choices are discussed, including performance studies on Kepler and Pascal generation NVIDIA GPUs for fixed source and eigenvalue calculations. Single-device performance equivalent to 20–40 CPU cores on the K40 GPU and 60–80 CPU cores on the P100 GPU is achieved. Finally, nearly perfect multi-device parallel weak scaling is demonstrated on more than 16,000 nodes of the Titan supercomputer.

  5. Multigroup Monte Carlo on GPUs: Comparison of history- and event-based algorithms

    DOE PAGES

    Hamilton, Steven P.; Slattery, Stuart R.; Evans, Thomas M.

    2017-12-22

    This article presents an investigation of the performance of different multigroup Monte Carlo transport algorithms on GPUs with a discussion of both history-based and event-based approaches. Several algorithmic improvements are introduced for both approaches. By modifying the history-based algorithm that is traditionally favored in CPU-based MC codes to occasionally filter out dead particles to reduce thread divergence, performance exceeds that of either the pure history-based or event-based approaches. The impacts of several algorithmic choices are discussed, including performance studies on Kepler and Pascal generation NVIDIA GPUs for fixed source and eigenvalue calculations. Single-device performance equivalent to 20–40 CPU cores on the K40 GPU and 60–80 CPU cores on the P100 GPU is achieved. Finally, nearly perfect multi-device parallel weak scaling is demonstrated on more than 16,000 nodes of the Titan supercomputer.
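
    The dead-particle filtering idea can be sketched on the CPU with NumPy: advance all histories in lock-step, as GPU threads would, and periodically compact the arrays so terminated particles stop occupying lanes; the one-group slab physics here is a toy stand-in for the paper's multigroup transport.

      import numpy as np

      rng = np.random.default_rng(1)
      sigma_t, absorb_p, slab = 1.0, 0.3, 5.0          # toy 1-group slab problem
      x = np.zeros(100_000)                            # particle positions
      mu = rng.uniform(-1, 1, x.size)                  # flight directions
      alive = np.ones(x.size, dtype=bool)
      leaked = absorbed = step = 0
      while alive.any():
          x = x + mu * rng.exponential(1 / sigma_t, x.size)  # fly to collision
          out = (x < 0) | (x > slab)
          leaked += np.count_nonzero(out & alive)
          alive &= ~out
          hit = alive & (rng.random(x.size) < absorb_p)      # absorption
          absorbed += np.count_nonzero(hit)
          alive &= ~hit
          mu = rng.uniform(-1, 1, x.size)                    # isotropic scatter
          step += 1
          if step % 4 == 0:                          # occasional compaction:
              x, mu = x[alive], mu[alive]            # drop dead lanes so they
              alive = np.ones(x.size, dtype=bool)    # stop wasting "threads"
      print('leaked:', leaked, 'absorbed:', absorbed)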

  6. Efficient algorithms and implementations of entropy-based moment closures for rarefied gases

    NASA Astrophysics Data System (ADS)

    Schaerer, Roman Pascal; Bansal, Pratyuksh; Torrilhon, Manuel

    2017-07-01

    We present efficient algorithms and implementations of the 35-moment system equipped with the maximum-entropy closure in the context of rarefied gases. While closures based on the principle of entropy maximization have been shown to yield very promising results for moderately rarefied gas flows, the computational cost of these closures is in general much higher than for closure theories with explicit closed-form expressions of the closing fluxes, such as Grad's classical closure. Following an approach similar to Garrett et al. (2015) [13], we investigate efficient implementations of the computationally expensive numerical quadrature method used for the moment evaluations of the maximum-entropy distribution, mapping its inherent fine-grained parallelism onto the parallelism offered by multi-core processors and graphics cards. We show that using a single graphics card as an accelerator allows speed-ups of two orders of magnitude when compared to a serial CPU implementation. To accelerate the time-to-solution for steady-state problems, we propose a new semi-implicit time discretization scheme. The resulting nonlinear system of equations is solved with a Newton type method in the Lagrange multipliers of the dual optimization problem in order to reduce the computational cost. Additionally, fully explicit time-stepping schemes of first and second order accuracy are presented. We investigate the accuracy and efficiency of the numerical schemes for several numerical test cases, including a steady-state shock-structure problem.

  7. Evaluation of Emerging Energy-Efficient Heterogeneous Computing Platforms for Biomolecular and Cellular Simulation Workloads.

    PubMed

    Stone, John E; Hallock, Michael J; Phillips, James C; Peterson, Joseph R; Luthey-Schulten, Zaida; Schulten, Klaus

    2016-05-01

    Many of the continuing scientific advances achieved through computational biology are predicated on the availability of ongoing increases in computational power required for detailed simulation and analysis of cellular processes on biologically-relevant timescales. A critical challenge facing the development of future exascale supercomputer systems is the development of new computing hardware and associated scientific applications that dramatically improve upon the energy efficiency of existing solutions, while providing increased simulation, analysis, and visualization performance. Mobile computing platforms have recently become powerful enough to support interactive molecular visualization tasks that were previously only possible on laptops and workstations, creating future opportunities for their convenient use for meetings, remote collaboration, and as head mounted displays for immersive stereoscopic viewing. We describe early experiences adapting several biomolecular simulation and analysis applications for emerging heterogeneous computing platforms that combine power-efficient system-on-chip multi-core CPUs with high-performance massively parallel GPUs. We present low-cost power monitoring instrumentation that provides sufficient temporal resolution to evaluate the power consumption of individual CPU algorithms and GPU kernels. We compare the performance and energy efficiency of scientific applications running on emerging platforms with results obtained on traditional platforms, identify hardware and algorithmic performance bottlenecks that affect the usability of these platforms, and describe avenues for improving both the hardware and applications in pursuit of the needs of molecular modeling tasks on mobile devices and future exascale computers.

  8. Multi-heuristic dynamic task allocation using genetic algorithms in a heterogeneous distributed system

    PubMed Central

    Page, Andrew J.; Keane, Thomas M.; Naughton, Thomas J.

    2010-01-01

    We present a multi-heuristic evolutionary task allocation algorithm to dynamically map tasks to processors in a heterogeneous distributed system. It utilizes a genetic algorithm, combined with eight common heuristics, in an effort to minimize the total execution time. It operates on batches of unmapped tasks and can preemptively remap tasks to processors. The algorithm has been implemented on a Java distributed system and evaluated with a set of six problems from the areas of bioinformatics, biomedical engineering, computer science and cryptography. Experiments using up to 150 heterogeneous processors show that the algorithm achieves better efficiency than other state-of-the-art heuristic algorithms. PMID:20862190
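
    A compact sketch of the genetic-algorithm core under stated assumptions: a random cost matrix, a single greedy seed standing in for the paper's eight heuristics, one-point crossover, and makespan as the fitness to minimize.

      import random

      random.seed(0)
      n_tasks, n_procs = 40, 6
      cost = [[random.uniform(1, 10) for _ in range(n_procs)] for _ in range(n_tasks)]

      def makespan(mapping):
          loads = [0.0] * n_procs
          for t, p in enumerate(mapping):
              loads[p] += cost[t][p]
          return max(loads)                 # finish time of the busiest processor

      def greedy_seed():                    # each task on its fastest processor
          return [min(range(n_procs), key=lambda p: cost[t][p]) for t in range(n_tasks)]

      pop = [greedy_seed()] + [[random.randrange(n_procs) for _ in range(n_tasks)]
                               for _ in range(49)]
      for _ in range(200):
          pop.sort(key=makespan)
          pop = pop[:25]                                   # elitist selection
          while len(pop) < 50:
              a, b = random.sample(pop[:10], 2)
              cut = random.randrange(n_tasks)
              child = a[:cut] + b[cut:]                    # one-point crossover
              if random.random() < 0.3:                    # mutation: remap a task
                  child[random.randrange(n_tasks)] = random.randrange(n_procs)
              pop.append(child)
      print('best makespan:', round(makespan(min(pop, key=makespan)), 2))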

  9. Multi-task feature selection in microarray data by binary integer programming.

    PubMed

    Lan, Liang; Vucetic, Slobodan

    2013-12-20

    A major challenge in microarray classification is that the number of features is typically orders of magnitude larger than the number of examples. In this paper, we propose a novel feature filter algorithm to select the feature subset with maximal discriminative power and minimal redundancy by solving a quadratic objective function with binary integer constraints. To improve the computational efficiency, the binary integer constraints are relaxed and a low-rank approximation to the quadratic term is applied. The proposed feature selection algorithm was extended to solve multi-task microarray classification problems. We compared the single-task version of the proposed feature selection algorithm with 9 existing feature selection methods on 4 benchmark microarray data sets. The empirical results show that the proposed method achieved the most accurate predictions overall. We also evaluated the multi-task version of the proposed algorithm on 8 multi-task microarray datasets. The multi-task feature selection algorithm resulted in significantly higher accuracy than when using the single-task feature selection methods.

  10. CCOMP: An efficient algorithm for complex roots computation of determinantal equations

    NASA Astrophysics Data System (ADS)

    Zouros, Grigorios P.

    2018-01-01

    In this paper a free Python algorithm, entitled CCOMP (Complex roots COMPutation), is developed for the efficient computation of complex roots of determinantal equations inside a prescribed complex domain. The key to the method presented is the efficient determination of candidate points inside the domain in whose close neighborhood a complex root may lie. Once these points are detected, the algorithm proceeds to a two-dimensional minimization problem with respect to the minimum-modulus eigenvalue of the system matrix. At the core of CCOMP are three sub-algorithms whose tasks are the efficient estimation of the minimum-modulus eigenvalues of the system matrix inside the prescribed domain, the efficient computation of candidate points which guarantee the existence of minima, and, finally, the computation of minima via bound-constrained minimization algorithms. Theoretical results and heuristics support the development and the performance of the algorithm, which is discussed in detail. CCOMP supports general complex matrices, and its efficiency, applicability and validity are demonstrated on a variety of microwave applications.
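
    The strategy described above can be sketched as follows (this is not the CCOMP API): scan a grid over the complex domain for interior local minima of the smallest singular value of a toy determinantal system, then refine each candidate with a derivative-free two-dimensional minimization.

      import numpy as np
      from scipy.optimize import minimize

      def A(z):                               # toy determinantal system: det A(z)=0
          return np.array([[z - 1j, 0.3],     # has roots near z ~ 1j and z ~ 2
                           [0.3, z - 2.0]])

      def smin(xy):                           # smallest singular value at z = x+iy
          return np.linalg.svd(A(complex(xy[0], xy[1])), compute_uv=False)[-1]

      xs, ys = np.linspace(-1, 3, 41), np.linspace(-1, 2, 31)
      grid = np.array([[smin((x, y)) for x in xs] for y in ys])
      roots = []
      for i in range(1, len(ys) - 1):
          for j in range(1, len(xs) - 1):
              if grid[i, j] == grid[i - 1:i + 2, j - 1:j + 2].min():  # local min
                  res = minimize(smin, [xs[j], ys[i]], method='Nelder-Mead',
                                 options={'xatol': 1e-10, 'fatol': 1e-12})
                  z = complex(res.x[0], res.x[1])
                  if res.fun < 1e-5 and not any(abs(z - r) < 1e-6 for r in roots):
                      roots.append(z)
      print('roots:', roots)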

  11. Toward GEOS-6, A Global Cloud System Resolving Atmospheric Model

    NASA Technical Reports Server (NTRS)

    Putman, William M.

    2010-01-01

    NASA is committed to observing and understanding the weather and climate of our home planet through the use of multi-scale modeling systems and space-based observations. Global climate models have evolved to take advantage of the influx of multi- and many-core computing technologies and the availability of large clusters of multi-core microprocessors. GEOS-6 is a next-generation cloud system resolving atmospheric model that will place NASA at the forefront of scientific exploration of our atmosphere and climate. Model simulations with GEOS-6 will produce a realistic representation of our atmosphere on the scale of typical satellite observations, bringing visual comprehension of model results to a new level among climate enthusiasts. In preparation for GEOS-6, the agency's flagship Earth System Modeling Framework has been enhanced to support cutting-edge high-resolution global climate and weather simulations. Improvements include a cubed-sphere grid that exposes parallelism, a non-hydrostatic finite volume dynamical core, and algorithms designed for co-processor technologies, among others. GEOS-6 represents a fundamental advancement in the capability of global Earth system models. The ability to directly compare global simulations at the resolution of spaceborne satellite images will lead to algorithm improvements and better utilization of space-based observations within the GEOS data assimilation system.

  12. Protecting core networks with dual-homing: A study on enhanced network availability, resource efficiency, and energy-savings

    NASA Astrophysics Data System (ADS)

    Abeywickrama, Sandu; Furdek, Marija; Monti, Paolo; Wosinska, Lena; Wong, Elaine

    2016-12-01

    Core network survivability affects the reliability performance of telecommunication networks and remains one of the most important network design considerations. This paper critically examines the benefits arising from utilizing dual-homing in the optical access networks to provide resource-efficient protection against link and node failures in the optical core segment. Four novel, heuristic-based RWA algorithms that provide dedicated path protection in networks with dual-homing are proposed and studied. These algorithms protect against different failure scenarios (i.e. single link or node failures) and are implemented with different optimization objectives (i.e., minimization of wavelength usage and path length). Results obtained through simulations and comparison with baseline architectures indicate that exploiting dual-homed architecture in the access segment can bring significant improvements in terms of core network resource usage, connection availability, and power consumption.

  13. MC3: Multi-core Markov-chain Monte Carlo code

    NASA Astrophysics Data System (ADS)

    Cubillos, Patricio; Harrington, Joseph; Lust, Nate; Foster, AJ; Stemm, Madison; Loredo, Tom; Stevenson, Kevin; Campo, Chris; Hardin, Matt; Hardy, Ryan

    2016-10-01

    MC3 (Multi-core Markov-chain Monte Carlo) is a Bayesian statistics tool that can be executed from the shell prompt or interactively through the Python interpreter with single- or multiple-CPU parallel computing. It offers Markov-chain Monte Carlo (MCMC) posterior-distribution sampling for several algorithms, Levenberg-Marquardt least-squares optimization, and uniform non-informative, Jeffreys non-informative, or Gaussian-informative priors. MC3 can share the same value among multiple parameters and fix the value of parameters to constant values, and offers Gelman-Rubin convergence testing and correlated-noise estimation with time-averaging or wavelet-based likelihood estimation methods.

  14. Towards robust algorithms for current deposition and dynamic load-balancing in a GPU particle in cell code

    NASA Astrophysics Data System (ADS)

    Rossi, Francesco; Londrillo, Pasquale; Sgattoni, Andrea; Sinigardi, Stefano; Turchetti, Giorgio

    2012-12-01

    We present 'jasmine', an implementation of a fully relativistic, 3D, electromagnetic Particle-In-Cell (PIC) code, capable of running simulations in various laser plasma acceleration regimes on Graphics-Processing-Unit (GPU) HPC clusters. Standard energy/charge preserving FDTD-based algorithms have been implemented using double precision and quadratic (or arbitrary sized) shape functions for the particle weighting. When porting a PIC scheme to the GPU architecture (or, in general, a shared memory environment), the particle-to-grid operations (e.g. the evaluation of the current density) require special care to avoid memory inconsistencies and conflicts. Here we present a robust implementation of this operation that is efficient for any number of particles per cell and particle shape function order. Our algorithm exploits the exposed GPU memory hierarchy and avoids the use of atomic operations, which can hurt performance especially when many particles lie on the same cell. We show the code's multi-GPU scalability results and present a dynamic load-balancing algorithm. The code is written using a Python-based C++ meta-programming technique, which translates into a high level of modularity and allows for easy performance tuning and simple extension of the core algorithms to various simulation schemes.
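
    The particle-to-grid reduction at the heart of the conflict problem can be sketched in NumPy, where a keyed reduction (np.bincount) plays the role of the GPU's collision-free accumulation; linear (order-1) shape functions split each particle's charge between its two neighboring grid points.

      import numpy as np

      rng = np.random.default_rng(2)
      n_cells, n_part = 64, 100_000
      x = rng.uniform(0, n_cells, n_part)      # particle positions in cell units
      q = np.full(n_part, 1.0 / n_part)        # particle charges/weights

      # Split each charge between the two neighboring grid points, then reduce
      # by cell index without any write collisions (no atomics needed).
      i = np.floor(x).astype(int)
      frac = x - i
      rho = (np.bincount(i, weights=q * (1 - frac), minlength=n_cells + 1)
             + np.bincount(i + 1, weights=q * frac, minlength=n_cells + 1))
      print('total deposited charge:', rho.sum())   # ~1.0, charge is conserved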

  15. Multi-GPU parallel algorithm design and analysis for improved inversion of probability tomography with gravity gradiometry data

    NASA Astrophysics Data System (ADS)

    Hou, Zhenlong; Huang, Danian

    2017-09-01

    In this paper, we first study the inversion of probability tomography (IPT) with gravity gradiometry data. The spatial resolution of the results is improved by multi-tensor joint inversion, a depth-weighting matrix, and other methods. To address the problems posed by big data in exploration, we present a parallel algorithm and its performance analysis, combining Compute Unified Device Architecture (CUDA) with Open Multi-Processing (OpenMP) based on Graphics Processing Unit (GPU) acceleration. In tests on a synthetic model and real data from Vinton Dome, we obtain improved results, demonstrating that the improved inversion algorithm is effective and feasible. The performance of the parallel algorithm we designed is better than that of other CUDA implementations; the maximum speedup can exceed 200. In the performance analysis, multi-GPU speedup and multi-GPU efficiency are applied to analyze the scalability of the multi-GPU programs. The designed parallel algorithm is demonstrated to be able to process larger-scale data, and the new analysis method is practical.

  16. Efficient cooperative compressive spectrum sensing by identifying multi-candidate and exploiting deterministic matrix

    NASA Astrophysics Data System (ADS)

    Li, Jia; Wang, Qiang; Yan, Wenjie; Shen, Yi

    2015-12-01

    Cooperative spectrum sensing exploits spatial diversity to improve the detection of occupied channels in cognitive radio networks (CRNs). Cooperative compressive spectrum sensing (CCSS), utilizing the sparsity of channel occupancy, further improves efficiency by reducing the number of reports without degrading detection performance. In this paper, we first propose multi-candidate orthogonal matrix matching pursuit (MOMMP) algorithms to efficiently and effectively detect occupied channels at the fusion center (FC), where multi-candidate identification and orthogonal projection are used, respectively, to reduce the number of required iterations and to improve the probability of exact identification. Secondly, two common but different approaches, based on a threshold and on the Gaussian distribution, are introduced to realize the multi-candidate identification. Moreover, to improve detection accuracy and energy efficiency, we propose the matrix construction based on shrinkage and gradient descent (MCSGD) algorithm to provide a deterministic filter coefficient matrix of low t-average coherence. Finally, several numerical simulations validate that our proposals provide satisfactory performance, with a higher probability of detection, a lower probability of false alarm and less detection time.
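
    A sketch of the multi-candidate identification idea under stated assumptions (not the paper's exact MOMMP): each iteration admits every atom whose correlation with the residual is within a fraction of the maximum, then re-fits the support by least squares, which plays the role of the orthogonal projection step.

      import numpy as np

      def mommp_like(Phi, y, sparsity, keep=0.9, iters=10):
          support, resid = [], y.copy()
          for _ in range(iters):
              corr = np.abs(Phi.T @ resid)
              corr[support] = 0                         # do not re-pick atoms
              cands = np.flatnonzero(corr >= keep * corr.max())
              support.extend(cands.tolist())            # multi-candidate admit
              xs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
              resid = y - Phi[:, support] @ xs          # orthogonal projection
              if len(support) >= sparsity or np.linalg.norm(resid) < 1e-9:
                  break
          x = np.zeros(Phi.shape[1])
          x[support] = xs
          return x

      rng = np.random.default_rng(3)
      m, n, k = 40, 120, 5            # compressed reports, channels, occupied
      Phi = rng.normal(size=(m, n)) / np.sqrt(m)
      true = np.zeros(n)
      true[rng.choice(n, k, replace=False)] = 1.0
      x_hat = mommp_like(Phi, Phi @ true, sparsity=k)
      print('recovered channels:', np.flatnonzero(x_hat > 0.5))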

  17. A Simulated Annealing Algorithm for the Optimization of Multistage Depressed Collector Efficiency

    NASA Technical Reports Server (NTRS)

    Vaden, Karl R.; Wilson, Jeffrey D.; Bulson, Brian A.

    2002-01-01

    The microwave traveling wave tube amplifier (TWTA) is widely used as a high-power transmitting source for space and airborne communications. One critical factor in designing a TWTA is the overall efficiency. However, overall efficiency is highly dependent upon collector efficiency; so collector design is critical to the performance of a TWTA. Therefore, NASA Glenn Research Center has developed an optimization algorithm based on Simulated Annealing to quickly design highly efficient multi-stage depressed collectors (MDC).
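
    The annealing loop itself is generic; the sketch below minimizes a stand-in objective over four hypothetical "stage voltages" with Metropolis acceptance and geometric cooling (the real optimizer would instead evaluate collector efficiency from electrode geometry and voltages).

      import math, random

      def objective(v):              # stand-in for negative collector efficiency
          return sum((vi - 0.7 * i) ** 2 for i, vi in enumerate(v, 1))

      random.seed(4)
      v = [random.uniform(0, 5) for _ in range(4)]
      best, T = list(v), 1.0
      for step in range(20_000):
          cand = [vi + random.gauss(0, 0.1) for vi in v]   # perturb the design
          dE = objective(cand) - objective(v)
          if dE < 0 or random.random() < math.exp(-dE / T):  # Metropolis rule
              v = cand
              if objective(v) < objective(best):
                  best = list(v)
          T = 1.0 * 0.9995 ** step                           # geometric cooling
      print([round(b, 2) for b in best])    # approaches [0.7, 1.4, 2.1, 2.8]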

  18. Harmonic regression based multi-temporal cloud filtering algorithm for Landsat 8

    NASA Astrophysics Data System (ADS)

    Joshi, P.

    2015-12-01

    The Landsat data archive, though rich, has missing dates and periods owing to weather irregularities and inconsistent coverage. The satellite images are further subject to cloud cover effects, resulting in erroneous analysis and observations of ground features. In earlier studies, a change detection algorithm using statistical control charts on harmonic residuals of multi-temporal Landsat 5 data was shown to detect a few prominent remnant clouds (Brooks et al., 2014). In this work we build on this harmonic regression approach to detect and filter clouds using a multi-temporal series of Landsat 8 images. Firstly, we compute the harmonic coefficients using fitting models on annual training data. The time series of residuals is then subjected to Shewhart X-bar control charts, which signal the deviations of cloud points from the fitted multi-temporal Fourier curve. For a process with standard deviation σ, we found second- and third-order harmonic regression with an X-bar chart control limit Lσ in the range 0.5σ < Lσ < σ to be most efficient in detecting clouds. By implementing second-order harmonic regression with successive X-bar chart control limits of L and 0.5L on the NDVI, NDSI and haze optimized transformation (HOT), and utilizing the seasonal physical properties of these parameters, we have designed a novel multi-temporal algorithm for filtering clouds from Landsat 8 images. The method is applied to Virginia and Alabama in Landsat 8 UTM zones 17 and 16, respectively. Our algorithm efficiently filters all types of cloud cover with an overall accuracy greater than 90%. Because of the multi-temporal operation and the ability to recreate the multi-temporal database of images using only the coefficients of the Fourier regression, our algorithm is largely storage- and time-efficient. The results show good potential for this multi-temporal approach to cloud detection as a timely and targeted solution for the Landsat 8 research community, catering to the need for innovative processing solutions in the early stage of the satellite.
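
    A per-pixel sketch of the screening step under stated assumptions: fit a second-order harmonic (Fourier) series to one year of NDVI samples by least squares, then flag acquisitions whose residual exceeds an X-bar-style control limit Lσ; the time series and the cloud dips below are simulated.

      import numpy as np

      rng = np.random.default_rng(5)
      doy = np.arange(8, 368, 16)                  # acquisition days of year
      t = 2 * np.pi * doy / 365.0
      ndvi = 0.5 + 0.25 * np.sin(t) + 0.04 * rng.normal(size=t.size)
      ndvi[[4, 15]] -= 0.5                         # simulated cloud hits

      X = np.column_stack([np.ones_like(t),
                           np.cos(t), np.sin(t),
                           np.cos(2 * t), np.sin(2 * t)])  # 2nd-order harmonics
      coef, *_ = np.linalg.lstsq(X, ndvi, rcond=None)
      resid = ndvi - X @ coef
      sigma = np.std(resid)
      L = 0.8                                      # within the 0.5..1.0 range above
      flagged = np.flatnonzero(np.abs(resid) > L * sigma)
      print('flagged acquisitions (doy):', doy[flagged])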

  19. Scaling Deep Learning on GPU and Knights Landing clusters

    DOE PAGES

    You, Yang; Buluc, Aydin; Demmel, James

    2017-09-26

    The speed of deep neural network training has become a big bottleneck of deep learning research and development. For example, training GoogleNet on the ImageNet dataset with one Nvidia K20 GPU needs 21 days. To speed up the training process, current deep learning systems heavily rely on hardware accelerators. However, these accelerators have limited on-chip memory compared with CPUs. To handle large datasets, they need to fetch data from either CPU memory or remote processors. We use both self-hosted Intel Knights Landing (KNL) clusters and multi-GPU clusters as our target platforms. From an algorithmic aspect, current distributed machine learning systems are mainly designed for cloud systems. These methods are asynchronous because of the slow network and high fault-tolerance requirements of cloud systems. We focus on Elastic Averaging SGD (EASGD) to design algorithms for HPC clusters. The original EASGD used a round-robin method for communication and updating, ordered by machine rank ID, which is inefficient on HPC clusters. First, we redesign four efficient algorithms for HPC systems to improve EASGD's poor scaling on clusters. Async EASGD, Async MEASGD, and Hogwild EASGD are faster than their existing counterparts (Async SGD, Async MSGD, and Hogwild SGD, respectively) in all the comparisons. Finally, we design Sync EASGD, which ties for the best performance among all the methods while being deterministic. In addition to the algorithmic improvements, we use system-algorithm codesign techniques to scale up the algorithms. By reducing the percentage of communication from 87% to 14%, our Sync EASGD achieves a 5.3x speedup over the original EASGD on the same platform. We achieve 91.5% weak scaling efficiency on 4253 KNL cores, which is higher than the state-of-the-art implementation.

  20. Scaling Deep Learning on GPU and Knights Landing clusters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    You, Yang; Buluc, Aydin; Demmel, James

    The speed of deep neural network training has become a big bottleneck of deep learning research and development. For example, training GoogleNet on the ImageNet dataset with one Nvidia K20 GPU needs 21 days. To speed up the training process, current deep learning systems heavily rely on hardware accelerators. However, these accelerators have limited on-chip memory compared with CPUs. To handle large datasets, they need to fetch data from either CPU memory or remote processors. We use both self-hosted Intel Knights Landing (KNL) clusters and multi-GPU clusters as our target platforms. From an algorithmic aspect, current distributed machine learning systems are mainly designed for cloud systems. These methods are asynchronous because of the slow network and high fault-tolerance requirements of cloud systems. We focus on Elastic Averaging SGD (EASGD) to design algorithms for HPC clusters. The original EASGD used a round-robin method for communication and updating, ordered by machine rank ID, which is inefficient on HPC clusters. First, we redesign four efficient algorithms for HPC systems to improve EASGD's poor scaling on clusters. Async EASGD, Async MEASGD, and Hogwild EASGD are faster than their existing counterparts (Async SGD, Async MSGD, and Hogwild SGD, respectively) in all the comparisons. Finally, we design Sync EASGD, which ties for the best performance among all the methods while being deterministic. In addition to the algorithmic improvements, we use system-algorithm codesign techniques to scale up the algorithms. By reducing the percentage of communication from 87% to 14%, our Sync EASGD achieves a 5.3x speedup over the original EASGD on the same platform. We achieve 91.5% weak scaling efficiency on 4253 KNL cores, which is higher than the state-of-the-art implementation.

  1. Nash equilibrium and multi criterion aerodynamic optimization

    NASA Astrophysics Data System (ADS)

    Tang, Zhili; Zhang, Lianhe

    2016-06-01

    Game theory, and in particular its Nash Equilibrium (NE) concept, has gained importance in solving Multi Criterion Optimization (MCO) engineering problems over the past decade. The solution of an MCO problem can be viewed as an NE under the concept of competitive games. This paper surveys/proposes four efficient algorithms for calculating an NE of an MCO problem. Existence and equivalence of the solution are analyzed and proved in the paper based on the fixed point theorem. A specific virtual symmetric Nash game is also presented to set up an optimization strategy for single-objective optimization problems. Two numerical examples are presented to verify the proposed algorithms. One is the optimization of mathematical functions, to illustrate the detailed numerical procedures of the algorithms; the other is aerodynamic drag reduction of a civil transport wing-fuselage configuration using the virtual game. The successful application validates the efficiency of the algorithms in solving complex aerodynamic optimization problems.
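
    The fixed-point idea behind such algorithms can be sketched with iterated best response on a toy two-player problem: each player minimizes its own criterion over its own variable while the other's is held fixed, and the iteration converges to the NE (the criteria below are illustrative).

      import numpy as np
      from scipy.optimize import minimize_scalar

      # Toy two-criterion problem: player 1 controls x, player 2 controls y.
      J1 = lambda x, y: (x - 1) ** 2 + 0.5 * x * y     # player 1's criterion
      J2 = lambda x, y: (y + 2) ** 2 - 0.3 * x * y     # player 2's criterion

      x, y = 0.0, 0.0
      for _ in range(50):                              # fixed-point iteration
          x = minimize_scalar(lambda u: J1(u, y)).x    # best response, player 1
          y = minimize_scalar(lambda v: J2(x, v)).x    # best response, player 2
      print('Nash equilibrium ~', round(x, 4), round(y, 4))   # ~ (1.4458, -1.7831)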

  2. Multi-Objective Community Detection Based on Memetic Algorithm

    PubMed Central

    2015-01-01

    Community detection has drawn a lot of attention as it can provide invaluable help in understanding the function and visualizing the structure of networks. Since single objective optimization methods have intrinsic drawbacks in identifying multiple significant community structures, some methods formulate community detection as a multi-objective problem and adopt population-based evolutionary algorithms to obtain multiple community structures. Evolutionary algorithms have strong global search ability, but have difficulty in locating local optima efficiently. In this study, in order to identify multiple significant community structures more effectively, a multi-objective memetic algorithm for community detection is proposed by combining a multi-objective evolutionary algorithm with a local search procedure. The local search procedure is designed by addressing three issues. Firstly, nondominated solutions generated by evolutionary operations and solutions in the dominant population are set as initial individuals for the local search procedure. Then, a new direction vector, named the pseudonormal vector, is proposed to integrate the two objective functions into a fitness function. Finally, a network-specific local search strategy based on the label propagation rule is expanded to search for locally optimal solutions efficiently. Extensive experiments on both artificial and real-world networks evaluate the proposed method from three aspects. Firstly, experiments on the influence of the local search procedure demonstrate that it can speed up convergence to better partitions and make the algorithm more stable. Secondly, comparisons with a set of classic community detection methods illustrate that the proposed method can find single partitions effectively. Finally, the method is applied to identify hierarchical structures of networks, which are beneficial for analyzing networks at multi-resolution levels. PMID:25932646

  3. Multi-objective community detection based on memetic algorithm.

    PubMed

    Wu, Peng; Pan, Li

    2015-01-01

    Community detection has drawn a lot of attention as it can provide invaluable help in understanding the function and visualizing the structure of networks. Since single objective optimization methods have intrinsic drawbacks in identifying multiple significant community structures, some methods formulate community detection as a multi-objective problem and adopt population-based evolutionary algorithms to obtain multiple community structures. Evolutionary algorithms have strong global search ability, but have difficulty in locating local optima efficiently. In this study, in order to identify multiple significant community structures more effectively, a multi-objective memetic algorithm for community detection is proposed by combining a multi-objective evolutionary algorithm with a local search procedure. The local search procedure is designed by addressing three issues. Firstly, nondominated solutions generated by evolutionary operations and solutions in the dominant population are set as initial individuals for the local search procedure. Then, a new direction vector, named the pseudonormal vector, is proposed to integrate the two objective functions into a fitness function. Finally, a network-specific local search strategy based on the label propagation rule is expanded to search for locally optimal solutions efficiently. Extensive experiments on both artificial and real-world networks evaluate the proposed method from three aspects. Firstly, experiments on the influence of the local search procedure demonstrate that it can speed up convergence to better partitions and make the algorithm more stable. Secondly, comparisons with a set of classic community detection methods illustrate that the proposed method can find single partitions effectively. Finally, the method is applied to identify hierarchical structures of networks, which are beneficial for analyzing networks at multi-resolution levels.

  4. A computationally efficient parallel Levenberg-Marquardt algorithm for highly parameterized inverse model analyses

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.

    Inverse modeling seeks model parameters given a set of observations. However, for practical problems, because the number of measurements is often large and the model parameters are numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate- to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace, such that the dimensionality of the problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved by these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2D and a random hydraulic conductivity field in 3D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ~10^1 to ~10^2 in a multi-core computational environment. Furthermore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate- to large-scale problems.

  5. A computationally efficient parallel Levenberg-Marquardt algorithm for highly parameterized inverse model analyses

    DOE PAGES

    Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.

    2016-09-01

    Inverse modeling seeks model parameters given a set of observations. However, for practical problems, because the number of measurements is often large and the model parameters are numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate- to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace, such that the dimensionality of the problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved by these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2D and a random hydraulic conductivity field in 3D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ~10^1 to ~10^2 in a multi-core computational environment. Furthermore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate- to large-scale problems.

  6. Adapting hierarchical bidirectional inter prediction on a GPU-based platform for 2D and 3D H.264 video coding

    NASA Astrophysics Data System (ADS)

    Rodríguez-Sánchez, Rafael; Martínez, José Luis; Cock, Jan De; Fernández-Escribano, Gerardo; Pieters, Bart; Sánchez, José L.; Claver, José M.; de Walle, Rik Van

    2013-12-01

    The H.264/AVC video coding standard introduces some improved tools in order to increase compression efficiency. Moreover, the multi-view extension of H.264/AVC, called H.264/MVC, adopts many of them. Among the new features, variable block-size motion estimation is one which contributes to high coding efficiency. Furthermore, it defines a different prediction structure that includes hierarchical bidirectional pictures, outperforming traditional Group of Pictures patterns in both scenarios: single-view and multi-view. However, these video coding techniques have high computational complexity. Several techniques have been proposed in the literature over the last few years which are aimed at accelerating the inter prediction process, but there are no works focusing on bidirectional prediction or hierarchical prediction. In this article, with the emergence of many-core processors or accelerators, a step forward is taken towards an implementation of an H.264/AVC and H.264/MVC inter prediction algorithm on a graphics processing unit. The results show a negligible rate distortion drop with a time reduction of up to 98% for the complete H.264/AVC encoder.

  7. Large-Scale Compute-Intensive Analysis via a Combined In-situ and Co-scheduling Workflow Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Messer, Bronson; Sewell, Christopher; Heitmann, Katrin

    2015-01-01

    Large-scale simulations can produce tens of terabytes of data per analysis cycle, complicating and limiting the efficiency of workflows. Traditionally, outputs are stored on the file system and analyzed in post-processing. With the rapidly increasing size and complexity of simulations, this approach faces an uncertain future. Trending techniques consist of performing the analysis in situ, utilizing the same resources as the simulation, and/or off-loading subsets of the data to a compute-intensive analysis system. We introduce an analysis framework developed for HACC, a cosmological N-body code, that uses both in situ and co-scheduling approaches for handling Petabyte-size outputs. An initial in situ step is used to reduce the amount of data to be analyzed, and to separate out the data-intensive tasks handled off-line. The analysis routines are implemented using the PISTON/VTK-m framework, allowing a single implementation of an algorithm that simultaneously targets a variety of GPU, multi-core, and many-core architectures.

  8. A concept for a fuel efficient flight planning aid for general aviation

    NASA Technical Reports Server (NTRS)

    Collins, B. P.; Haines, A. L.; Wales, C. J.

    1982-01-01

    A core equation for estimation of fuel burn from path profile data was developed. This equation was used as a necessary ingredient in a dynamic program to define a fuel efficient flight path. The resultant algorithm is oriented toward use by general aviation. The pilot provides a description of the desired ground track, standard aircraft parameters, and weather at selected waypoints. The algorithm then derives the fuel efficient altitudes and velocities at the waypoints.
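
    A minimal sketch of the dynamic program under stated assumptions: a toy leg-fuel function stands in for the core fuel-burn equation, and the recursion picks the cheapest altitude profile across waypoints (velocities and the full aircraft model are omitted for brevity).

      # cost[k][a] = minimum fuel to reach waypoint k at altitude index a.
      n_waypoints, altitudes = 6, [4000, 6000, 8000, 10000]   # ft, illustrative

      def leg_fuel(alt, prev_alt, headwind):
          cruise = 10.0 - 0.0004 * alt                 # gal/leg: higher flies leaner
          climb = max(0.0, alt - prev_alt) * 0.001     # climb fuel penalty
          return cruise + climb + 0.05 * headwind      # headwind burns extra fuel

      headwinds = [5, 10, 0, 15, 5]                    # assumed winds per leg (kt)

      INF = float('inf')
      cost = [[INF] * len(altitudes) for _ in range(n_waypoints)]
      cost[0] = [0.0] * len(altitudes)                 # depart at any altitude
      for k in range(1, n_waypoints):
          for a, alt in enumerate(altitudes):
              cost[k][a] = min(cost[k - 1][b]
                               + leg_fuel(alt, altitudes[b], headwinds[k - 1])
                               for b in range(len(altitudes)))
      print('min total fuel (gal):', round(min(cost[-1]), 2))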

  9. LU Factorization with Partial Pivoting for a Multi-CPU, Multi-GPU Shared Memory System

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kurzak, Jakub; Luszczek, Piotr; Faverge, Mathieu

    2012-03-01

    LU factorization with partial pivoting is a canonical numerical procedure and the main component of the High Performance LINPACK benchmark. This article presents an implementation of the algorithm for a hybrid, shared memory, system with standard CPU cores and GPU accelerators. Performance in excess of one TeraFLOPS is achieved using four AMD Magny Cours CPUs and four NVIDIA Fermi GPUs.

  10. A software defined RTU multi-protocol automatic adaptation data transmission method

    NASA Astrophysics Data System (ADS)

    Jin, Huiying; Xu, Xingwu; Wang, Zhanfeng; Ma, Weijun; Li, Sheng; Su, Yong; Pan, Yunpeng

    2018-02-01

    The remote terminal unit (RTU) is the core device of monitoring systems in hydrology and water resources. Different devices often have different communication protocols in the application layer, which complicates information analysis and communication networking. Therefore, we introduce the idea of software-defined hardware, abstract the common features of mainstream RTU application-layer communication protocols, and propose a unified common protocol model. Various application-layer communication protocol algorithms are then modularized according to the model. The executable codes of these algorithms are labeled by virtual functions and stored in the flash chips of the embedded CPU to form the protocol stack. Driven by the configuration commands that initialize the RTU communication system, the RTU can dynamically assemble and load various application-layer communication protocols and efficiently transport sensor data to the central station, while the sensors' data acquisition protocols and the external communication terminals remain unchanged.
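
    The modular protocol stack can be sketched as a registry of interchangeable application-layer parsers bound at run time by a configuration command; the protocol names and frame layouts below are illustrative assumptions, not the paper's actual formats.

      from typing import Callable, Dict

      PARSERS: Dict[str, Callable[[bytes], dict]] = {}

      def protocol(name: str):
          """Register a parser module under a protocol identifier."""
          def register(fn: Callable[[bytes], dict]):
              PARSERS[name] = fn
              return fn
          return register

      @protocol('sl651')       # hypothetical hydrological telemetry protocol id
      def parse_sl651(frame: bytes) -> dict:
          return {'station': frame[:2].hex(),
                  'level_cm': int.from_bytes(frame[2:4], 'big')}

      @protocol('modbus')      # hypothetical register-map style protocol id
      def parse_modbus(frame: bytes) -> dict:
          return {'unit': frame[0], 'regs': list(frame[1:])}

      def on_config(command: str):
          """Bind the active parser from a command like 'USE sl651'."""
          return PARSERS[command.split()[1]]

      active = on_config('USE sl651')
      print(active(bytes([0xAB, 0xCD, 0x01, 0x2C])))
      # -> {'station': 'abcd', 'level_cm': 300}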

  11. Multi-Optimisation Consensus Clustering

    NASA Astrophysics Data System (ADS)

    Li, Jian; Swift, Stephen; Liu, Xiaohui

    Ensemble Clustering has been developed to provide an alternative way of obtaining more stable and accurate clustering results. It aims to avoid the biases of individual clustering algorithms. However, it is still a challenge to develop an efficient and robust method for Ensemble Clustering. Based on an existing ensemble clustering method, Consensus Clustering (CC), this paper introduces an advanced Consensus Clustering algorithm called Multi-Optimisation Consensus Clustering (MOCC), which utilises an optimised Agreement Separation criterion and a Multi-Optimisation framework to improve the performance of CC. Fifteen different data sets are used for evaluating the performance of MOCC. The results reveal that MOCC can generate more accurate clustering results than the original CC algorithm.
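
    For orientation, a minimal Python sketch of the co-association flavour of consensus clustering: repeated k-means runs vote on pairwise co-membership, and the vote matrix is cut hierarchically. MOCC's Agreement Separation criterion and Multi-Optimisation framework are not reproduced here; scikit-learn and SciPy availability is assumed.

        import numpy as np
        from sklearn.cluster import KMeans
        from scipy.cluster.hierarchy import linkage, fcluster
        from scipy.spatial.distance import squareform

        def consensus_cluster(X, k, n_runs=20, seed=0):
            """Build a co-association matrix from repeated k-means runs, then cut it."""
            rng = np.random.RandomState(seed)
            n = X.shape[0]
            coassoc = np.zeros((n, n))
            for _ in range(n_runs):
                labels = KMeans(n_clusters=k, n_init=1,
                                random_state=rng.randint(1 << 30)).fit_predict(X)
                coassoc += (labels[:, None] == labels[None, :])
            coassoc /= n_runs
            # Average-linkage clustering on 1 - co-association as a distance.
            dist = squareform(1.0 - coassoc, checks=False)
            return fcluster(linkage(dist, method='average'), t=k, criterion='maxclust')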

  12. Learning to Predict Combinatorial Structures

    NASA Astrophysics Data System (ADS)

    Vembu, Shankar

    2009-12-01

    The major challenge in designing a discriminative learning algorithm for predicting structured data is to address the computational issues arising from the exponential size of the output space. Existing algorithms make different assumptions to ensure efficient, polynomial time estimation of model parameters. For several combinatorial structures, including cycles, partially ordered sets, permutations and other graph classes, these assumptions do not hold. In this thesis, we address the problem of designing learning algorithms for predicting combinatorial structures by introducing two new assumptions: (i) The first assumption is that a particular counting problem can be solved efficiently. The consequence is a generalisation of the classical ridge regression for structured prediction. (ii) The second assumption is that a particular sampling problem can be solved efficiently. The consequence is a new technique for designing and analysing probabilistic structured prediction models. These results can be applied to solve several complex learning problems including but not limited to multi-label classification, multi-category hierarchical classification, and label ranking.

  13. Using multi-class queuing network to solve performance models of e-business sites.

    PubMed

    Zheng, Xiao-ying; Chen, De-ren

    2004-01-01

    Due to e-business's variety of customers with different navigational patterns and demands, a multi-class queuing network is a natural performance model for it. Open multi-class queuing network (QN) models are based on the assumption that no service center is saturated as a result of the combined loads of all the classes. Several formulas are used to calculate performance measures, including throughput, residence time, queue length, response time and the average number of requests. The solution technique for closed multi-class QN models is an approximate mean value analysis (MVA) algorithm based on three key equations, because the exact algorithm has huge time and space requirements. Since mixed multi-class QN models include some open and some closed classes, the open classes should be eliminated to create a closed multi-class QN so that the closed model algorithm can be applied. Corresponding examples are given to show how to apply the algorithms mentioned in this article. These examples indicate that multi-class QN is a reasonably accurate model of e-business and can be solved efficiently.
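
    For illustration, here is the exact single-class MVA recursion built from the same three relations the abstract alludes to: the arrival theorem for residence times, Little's law for throughput, and per-center queue lengths. The approximate multi-class algorithm replaces this recursion with a fixed-point iteration; the service demands below are illustrative.

        def exact_mva(demands, n_customers):
            """Exact single-class MVA for a closed product-form queuing network.

            demands[k] is the total service demand at queuing center k.
            Returns throughput, residence times and mean queue lengths at N customers.
            """
            q = [0.0] * len(demands)                    # queue lengths at population 0
            for n in range(1, n_customers + 1):
                # Arrival theorem: an arriving customer sees the (n-1)-customer queues.
                r = [d * (1.0 + qk) for d, qk in zip(demands, q)]
                x = n / sum(r)                          # system throughput (Little's law)
                q = [x * rk for rk in r]                # per-center queue lengths
            return x, r, q

        # Example: three centers with service demands of 0.1, 0.3 and 0.2 seconds.
        throughput, residence, queues = exact_mva([0.1, 0.3, 0.2], n_customers=10)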

  14. Advanced and flexible multi-carrier receiver architecture for high-count multi-core fiber based space division multiplexed applications

    PubMed Central

    Asif, Rameez

    2016-01-01

    Space division multiplexing (SDM), incorporating multi-core fibers (MCFs), has been demonstrated for effectively maximizing the data capacity in an impending capacity crunch. To achieve high spectral density through multi-carrier encoding while simultaneously maintaining transmission reach, benefits from inter-core crosstalk (XT) and non-linear compensation must be utilized. In this report, we propose a proof-of-concept unified receiver architecture that jointly compensates optical Kerr effects and intra- and inter-core XT in MCFs. The architecture is analysed in a multi-channel 512 Gbit/s dual-carrier DP-16QAM system over an 800 km 19-core MCF to validate the digital compensation of inter-core XT. Through this architecture we: (a) efficiently compensate the inter-core XT, improving the Q-factor by 4.82 dB; and (b) achieve a substantial gain in transmission reach, increasing the maximum achievable distance from 480 km to 1208 km, via analytical analysis. Simulation results confirm that inter-core XT distortions are more severe for cores fabricated around the central axis of the cladding. Notably, the XT-induced Q-penalty can be suppressed to less than 1 dB up to −11.56 dB of inter-core XT over 800 km of MCF, offering the flexibility to fabricate dense core structures with the same cladding diameter. Moreover, this report outlines the relationship between core pitch and forward-error correction (FEC). PMID:27270381

  15. Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU.

    PubMed

    Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong

    2010-10-01

    Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel Core 2 Quad Q6600 CPU and a GeForce 8800 GT GPU, with software support from OpenMP and CUDA. It was tested in three parallelization settings: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus one CPU core, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of both the multi-core CPU and the general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied in setting (c). In a simulation with 1600 time steps, the speedup of the parallel computation compared to the serial computation was 3.9 in setting (a), 16.8 in setting (b), and 20.0 in setting (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies. Copyright 2010 Elsevier Ireland Ltd. All rights reserved.
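
    A schematic Python sketch of the load-prediction idea, not the paper's actual scheduler: each processor's share of the next batch is made proportional to its measured throughput, smoothed across rounds. Device objects with name and process attributes are assumed, and a real scheduler would clamp the final batch exactly.

        import time
        from concurrent.futures import ThreadPoolExecutor

        def timed_run(device, n_items):
            t0 = time.perf_counter()
            device.process(n_items)                       # device-specific kernel call
            return n_items / (time.perf_counter() - t0)   # measured items per second

        def run_balanced(devices, total_items, batch=512):
            """Split each batch across devices in proportion to predicted throughput."""
            rate = {d.name: 1.0 for d in devices}         # optimistic initial estimates
            done = 0
            with ThreadPoolExecutor(max_workers=len(devices)) as pool:
                while done < total_items:
                    n = min(batch, total_items - done)
                    total = sum(rate.values())
                    shares = {d: max(1, round(n * rate[d.name] / total)) for d in devices}
                    futures = {d: pool.submit(timed_run, d, s) for d, s in shares.items()}
                    for d, fut in futures.items():
                        # Exponentially smoothed prediction for the next round.
                        rate[d.name] = 0.7 * rate[d.name] + 0.3 * fut.result()
                    done += sum(shares.values())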

  16. A Hybrid FPGA/Tilera Compute Element for Autonomous Hazard Detection and Navigation

    NASA Technical Reports Server (NTRS)

    Villalpando, Carlos Y.; Werner, Robert A.; Carson, John M., III; Khanoyan, Garen; Stern, Ryan A.; Trawny, Nikolas

    2013-01-01

    To increase safety for future missions landing on other planetary or lunar bodies, the Autonomous Landing and Hazard Avoidance Technology (ALHAT) program is developing an integrated sensor for autonomous surface analysis and hazard determination. The ALHAT Hazard Detection System (HDS) consists of a Flash LIDAR for measuring the topography of the landing site, a gimbal to scan across the terrain, and an Inertial Measurement Unit (IMU), along with terrain analysis algorithms to identify the landing site and the local hazards. An FPGA and Manycore processor system was developed to interface all the devices in the HDS, to provide high-resolution timing to accurately measure system state, and to run the surface analysis algorithms quickly and efficiently. In this paper, we will describe how we integrated COTS components such as an FPGA evaluation board, a TILExpress64, and multi-threaded/multi-core aware software to build the HDS Compute Element (HDSCE). The ALHAT program is also working with the NASA Morpheus Project and has integrated the HDS as a sensor on the Morpheus Lander. This paper will also describe how the HDS is integrated with the Morpheus lander and the results of the initial test flights with the HDS installed. We will also describe future improvements to the HDSCE.

  17. A hybrid FPGA/Tilera compute element for autonomous hazard detection and navigation

    NASA Astrophysics Data System (ADS)

    Villalpando, C. Y.; Werner, R. A.; Carson, J. M.; Khanoyan, G.; Stern, R. A.; Trawny, N.

    To increase safety for future missions landing on other planetary or lunar bodies, the Autonomous Landing and Hazard Avoidance Technology (ALHAT) program is developing an integrated sensor for autonomous surface analysis and hazard determination. The ALHAT Hazard Detection System (HDS) consists of a Flash LIDAR for measuring the topography of the landing site, a gimbal to scan across the terrain, and an Inertial Measurement Unit (IMU), along with terrain analysis algorithms to identify the landing site and the local hazards. An FPGA and Manycore processor system was developed to interface all the devices in the HDS, to provide high-resolution timing to accurately measure system state, and to run the surface analysis algorithms quickly and efficiently. In this paper, we will describe how we integrated COTS components such as an FPGA evaluation board, a TILExpress64, and multi-threaded/multi-core aware software to build the HDS Compute Element (HDSCE). The ALHAT program is also working with the NASA Morpheus Project and has integrated the HDS as a sensor on the Morpheus Lander. This paper will also describe how the HDS is integrated with the Morpheus lander and the results of the initial test flights with the HDS installed. We will also describe future improvements to the HDSCE.

  18. Classification algorithm of lung lobe for lung disease cases based on multislice CT images

    NASA Astrophysics Data System (ADS)

    Matsuhiro, M.; Kawata, Y.; Niki, N.; Nakano, Y.; Mishima, M.; Ohmatsu, H.; Tsuchida, T.; Eguchi, K.; Kaneko, M.; Moriyama, N.

    2011-03-01

    With the development of multi-slice CT technology, it has become possible to obtain an accurate 3D image of the lung field in a short time. To support this, many image processing methods need to be developed. In the clinical setting for diagnosis of lung cancer, it is important to study and analyse lung structure, and classification of lung lobes provides useful information for lung cancer analysis. In this report, we describe an algorithm that classifies lungs into lung lobes for lung disease cases from multi-slice CT images. The classification is carried out efficiently using information on the lung blood vessels, bronchi, and interlobar fissures. Applying the classification algorithm to multi-slice CT images of 20 normal cases and 5 lung disease cases, we demonstrate its usefulness.

  19. SU-E-J-88: Deformable Registration Using Multi-Resolution Demons Algorithm for 4DCT.

    PubMed

    Li, Dengwang; Yin, Yong

    2012-06-01

    In order to register 4DCT images efficiently, we propose an improved deformable registration algorithm based on a multi-resolution demons strategy. 4DCT images of lung cancer patients were collected from a General Electric Discovery ST CT scanner at our cancer hospital. All images were sorted into groups and reconstructed according to their phases, with each respiratory cycle divided into 10 phases at intervals of 10%. In our improved demons algorithm we use the gradients of both the reference and floating images as deformation forces, and redistribute the forces according to the proportion of the two forces. We also introduce an intermediate variable into the cost function to decrease noise in the registration process. Furthermore, a Gaussian multi-resolution strategy and the BFGS optimization method are used to improve the speed and accuracy of the registration. To validate the performance of the algorithm, we registered the 10 phase-images, choosing the exhalation images as the reference, and compared the differences between the floating and reference images before and after registration at two landmarks identified by an experienced clinician. The method shows good accuracy, with a higher similarity measure for registration of 4DCT, and can register large deformations precisely. Finally, the tumor target obtained from the deformation fields using the proposed method is more accurate than the internal margin (IM) expanded from the Gross Tumor Volume (GTV), and we achieve tumor and normal-tissue tracking and dose accumulation using the 4DCT data. In summary, an efficient deformable registration algorithm for 4DCT based on a multi-resolution demons strategy was proposed. © 2012 American Association of Physicists in Medicine.
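
    A minimal 2D sketch of one demons iteration with Gaussian regularization, assuming NumPy/SciPy; the paper's force redistribution, intermediate-variable cost term and BFGS step are not reproduced. Running this at coarse resolutions first and upsampling the field gives the multi-resolution part of the strategy.

        import numpy as np
        from scipy.ndimage import gaussian_filter, map_coordinates

        def demons_step(fixed, moving, u, v, sigma=1.5, eps=1e-9):
            """One demons update of the displacement field (u, v) for 2D images."""
            ys, xs = np.meshgrid(np.arange(fixed.shape[0]),
                                 np.arange(fixed.shape[1]), indexing='ij')
            warped = map_coordinates(moving, [ys + v, xs + u], order=1, mode='nearest')
            diff = warped - fixed
            gy, gx = np.gradient(fixed)
            wgy, wgx = np.gradient(warped)
            # Symmetric demons force: gradients of both reference and floating images.
            fy, fx = gy + wgy, gx + wgx
            denom = fx**2 + fy**2 + diff**2 + eps
            u -= diff * fx / denom
            v -= diff * fy / denom
            # Gaussian smoothing regularizes the field (diffusion-like behaviour).
            return gaussian_filter(u, sigma), gaussian_filter(v, sigma)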

  20. Hierarchical inorganic-organic multi-shell nanospheres for intervention and treatment of lead-contaminated blood

    NASA Astrophysics Data System (ADS)

    Khairy, Mohamed; El-Safty, Sherif A.; Shenashen, Mohamed. A.; Elshehy, Emad A.

    2013-08-01

    The highly toxic properties, bioavailability, and adverse effects of Pb2+ species on the environment and living organisms necessitate periodic monitoring and removal whenever possible of Pb2+ concentrations in the environment. In this study, we designed a novel optical multi-shell nanosphere sensor that enables selective recognition, unrestrained accessibility, continuous monitoring, and efficient removal (on the order of minutes) of Pb2+ ions from water and human blood, i.e., red blood cells (RBCs). The consequent decoration of the mesoporous core/double-shell silica nanospheres through a chemically responsive azo-chromophore with a long hydrophobic tail enabled us to create a unique hierarchical multi-shell sensor. We examined the efficiency of the multi-shell sensor in removing lead ions from the blood to ascertain the potential use of the sensor in medical applications. The lead-induced hemolysis of RBCs in the sensing/capture assay was inhibited by the ability of the hierarchical sensor to remove lead ions from blood. The results suggest the higher flux and diffusion of Pb2+ ions into the mesopores of the core/multi-shell sensor than into the RBC membranes. These findings indicate that the sensor could be used in the prevention of health risks associated with elevated blood lead levels such as anemia.

  1. On the Improvement of Convergence Performance for Integrated Design of Wind Turbine Blade Using a Vector Dominating Multi-objective Evolution Algorithm

    NASA Astrophysics Data System (ADS)

    Wang, L.; Wang, T. G.; Wu, J. H.; Cheng, G. P.

    2016-09-01

    A novel multi-objective optimization algorithm incorporating evolution strategies and vector mechanisms, referred to as VD-MOEA, is proposed and applied to the aerodynamic-structural integrated design of a wind turbine blade. In the algorithm, a set of uniformly distributed vectors is constructed to guide the population in moving forward to the Pareto front rapidly and to maintain population diversity with high efficiency. Two- and three-objective designs of a 1.5 MW wind turbine blade are carried out for the optimization objectives of maximum annual energy production, minimum blade mass, and minimum extreme root thrust. The results show that the Pareto optimal solutions can be obtained in a single simulation run and are uniformly distributed in the objective space, maximally maintaining the population diversity. In comparison to conventional evolution algorithms, VD-MOEA displays a dramatic improvement in both convergence and diversity preservation when handling complex problems with many variables, objectives and constraints. This provides a reliable high-performance optimization approach for the aerodynamic-structural integrated design of wind turbine blades.

  2. Image Segmentation Method Using Fuzzy C Mean Clustering Based on Multi-Objective Optimization

    NASA Astrophysics Data System (ADS)

    Chen, Jinlin; Yang, Chunzhi; Xu, Guangkui; Ning, Li

    2018-04-01

    Image segmentation is not only one of the hottest topics in digital image processing, but also an important part of computer vision applications. As one kind of image segmentation algorithm, fuzzy C-means clustering is an effective and concise segmentation algorithm. However, the drawback of FCM is that it is sensitive to image noise. To solve this problem, this paper designs a novel fuzzy C-means clustering algorithm based on multi-objective optimization. We add a parameter λ to the fuzzy distance measurement formula, which adjusts the weight of the pixel's local information. In the algorithm, the local correlation of neighboring pixels is added to the improved multi-objective mathematical model to optimize the clustering centers. Two different experiments show that the novel fuzzy C-means approach achieves efficient performance and computational time while segmenting images corrupted by different types of noise.
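
    A compact NumPy sketch of fuzzy C-means with a λ-weighted local (mean-filtered) term in the distance, in the spirit of spatially constrained FCM variants; the exact multi-objective model of the paper is not reproduced.

        import numpy as np
        from scipy.ndimage import uniform_filter

        def fcm_spatial(img, n_clusters=3, m=2.0, lam=0.5, n_iter=50, seed=0):
            """Fuzzy C-means on a 2D image with a lambda-weighted local-mean term."""
            rng = np.random.default_rng(seed)
            x = img.ravel().astype(float)
            xbar = uniform_filter(img.astype(float), size=3).ravel()  # local means
            v = rng.choice(x, n_clusters)                             # initial centers
            for _ in range(n_iter):
                # Distance combines the pixel and its neighbourhood mean (weight lam).
                d2 = (x[None] - v[:, None])**2 + lam * (xbar[None] - v[:, None])**2
                u = 1.0 / np.maximum(d2, 1e-12) ** (1.0 / (m - 1.0))
                u /= u.sum(axis=0, keepdims=True)                     # memberships
                um = u ** m
                v = (um @ (x + lam * xbar)) / ((1.0 + lam) * um.sum(axis=1))
            return u.argmax(axis=0).reshape(img.shape), v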

  3. An algorithm for the simultaneous reconstruction of faults and slip fields

    NASA Astrophysics Data System (ADS)

    Volkov, D.

    2017-12-01

    We introduce an algorithm for the simultaneous reconstruction of faults and slip fields on those faults. We define a regularized functional to be minimized for the reconstruction. We prove that the minimum of that functional converges to the unique solution of the related fault inverse problem. Due to inherent uncertainties in measurements, rather than seeking a deterministic solution to the fault inverse problem, we consider a Bayesian approach. The advantage of such an approach is that we obtain a way of quantifying uncertainties as part of our final answer. On the downside, this Bayesian approach leads to a very large computation. To contend with the size of this computation we developed an algorithm for the numerical solution to the stochastic minimization problem which can be easily implemented on a parallel multi-core platform and we discuss techniques to save on computational time. After showing how this algorithm performs on simulated data and assessing the effect of noise, we apply it to measured data. The data was recorded during a slow slip event in Guerrero, Mexico.

  4. Floating-point performance of ARM cores and their efficiency in classical molecular dynamics

    NASA Astrophysics Data System (ADS)

    Nikolskiy, V.; Stegailov, V.

    2016-02-01

    Supercomputing in the exascale era will inevitably be limited by power efficiency, and different possible CPU architectures are being considered. The development of ARM processors has recently reached the point where their floating-point performance can be seriously considered for a range of scientific applications. In this work we present an analysis of the floating-point performance of the latest ARM cores and their efficiency for classical molecular dynamics algorithms.

  5. Evolution of the ATLAS Software Framework towards Concurrency

    NASA Astrophysics Data System (ADS)

    Jones, R. W. L.; Stewart, G. A.; Leggett, C.; Wynne, B. M.

    2015-05-01

    The ATLAS experiment has successfully used its Gaudi/Athena software framework for data taking and analysis during the first LHC run, with billions of events successfully processed. However, the design of Gaudi/Athena dates from early 2000 and the software and the physics code has been written using a single threaded, serial design. This programming model has increasing difficulty in exploiting the potential of current CPUs, which offer their best performance only through taking full advantage of multiple cores and wide vector registers. Future CPU evolution will intensify this trend, with core counts increasing and memory per core falling. Maximising performance per watt will be a key metric, so all of these cores must be used as efficiently as possible. In order to address the deficiencies of the current framework, ATLAS has embarked upon two projects: first, a practical demonstration of the use of multi-threading in our reconstruction software, using the GaudiHive framework; second, an exercise to gather requirements for an updated framework, going back to the first principles of how event processing occurs. In this paper we report on both these aspects of our work. For the hive based demonstrators, we discuss what changes were necessary in order to allow the serially designed ATLAS code to run, both to the framework and to the tools and algorithms used. We report on what general lessons were learned about the code patterns that had been employed in the software and which patterns were identified as particularly problematic for multi-threading. These lessons were fed into our considerations of a new framework and we present preliminary conclusions on this work. In particular we identify areas where the framework can be simplified in order to aid the implementation of a concurrent event processing scheme. Finally, we discuss the practical difficulties involved in migrating a large established code base to a multi-threaded framework and how this can be achieved for LHC Run 3.

  6. Parallelization of interpolation, solar radiation and water flow simulation modules in GRASS GIS using OpenMP

    NASA Astrophysics Data System (ADS)

    Hofierka, Jaroslav; Lacko, Michal; Zubal, Stanislav

    2017-10-01

    In this paper, we describe the parallelization of three complex and computationally intensive modules of GRASS GIS using the OpenMP application programming interface for multi-core computers. These include the v.surf.rst module for spatial interpolation, the r.sun module for solar radiation modeling and the r.sim.water module for water flow simulation. We briefly describe the functionality of the modules and parallelization approaches used in the modules. Our approach includes the analysis of the module's functionality, identification of source code segments suitable for parallelization and proper application of OpenMP parallelization code to create efficient threads processing the subtasks. We document the efficiency of the solutions using the airborne laser scanning data representing land surface in the test area and derived high-resolution digital terrain model grids. We discuss the performance speed-up and parallelization efficiency depending on the number of processor threads. The study showed a substantial increase in computation speeds on a standard multi-core computer while maintaining the accuracy of results in comparison to the output from original modules. The presented parallelization approach showed the simplicity and efficiency of the parallelization of open-source GRASS GIS modules using OpenMP, leading to an increased performance of this geospatial software on standard multi-core computers.
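
    As a rough Python analogue of this loop-level parallelization (the GRASS modules themselves are C with OpenMP pragmas), the sketch below processes horizontal bands of a grid on separate cores; the smoothing kernel and names are illustrative.

        import numpy as np
        from multiprocessing import Pool

        def smooth_band(band):
            """Toy kernel: 3x3 mean filter over one band, padded by one row/column."""
            out = np.empty((band.shape[0] - 2, band.shape[1] - 2))
            for i in range(out.shape[0]):
                for j in range(out.shape[1]):
                    out[i, j] = band[i:i + 3, j:j + 3].mean()
            return out

        def parallel_smooth(grid, n_workers=4):
            """Split the grid into row bands and filter them on separate cores."""
            pad = np.pad(grid, 1, mode='edge')
            bounds = np.linspace(0, grid.shape[0], n_workers + 1, dtype=int)
            bands = [pad[a:b + 2] for a, b in zip(bounds[:-1], bounds[1:])]
            with Pool(n_workers) as pool:
                return np.vstack(pool.map(smooth_band, bands))

        if __name__ == '__main__':
            dem = np.random.rand(512, 512)          # stand-in for a DTM grid
            smoothed = parallel_smooth(dem)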

  7. An enhanced multi-view vertical line locus matching algorithm of object space ground primitives based on positioning consistency for aerial and space images

    NASA Astrophysics Data System (ADS)

    Zhang, Ka; Sheng, Yehua; Wang, Meizhen; Fu, Suxia

    2018-05-01

    The traditional multi-view vertical line locus (TMVLL) matching method is an object-space-based method that is commonly used to directly acquire spatial 3D coordinates of ground objects in photogrammetry. However, the TMVLL method can only obtain one elevation and lacks an accurate means of validating the matching results. In this paper, we propose an enhanced multi-view vertical line locus (EMVLL) matching algorithm based on positioning consistency for aerial or space images. The algorithm involves three components: confirming candidate pixels of the ground primitive in the base image, multi-view image matching based on the object space constraints for all candidate pixels, and validating the consistency of the object space coordinates with the multi-view matching result. The proposed algorithm was tested using actual aerial images and space images. Experimental results show that the EMVLL method successfully solves the problems associated with the TMVLL method, and has greater reliability, accuracy and computing efficiency.

  8. Mono and multi-objective optimization techniques applied to a large range of industrial test cases using Metamodel assisted Evolutionary Algorithms

    NASA Astrophysics Data System (ADS)

    Fourment, Lionel; Ducloux, Richard; Marie, Stéphane; Ejday, Mohsen; Monnereau, Dominique; Massé, Thomas; Montmitonnet, Pierre

    2010-06-01

    The use of material processing numerical simulation allows a strategy of trial and error to improve virtual processes without incurring material costs or interrupting production, and therefore saves a lot of money, but it requires user time to analyze the results, adjust the operating conditions and restart the simulation. Automatic optimization is the perfect complement to simulation. An Evolutionary Algorithm coupled with metamodelling makes it possible to obtain industrially relevant results on a very large range of applications within a few tens of simulations and without any specific knowledge of automatic optimization techniques. Ten industrial partners were selected to cover the different areas of the mechanical forging industry and to provide different examples of forming simulation tools. The large computational time is handled by a metamodel approach, which interpolates the objective function over the entire parameter space while knowing the exact function values only at a reduced number of "master points". Two algorithms are used: an evolution strategy combined with a Kriging metamodel, and a genetic algorithm combined with a Meshless Finite Difference Method. The latter approach is extended to multi-objective optimization, where the set of solutions corresponding to the best possible compromises between the different objectives is computed in the same way. The population-based approach exploits the parallel capabilities of the computer with high efficiency. An optimization module, fully embedded within the Forge2009 IHM, makes it possible to cover all the defined examples, and the use of new multi-core hardware to run several simulations at the same time reduces the required time dramatically. The presented examples, which include billet shape optimization of a common rail, the cogging of a bar and a wire drawing problem, demonstrate the method's versatility.

  9. Using OpenMP vs. Threading Building Blocks for Medical Imaging on Multi-cores

    NASA Astrophysics Data System (ADS)

    Kegel, Philipp; Schellmann, Maraike; Gorlatch, Sergei

    We compare two parallel programming approaches for multi-core systems: the well-known OpenMP and the recently introduced Threading Building Blocks (TBB) library by Intel®. The comparison is made using the parallelization of a real-world numerical algorithm for medical imaging. We develop several parallel implementations, and compare them w.r.t. programming effort, programming style and abstraction, and runtime performance. We show that TBB requires a considerable program re-design, whereas with OpenMP simple compiler directives are sufficient. While TBB appears to be less appropriate for parallelizing existing implementations, it fosters a good programming style and higher abstraction level for newly developed parallel programs. Our experimental measurements on a dual quad-core system demonstrate that OpenMP slightly outperforms TBB in our implementation.

  10. Applying an efficient K-nearest neighbor search to forest attribute imputation

    Treesearch

    Andrew O. Finley; Ronald E. McRoberts; Alan R. Ek

    2006-01-01

    This paper explores the utility of an efficient nearest neighbor (NN) search algorithm for applications in multi-source kNN forest attribute imputation. The search algorithm reduces the number of distance calculations between a given target vector and each reference vector, thereby, decreasing the time needed to discover the NN subset. Results of five trials show gains...

  11. Multi-Source Multi-Target Dictionary Learning for Prediction of Cognitive Decline.

    PubMed

    Zhang, Jie; Li, Qingyang; Caselli, Richard J; Thompson, Paul M; Ye, Jieping; Wang, Yalin

    2017-06-01

    Alzheimer's Disease (AD) is the most common type of dementia. Identifying correct biomarkers may determine pre-symptomatic AD subjects and enable early intervention. Recently, multi-task sparse feature learning has been successfully applied to many computer vision and biomedical informatics research problems. It aims to improve the generalization performance by exploiting the features shared among different tasks. However, most existing algorithms are formulated as supervised learning schemes, which suffer from either insufficient feature numbers or missing label information. To address these challenges, we formulate an unsupervised framework for multi-task sparse feature learning based on a novel dictionary learning algorithm. To solve the unsupervised learning problem, we propose a two-stage Multi-Source Multi-Target Dictionary Learning (MMDL) algorithm. In stage 1, we propose a multi-source dictionary learning method to utilize the common and individual sparse features in different time slots. In stage 2, supported by a rigorous theoretical analysis, we develop a multi-task learning method to solve the missing label problem. Empirical studies on an N = 3970 longitudinal brain image data set, involving 2 sources and 5 targets, demonstrate the improved prediction accuracy and speed efficiency of MMDL in comparison with other state-of-the-art algorithms.

  12. FAST: framework for heterogeneous medical image computing and visualization.

    PubMed

    Smistad, Erik; Bozorgi, Mohammadmehdi; Lindseth, Frank

    2015-11-01

    Computer systems are becoming increasingly heterogeneous in the sense that they consist of different processors, such as multi-core CPUs and graphic processing units. As the amount of medical image data increases, it is crucial to exploit the computational power of these processors. However, this is currently difficult due to several factors, such as driver errors, processor differences, and the need for low-level memory handling. This paper presents a novel FrAmework for heterogeneouS medical image compuTing and visualization (FAST). The framework aims to make it easier to simultaneously process and visualize medical images efficiently on heterogeneous systems. FAST uses common image processing programming paradigms and hides the details of memory handling from the user, while enabling the use of all processors and cores on a system. The framework is open-source, cross-platform and available online. Code examples and performance measurements are presented to show the simplicity and efficiency of FAST. The results are compared to the Insight Toolkit (ITK) and the Visualization Toolkit (VTK) and show that the presented framework is faster, with up to 20 times speedup on several common medical imaging algorithms. FAST enables efficient medical image computing and visualization on heterogeneous systems. Code examples and performance evaluations have demonstrated that the toolkit is both easy to use and performs better than existing frameworks, such as ITK and VTK.

  13. A master-slave parallel hybrid multi-objective evolutionary algorithm for groundwater remediation design under general hydrogeological conditions

    NASA Astrophysics Data System (ADS)

    Wu, J.; Yang, Y.; Luo, Q.; Wu, J.

    2012-12-01

    This study presents a new hybrid multi-objective evolutionary algorithm, the niched Pareto tabu search combined with a genetic algorithm (NPTSGA), whereby the global search ability of niched Pareto tabu search (NPTS) is improved by the diversification of candidate solutions arising from the evolving nondominated sorting genetic algorithm II (NSGA-II) population. The NPTSGA, coupled with the commonly used groundwater flow and transport codes MODFLOW and MT3DMS, is developed for multi-objective optimal design of groundwater remediation systems. The proposed methodology is then applied to a large-scale field groundwater remediation system for cleanup of a large trichloroethylene (TCE) plume at the Massachusetts Military Reservation (MMR) in Cape Cod, Massachusetts. Furthermore, a master-slave (MS) parallelization scheme based on the Message Passing Interface (MPI) is incorporated into the NPTSGA to implement objective function evaluations in a distributed processor environment, which can greatly improve the efficiency of the NPTSGA in finding Pareto-optimal solutions to the real-world application. This study shows that the MS parallel NPTSGA, in comparison with the original NPTS and NSGA-II, can balance the tradeoff between diversity and optimality of solutions during the search process and is an efficient and effective tool for optimizing the multi-objective design of groundwater remediation systems under complicated hydrogeologic conditions.

  14. Accelerating Demand Paging for Local and Remote Out-of-Core Visualization

    NASA Technical Reports Server (NTRS)

    Ellsworth, David

    2001-01-01

    This paper describes a new algorithm that improves the performance of application-controlled demand paging for the out-of-core visualization of data sets that are on either local disks or disks on remote servers. The performance improvements come from better overlapping the computation with the page reading process, and by performing multiple page reads in parallel. The new algorithm can be applied to many different visualization algorithms since application-controlled demand paging is not specific to any visualization algorithm. The paper includes measurements that show that the new multi-threaded paging algorithm decreases the time needed to compute visualizations by one third when using one processor and reading data from local disk. The time needed when using one processor and reading data from remote disk decreased by up to 60%. Visualization runs using data from remote disk ran about as fast as ones using data from local disk because the remote runs were able to make use of the remote server's high performance disk array.

  15. Data Acquisition System for Multi-Frequency Radar Flight Operations Preparation

    NASA Technical Reports Server (NTRS)

    Leachman, Jonathan

    2010-01-01

    A three-channel data acquisition system was developed for the NASA Multi-Frequency Radar (MFR) system. The system is based on a commercial-off-the-shelf (COTS) industrial PC (personal computer) and two dual-channel 14-bit digital receiver cards. The decimated complex envelope representations of the three radar signals are passed to the host PC via the PCI bus, and then processed in parallel by multiple cores of the PC CPU (central processing unit). The innovation is this parallelization of the radar data processing using multiple cores of a standard COTS multi-core CPU. The data processing portion of the data acquisition software was built using autonomous program modules or threads, which can run simultaneously on different cores. A master program module calculates the optimal number of processing threads, launches them, and continually supplies each with data. The benefit of this new parallel software architecture is that COTS PCs can be used to implement increasingly complex processing algorithms on an increasing number of radar range gates and data rates. As new PCs become available with higher numbers of CPU cores, the software will automatically utilize the additional computational capacity.
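
    A schematic Python version of that master/worker pattern, assuming a hypothetical process_block kernel (the real system is native code driven by the PCI-bus data stream). NumPy-style kernels release the GIL, so threads can use multiple cores; for pure-Python kernels a process pool would be the analogue.

        import os
        import queue
        import threading

        def worker(in_q, out_q, process_block):
            while True:
                item = in_q.get()
                if item is None:                 # sentinel: no more data
                    break
                seq, block = item
                out_q.put((seq, process_block(block)))

        def run_pipeline(blocks, process_block, n_threads=None):
            """Master module: size the pool from the CPU, launch threads, feed data."""
            n_threads = n_threads or max(1, os.cpu_count() - 1)  # leave a core for I/O
            in_q, out_q = queue.Queue(maxsize=2 * n_threads), queue.Queue()
            threads = [threading.Thread(target=worker,
                                        args=(in_q, out_q, process_block))
                       for _ in range(n_threads)]
            for t in threads:
                t.start()
            n = 0
            for seq, block in enumerate(blocks):
                in_q.put((seq, block))
                n += 1
            for _ in threads:
                in_q.put(None)
            for t in threads:
                t.join()
            results = [out_q.get() for _ in range(n)]
            return [blk for _, blk in sorted(results)]   # restore acquisition order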

  16. An 81.6 μW FastICA processor for epileptic seizure detection.

    PubMed

    Yang, Chia-Hsiang; Shih, Yi-Hsin; Chiueh, Herming

    2015-02-01

    To improve the performance of epileptic seizure detection, independent component analysis (ICA) is applied to multi-channel signals to separate artifacts and signals of interest. FastICA is an efficient algorithm to compute ICA. To reduce the energy dissipation, eigenvalue decomposition (EVD) is utilized in the preprocessing stage to reduce the convergence time of the iterative calculation of ICA components. EVD is computed efficiently through an array structure of processing elements running in parallel. Area-efficient EVD architecture is realized by leveraging the approximate Jacobi algorithm, leading to a 77.2% area reduction. By choosing a proper memory element and reduced wordlength, the power and area of the storage memory are reduced by 95.6% and 51.7%, respectively. The chip area is minimized through fixed-point implementation and architectural transformations. Given a latency constraint of 0.1 s, an 86.5% area reduction is achieved compared to the direct-mapped architecture. Fabricated in 90 nm CMOS, the core area of the chip is 0.40 mm². The FastICA processor, part of an integrated epileptic control SoC, dissipates 81.6 μW at 0.32 V. The computation delay for a frame of 256 samples over 8 channels is 84.2 ms. Compared to prior work, 0.5% power dissipation, 26.7% silicon area, and a 3.4× computation speedup are achieved. The performance of the chip was verified on a human dataset.
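
    An illustrative NumPy sketch of the EVD-based whitening plus the fixed-point FastICA iteration that such a processor implements (floating point here, not the chip's fixed-point datapath):

        import numpy as np

        def fastica(x, n_iter=100, tol=1e-6):
            """Symmetric FastICA with EVD whitening; x is (channels, samples)."""
            x = x - x.mean(axis=1, keepdims=True)
            # Whitening via eigenvalue decomposition of the covariance matrix.
            d, E = np.linalg.eigh(np.cov(x))
            z = (E / np.sqrt(d)).T @ x                 # whitened signals, K = D^-1/2 E^T
            n = z.shape[0]
            W = np.linalg.qr(np.random.randn(n, n))[0] # random orthogonal start
            for _ in range(n_iter):
                g = np.tanh(W @ z)                     # contrast nonlinearity
                W_new = (g @ z.T) / z.shape[1] - np.diag((1 - g**2).mean(axis=1)) @ W
                # Symmetric decorrelation: W <- (W W^T)^(-1/2) W, via SVD.
                u, _, vt = np.linalg.svd(W_new)
                W_new = u @ vt
                converged = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1)) < tol
                W = W_new
                if converged:
                    break
            return W @ z                               # estimated source signals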

  17. Efficient algorithms and implementations of entropy-based moment closures for rarefied gases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schaerer, Roman Pascal, E-mail: schaerer@mathcces.rwth-aachen.de; Bansal, Pratyuksh; Torrilhon, Manuel

    We present efficient algorithms and implementations of the 35-moment system equipped with the maximum-entropy closure in the context of rarefied gases. While closures based on the principle of entropy maximization have been shown to yield very promising results for moderately rarefied gas flows, the computational cost of these closures is in general much higher than for closure theories with explicit closed-form expressions of the closing fluxes, such as Grad's classical closure. Following a similar approach as Garrett et al. (2015), we investigate efficient implementations of the computationally expensive numerical quadrature method used for the moment evaluations of the maximum-entropy distribution by exploiting its inherent fine-grained parallelism with the parallelism offered by multi-core processors and graphics cards. We show that using a single graphics card as an accelerator allows speed-ups of two orders of magnitude when compared to a serial CPU implementation. To accelerate the time-to-solution for steady-state problems, we propose a new semi-implicit time discretization scheme. The resulting nonlinear system of equations is solved with a Newton type method in the Lagrange multipliers of the dual optimization problem in order to reduce the computational cost. Additionally, fully explicit time-stepping schemes of first and second order accuracy are presented. We investigate the accuracy and efficiency of the numerical schemes for several numerical test cases, including a steady-state shock-structure problem.

  18. Evaluation of Emerging Energy-Efficient Heterogeneous Computing Platforms for Biomolecular and Cellular Simulation Workloads

    PubMed Central

    Stone, John E.; Hallock, Michael J.; Phillips, James C.; Peterson, Joseph R.; Luthey-Schulten, Zaida; Schulten, Klaus

    2016-01-01

    Many of the continuing scientific advances achieved through computational biology are predicated on the availability of ongoing increases in computational power required for detailed simulation and analysis of cellular processes on biologically-relevant timescales. A critical challenge facing the development of future exascale supercomputer systems is the development of new computing hardware and associated scientific applications that dramatically improve upon the energy efficiency of existing solutions, while providing increased simulation, analysis, and visualization performance. Mobile computing platforms have recently become powerful enough to support interactive molecular visualization tasks that were previously only possible on laptops and workstations, creating future opportunities for their convenient use for meetings, remote collaboration, and as head mounted displays for immersive stereoscopic viewing. We describe early experiences adapting several biomolecular simulation and analysis applications for emerging heterogeneous computing platforms that combine power-efficient system-on-chip multi-core CPUs with high-performance massively parallel GPUs. We present low-cost power monitoring instrumentation that provides sufficient temporal resolution to evaluate the power consumption of individual CPU algorithms and GPU kernels. We compare the performance and energy efficiency of scientific applications running on emerging platforms with results obtained on traditional platforms, identify hardware and algorithmic performance bottlenecks that affect the usability of these platforms, and describe avenues for improving both the hardware and applications in pursuit of the needs of molecular modeling tasks on mobile devices and future exascale computers. PMID:27516922

  19. Synthesis of the adaptive continuous system for the multi-axle wheeled vehicle body oscillation damping

    NASA Astrophysics Data System (ADS)

    Zhileykin, M. M.; Kotiev, G. O.; Nagatsev, M. V.

    2018-02-01

    In order to meet the growing mobility requirements for wheeled vehicles on all types of terrain, engineers have to develop a large number of specialized control algorithms for the multi-axle wheeled vehicle (MWV) suspension, improving such qualities as ride comfort, handling and stability. The authors have developed an adaptive algorithm for dynamic damping of MWV body oscillations that provides high ride comfort and high mobility. The article discloses a method for the synthesis of an adaptive continuous algorithm for MWV body oscillation damping and provides simulation results proving the high efficiency of the developed control algorithm.

  20. Multi-scale graph-cut algorithm for efficient water-fat separation.

    PubMed

    Berglund, Johan; Skorpil, Mikael

    2017-09-01

    To improve the accuracy and robustness to noise in water-fat separation by unifying the multi-scale and graph-cut based approaches to B0 correction. A previously proposed water-fat separation algorithm that corrects for B0 field inhomogeneity in 3D by a single quadratic pseudo-Boolean optimization (QPBO) graph cut was incorporated into a multi-scale framework, where field map solutions are propagated from coarse to fine scales for voxels that are not resolved by the graph cut. The accuracy of the single-scale and multi-scale QPBO algorithms was evaluated against benchmark reference datasets. The robustness to noise was evaluated by adding noise to the input data prior to water-fat separation. Both algorithms achieved the highest accuracy when compared with seven previously published methods, while computation times were acceptable for implementation in clinical routine. The multi-scale algorithm was more robust to noise than the single-scale algorithm, while causing only a small increase (+10%) in reconstruction time. The proposed 3D multi-scale QPBO algorithm offers accurate water-fat separation, robustness to noise, and fast reconstruction. The software implementation is freely available to the research community. Magn Reson Med 78:941-949, 2017. © 2016 International Society for Magnetic Resonance in Medicine.

  1. Multidisciplinary Multiobjective Optimal Design for Turbomachinery Using Evolutionary Algorithm

    NASA Technical Reports Server (NTRS)

    2005-01-01

    This report summarizes Dr. Lian's efforts toward developing a robust and efficient tool for multidisciplinary and multi-objective optimal design for turbomachinery using evolutionary algorithms. The work consisted of two stages. In the first stage (from July 2003 to June 2004), Dr. Lian focused on building the essential capabilities required for the project, working on two subjects: an enhanced genetic algorithm (GA) and an integrated optimization system combining a GA with a surrogate model. In the second stage (from July 2004 to February 2005), Dr. Lian formulated aerodynamic optimization and structural optimization as a multi-objective optimization problem and performed multidisciplinary and multi-objective optimizations of a transonic compressor blade based on the proposed model. The numerical results showed that the proposed approach can effectively reduce the blade weight and increase the stage pressure ratio in an efficient manner. In addition, the new design was structurally safer than the original design. Five conference papers and three journal papers were published on this topic.

  2. Adaptive Numerical Algorithms in Space Weather Modeling

    NASA Technical Reports Server (NTRS)

    Toth, Gabor; vanderHolst, Bart; Sokolov, Igor V.; DeZeeuw, Darren; Gombosi, Tamas I.; Fang, Fang; Manchester, Ward B.; Meng, Xing; Nakib, Dalal; Powell, Kenneth G.

    2010-01-01

    Space weather describes the various processes in the Sun-Earth system that present danger to human health and technology. The goal of space weather forecasting is to provide an opportunity to mitigate these negative effects. Physics-based space weather modeling is characterized by disparate temporal and spatial scales as well as by different physics in different domains. A multi-physics system can be modeled by a software framework comprising several components. Each component corresponds to a physics domain, and each component is represented by one or more numerical models. The publicly available Space Weather Modeling Framework (SWMF) can execute and couple together several components distributed over a parallel machine in a flexible and efficient manner. The framework also allows resolving disparate spatial and temporal scales with independent spatial and temporal discretizations in the various models. Several of the computationally most expensive domains of the framework are modeled by the Block-Adaptive Tree Solar wind Roe Upwind Scheme (BATS-R-US) code that can solve various forms of the magnetohydrodynamics (MHD) equations, including Hall, semi-relativistic, multi-species and multi-fluid MHD, anisotropic pressure, radiative transport and heat conduction. Modeling disparate scales within BATS-R-US is achieved by a block-adaptive mesh both in Cartesian and generalized coordinates. Most recently we have created a new core for BATS-R-US: the Block-Adaptive Tree Library (BATL) that provides a general toolkit for creating, load balancing and message passing in a 1, 2 or 3 dimensional block-adaptive grid. We describe the algorithms of BATL and demonstrate its efficiency and scaling properties for various problems. BATS-R-US uses several time-integration schemes to address multiple time-scales: explicit time stepping with fixed or local time steps, partially steady-state evolution, point-implicit, semi-implicit, explicit/implicit, and fully implicit numerical schemes. Depending on the application, we find that different time stepping methods are optimal. Several of the time integration schemes exploit the block-based granularity of the grid structure. The framework and the adaptive algorithms enable physics based space weather modeling and even forecasting.

  3. Acceleration of the Particle Swarm Optimization for Peierls-Nabarro modeling of dislocations in conventional and high-entropy alloys

    NASA Astrophysics Data System (ADS)

    Pei, Zongrui; Eisenbach, Markus

    2017-06-01

    Dislocations are among the most important defects in determining the mechanical properties of both conventional alloys and high-entropy alloys. The Peierls-Nabarro model supplies an efficient pathway to their geometries and mobility. The difficulty in solving the integro-differential Peierls-Nabarro equation is how to effectively avoid the local minima in the energy landscape of a dislocation core. Among the methods available for optimizing dislocation core structures, we choose Particle Swarm Optimization, an algorithm that simulates the social behavior of organisms. By employing more particles (a bigger swarm) and more iterative steps (allowing them to explore for a longer time), the local minima can be effectively avoided, at the price of more computational cost. The advantage of this algorithm is that it is readily parallelized on modern high-performance computing architectures. We demonstrate that the performance of our parallelized algorithm scales linearly with the number of employed cores.
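
    A minimal NumPy sketch of the PSO loop itself; the objective, bounds and coefficients below are illustrative, whereas the paper's actual objective is the Peierls-Nabarro core energy. Since the particle evaluations within each step are independent, they parallelize naturally across cores.

        import numpy as np

        def pso(objective, dim, n_particles=40, n_steps=200,
                w=0.72, c1=1.49, c2=1.49, lo=-1.0, hi=1.0, seed=0):
            """Plain global-best particle swarm optimization."""
            rng = np.random.default_rng(seed)
            x = rng.uniform(lo, hi, (n_particles, dim))       # positions
            v = np.zeros_like(x)                              # velocities
            pbest = x.copy()
            pbest_f = np.apply_along_axis(objective, 1, x)    # personal bests
            g = pbest[pbest_f.argmin()].copy()                # global best
            for _ in range(n_steps):
                r1, r2 = rng.random((2, n_particles, dim))
                v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
                x = np.clip(x + v, lo, hi)
                f = np.apply_along_axis(objective, 1, x)      # independent evaluations
                better = f < pbest_f
                pbest[better], pbest_f[better] = x[better], f[better]
                g = pbest[pbest_f.argmin()].copy()
            return g, objective(g)

        best, best_f = pso(lambda z: np.sum(z**2), dim=10)    # toy objective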

  4. Scheduling algorithm for data relay satellite optical communication based on artificial intelligent optimization

    NASA Astrophysics Data System (ADS)

    Zhao, Wei-hu; Zhao, Jing; Zhao, Shang-hong; Li, Yong-jun; Wang, Xiang; Dong, Yi; Dong, Chen

    2013-08-01

    Optical satellite communication, with the advantages of broad bandwidth, large capacity and low power consumption, breaks the bottleneck of traditional microwave satellite communication. Building a space-based information system on high-performance optical inter-satellite communication, with global seamless coverage and mobile terminal access, is a necessary trend in the development of optical satellite communication. Considering the resources, missions and constraints of a data relay satellite optical communication system, a model of optical communication resource scheduling is established and a scheduling algorithm based on artificial intelligence optimization is put forward. For multiple relay satellites, user satellites and optical antennas, and multiple missions with several priority weights, resources are scheduled reasonably by two operations: "ascertain current mission scheduling time" and "refresh latter mission time-window". The priority weight is used as a parameter of the fitness function, and the scheduling project is optimized by a genetic algorithm. In a simulation scenario including 3 relay satellites with 6 optical antennas, 12 user satellites and 30 missions, the results reveal that the algorithm obtains satisfactory results in both efficiency and performance, and that the resource scheduling model and the optimization algorithm are suitable for multi-relay-satellite, multi-user-satellite, multi-optical-antenna resource scheduling problems.

  5. Complete synthetic seismograms based on a spherical self-gravitating Earth model with an atmosphere-ocean-mantle-core structure

    NASA Astrophysics Data System (ADS)

    Wang, Rongjiang; Heimann, Sebastian; Zhang, Yong; Wang, Hansheng; Dahm, Torsten

    2017-04-01

    A hybrid method is proposed to calculate complete synthetic seismograms based on a spherically symmetric and self-gravitating Earth with a multi-layered structure of atmosphere, ocean, mantle, liquid core and solid core. For large wavelengths, a numerical scheme is used to solve the geodynamic boundary-value problem without any approximation on the deformation and gravity coupling. With the decreasing wavelength, the gravity effect on the deformation becomes negligible and the analytical propagator scheme can be used. Many useful approaches are used to overcome the numerical problems that may arise in both analytical and numerical schemes. Some of these approaches have been established in the seismological community and the others are developed for the first time. Based on the stable and efficient hybrid algorithm, an all-in-one code QSSP is implemented to cover the complete spectrum of seismological interests. The performance of the code is demonstrated by various tests including the curvature effect on teleseismic body and surface waves, the appearance of multiple reflected, teleseismic core phases, the gravity effect on long period surface waves and free oscillations, the simulation of near-field displacement seismograms with the static offset, the coupling of tsunami and infrasound waves, and free oscillations of the solid Earth, the atmosphere and the ocean. QSSP is open source software that can be used as a stand-alone FORTRAN code or may be applied in combination with a Python toolbox to calculate and handle Green's function databases for efficient coding of source inversion problems.

  6. Vectorized algorithms for spiking neural network simulation.

    PubMed

    Brette, Romain; Goodman, Dan F M

    2011-06-01

    High-level languages (Matlab, Python) are popular in neuroscience because they are flexible and accelerate development. However, for simulating spiking neural networks, the cost of interpretation is a bottleneck. We describe a set of algorithms to simulate large spiking neural networks efficiently with high-level languages using vector-based operations. These algorithms constitute the core of Brian, a spiking neural network simulator written in the Python language. Vectorized simulation makes it possible to combine the flexibility of high-level languages with the computational efficiency usually associated with compiled languages.
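
    The core trick is to update the state variables of all neurons with array operations instead of per-neuron loops. A minimal NumPy sketch for leaky integrate-and-fire neurons, with illustrative parameters:

        import numpy as np

        def simulate_lif(n=10000, steps=1000, dt=1e-4, tau=20e-3,
                         v_rest=-70e-3, v_th=-50e-3, v_reset=-70e-3, seed=0):
            """Vectorized leaky integrate-and-fire network; no per-neuron loop."""
            rng = np.random.default_rng(seed)
            v = np.full(n, v_rest)
            spike_times, spike_ids = [], []
            for step in range(steps):
                drive = rng.normal(25e-3, 5e-3, n)         # noisy input, in volts
                v += (dt / tau) * (v_rest - v + drive)     # Euler step, all neurons
                fired = v >= v_th                          # boolean spike mask
                v[fired] = v_reset                         # vectorized reset
                spike_ids.append(np.flatnonzero(fired))
                spike_times.append(np.full(spike_ids[-1].size, step * dt))
            return np.concatenate(spike_times), np.concatenate(spike_ids)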

  7. A fast and automatic fusion algorithm for unregistered multi-exposure image sequence

    NASA Astrophysics Data System (ADS)

    Liu, Yan; Yu, Feihong

    2014-09-01

    The human visual system (HVS) can visualize all the brightness levels of a scene through visual adaptation. However, the dynamic range of most commercial digital cameras and display devices is smaller than that of the human eye, which implies that low dynamic range (LDR) images captured by a normal digital camera may lose image details. We propose an efficient approach to high dynamic range (HDR) image fusion that copes with image displacement and image blur degradation in a computationally efficient manner, suitable for implementation on mobile devices. The image registration algorithms proposed in the previous literature are unable to meet the efficiency and performance requirements of mobile devices. In this paper, we select the Oriented FAST and Rotated BRIEF (ORB) detector to extract local image structures: the descriptor used in a multi-exposure image fusion algorithm has to be fast and robust to illumination variations and geometric deformations, and the ORB descriptor is the best candidate for our algorithm. Further, we perform an improved RANdom SAmple Consensus (RANSAC) algorithm to reject incorrect matches. For the fusion of images, a new approach based on the Stationary Wavelet Transform (SWT) is used. The experimental results demonstrate that the proposed algorithm generates high quality images at low computational cost. Comparisons with a number of other feature matching methods show that our method achieves better performance.
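
    A sketch of the registration stage using standard OpenCV ORB/BFMatcher/findHomography calls; the paper's improved RANSAC refinement and the SWT fusion stage are not reproduced here.

        import cv2
        import numpy as np

        def register_exposure(ref_gray, mov_gray, n_features=1000):
            """Align one exposure to the reference via ORB matches + RANSAC homography."""
            orb = cv2.ORB_create(nfeatures=n_features)
            kp1, des1 = orb.detectAndCompute(ref_gray, None)
            kp2, des2 = orb.detectAndCompute(mov_gray, None)
            matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
            matches = sorted(matcher.match(des2, des1), key=lambda m: m.distance)[:200]
            src = np.float32([kp2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
            dst = np.float32([kp1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
            # RANSAC rejects matches inconsistent with a single planar homography.
            H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
            h, w = ref_gray.shape
            return cv2.warpPerspective(mov_gray, H, (w, h))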

  8. Fine-grained parallel RNAalifold algorithm for RNA secondary structure prediction on FPGA

    PubMed Central

    Xia, Fei; Dou, Yong; Zhou, Xingming; Yang, Xuejun; Xu, Jiaqing; Zhang, Yang

    2009-01-01

    Background In the field of RNA secondary structure prediction, the RNAalifold algorithm is one of the most popular methods using free energy minimization. However, general-purpose computers, including parallel or multi-core computers, exhibit parallel efficiency of no more than 50%. Field Programmable Gate Array (FPGA) chips provide a new approach to accelerate RNAalifold by exploiting fine-grained custom design. Results RNAalifold shows complicated data dependences, in which the dependence distance is variable and the dependence direction spans two dimensions. We propose a systolic array structure including one master Processing Element (PE) and multiple slave PEs for fine-grained hardware implementation on FPGA. We exploit data reuse schemes to reduce the need to load energy matrices from external memory. We also propose several methods to reduce the energy table parameter size by 80%. Conclusion To our knowledge, our implementation with 16 PEs is the only FPGA accelerator implementing the complete RNAalifold algorithm. The experimental results show a factor of 12.2 speedup over the RNAalifold (Vienna Package 1.6.5) software for a group of aligned RNA sequences of 2981 residues running on a Personal Computer (PC) platform with a Pentium 4 2.6 GHz CPU. PMID:19208138

  9. Block-suffix shifting: fast, simultaneous medical concept set identification in large medical record corpora.

    PubMed

    Liu, Ying; Lita, Lucian Vlad; Niculescu, Radu Stefan; Mitra, Prasenjit; Giles, C Lee

    2008-11-06

    Owing to advances in computer hardware, large text databases have become more prevalent than ever. Automatically mining information from these databases proves to be a challenge due to slow pattern/string matching techniques. In this paper we present a new, fast multi-string pattern matching method based on the well-known Aho-Corasick algorithm. Advantages of our algorithm include: the ability to exploit the natural structure of text, the ability to perform significant character shifting, the avoidance of backtracking jumps that are not useful, efficiency in terms of matching time, and the avoidance of the typical "sub-string" false positive errors. Our algorithm is applicable to many fields with free text, such as the health care domain and the scientific document field. In this paper, we apply the block-suffix shifting (BSS) algorithm to health care data and mine hundreds of thousands of medical concepts from a large Electronic Medical Record (EMR) corpus simultaneously and efficiently. Experimental results show the superiority of our algorithm when compared with state-of-the-art multi-string matching algorithms.
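
    The classical Aho-Corasick machinery that BSS builds on can be sketched compactly, as below: a trie with failure links matches all patterns in one pass over the text. This is only the baseline automaton; the block-suffix shifting itself is not reproduced here.

        from collections import deque

        def build_automaton(patterns):
            goto, fail, out = [{}], [0], [set()]
            for pat in patterns:                        # trie construction
                state = 0
                for ch in pat:
                    if ch not in goto[state]:
                        goto.append({}); fail.append(0); out.append(set())
                        goto[state][ch] = len(goto) - 1
                    state = goto[state][ch]
                out[state].add(pat)
            queue = deque(goto[0].values())             # BFS sets failure links
            while queue:
                s = queue.popleft()
                for ch, t in goto[s].items():
                    queue.append(t)
                    f = fail[s]
                    while f and ch not in goto[f]:
                        f = fail[f]
                    cand = goto[f].get(ch, 0)
                    fail[t] = cand if cand != t else 0
                    out[t] |= out[fail[t]]              # inherit suffix matches
            return goto, fail, out

        def search(text, goto, fail, out):
            state, hits = 0, []
            for i, ch in enumerate(text):
                while state and ch not in goto[state]:
                    state = fail[state]                 # follow failure links
                state = goto[state].get(ch, 0)
                hits += [(i - len(p) + 1, p) for p in out[state]]
            return hits

        automaton = build_automaton(["hypertension", "hyperlipidemia", "patient"])
        print(search("the patient has hypertension and hyperlipidemia", *automaton))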

  10. Optimization of image processing algorithms on mobile platforms

    NASA Astrophysics Data System (ADS)

    Poudel, Pramod; Shirvaikar, Mukul

    2011-03-01

    This work presents a technique to optimize popular image processing algorithms on mobile platforms such as cell phones, netbooks and personal digital assistants (PDAs). The increasing demand for video applications like context-aware computing on mobile embedded systems requires the use of computationally intensive image processing algorithms. The system engineer has a mandate to optimize them so as to meet real-time deadlines. A methodology to take advantage of the asymmetric dual-core processor, which includes an ARM and a DSP core supported by shared memory, is presented with implementation details. The target platform chosen is the popular OMAP 3530 processor for embedded media systems. It has an asymmetric dual-core architecture with an ARM Cortex-A8 and a TMS320C64x Digital Signal Processor (DSP). The development platform was the BeagleBoard with 256 MB of NAND flash and 256 MB of SDRAM. The basic image correlation algorithm is chosen for benchmarking as it finds widespread application in template matching tasks such as face recognition. The basic algorithm prototypes conform to OpenCV, a popular computer vision library. OpenCV algorithms can be easily ported to the ARM core, which runs a popular operating system such as Linux or Windows CE. However, the DSP is architecturally more efficient at handling DFT algorithms. The algorithms are tested on a variety of images and performance results are presented measuring the speedup obtained due to the dual-core implementation. A major advantage of this approach is that it allows the ARM processor to perform important real-time tasks, while the DSP addresses performance-hungry algorithms.
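
    The DFT formulation that makes the DSP attractive can be written in a few lines of NumPy, as sketched below: cross-correlation computed as an inverse FFT of a spectral product. This mirrors the mathematics only, not the OMAP/DSP code paths.

        import numpy as np

        # Circular cross-correlation via the DFT:
        # corr = IFFT( FFT(image) * conj(FFT(zero-padded template)) )
        def fft_correlate(image, template):
            padded = np.zeros_like(image, dtype=float)
            padded[:template.shape[0], :template.shape[1]] = template
            spectrum = np.fft.fft2(image) * np.conj(np.fft.fft2(padded))
            return np.real(np.fft.ifft2(spectrum))

        image = np.random.rand(256, 256)
        template = image[100:132, 60:92]        # a patch cut from the image
        corr = fft_correlate(image, template)
        print(np.unravel_index(np.argmax(corr), corr.shape))  # -> (100, 60)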

  11. An Energy-Aware Runtime Management of Multi-Core Sensory Swarms.

    PubMed

    Kim, Sungchan; Yang, Hoeseok

    2017-08-24

    In sensory swarms, minimizing energy consumption under a performance constraint is one of the key objectives. One possible approach to this problem is to monitor the application workload, which is subject to change at runtime, and to adjust the system configuration adaptively to satisfy the performance goal. As today's sensory swarms are usually implemented using multi-core processors with adjustable clock frequency, we propose to monitor the CPU workload periodically and adjust the task-to-core allocation or clock frequency in an energy-efficient way in response to workload variations. In doing so, we present an online heuristic that determines the most energy-efficient adjustment that satisfies the performance requirement. The proposed method is based on a simple yet effective energy model built upon performance prediction using IPC (instructions per cycle) measured online and a power equation derived empirically. The use of IPC accounts for the memory intensity of a given workload, enabling accurate prediction of execution time. Hence, the model allows us to rapidly and accurately estimate the effect of the two control knobs, clock frequency adjustment and core allocation. The experiments show that the proposed technique delivers considerable energy savings of up to 45% compared to the state-of-the-art multi-core energy management technique.
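
    The decision step can be pictured with the toy control loop below: predict execution time from the measured IPC for every (frequency, cores) pair, keep the feasible ones, and pick the cheapest in energy. All constants are hypothetical stand-ins, not the paper's fitted model.

        FREQS = [0.6e9, 0.8e9, 1.0e9, 1.2e9]       # available clock rates (Hz)

        def exec_time(instructions, ipc, freq, cores):
            # IPC-based prediction; assumes near-linear scaling over cores.
            return instructions / (ipc * freq * cores)

        def power(freq, cores, p_static=0.2, k=1.1e-27):
            # Empirical-style power equation: static term plus a cubic
            # frequency term per active core (coefficients are assumed).
            return p_static + k * cores * freq ** 3

        def best_config(instructions, ipc, deadline, max_cores=4):
            feasible = [(f, c) for f in FREQS for c in range(1, max_cores + 1)
                        if exec_time(instructions, ipc, f, c) <= deadline]
            # Lowest predicted energy = power x predicted execution time.
            return min(feasible, key=lambda fc:
                       power(*fc) * exec_time(instructions, ipc, *fc))

        print(best_config(instructions=2e9, ipc=1.5, deadline=0.5))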

  12. An Energy-Aware Runtime Management of Multi-Core Sensory Swarms

    PubMed Central

    Kim, Sungchan

    2017-01-01

    In sensory swarms, minimizing energy consumption under a performance constraint is one of the key objectives. One possible approach to this problem is to monitor the application workload, which is subject to change at runtime, and to adjust the system configuration adaptively to satisfy the performance goal. As today's sensory swarms are usually implemented using multi-core processors with adjustable clock frequency, we propose to monitor the CPU workload periodically and adjust the task-to-core allocation or clock frequency in an energy-efficient way in response to workload variations. In doing so, we present an online heuristic that determines the most energy-efficient adjustment that satisfies the performance requirement. The proposed method is based on a simple yet effective energy model built upon performance prediction using IPC (instructions per cycle) measured online and a power equation derived empirically. The use of IPC accounts for the memory intensity of a given workload, enabling accurate prediction of execution time. Hence, the model allows us to rapidly and accurately estimate the effect of the two control knobs, clock frequency adjustment and core allocation. The experiments show that the proposed technique delivers considerable energy savings of up to 45% compared to the state-of-the-art multi-core energy management technique. PMID:28837094

  13. Optimization of Land Use Suitability for Agriculture Using Integrated Geospatial Model and Genetic Algorithms

    NASA Astrophysics Data System (ADS)

    Mansor, S. B.; Pormanafi, S.; Mahmud, A. R. B.; Pirasteh, S.

    2012-08-01

    In this study, a geospatial model for land use allocation was developed from the view of simulating the biological autonomous adaptability to the environment and infrastructural preference. The model was developed based on a multi-agent genetic algorithm and was customized to accommodate the constraints set for the study area, namely resource saving and environmental friendliness. The model was then applied to solve practical multi-objective spatial optimization allocation problems of land use in the core region of the Menderjan Basin in Iran. The first task was to study the dominant crops and the economic suitability evaluation of land. The second task was to determine the fitness function for the genetic algorithm. The third objective was to optimize the land use map using economic benefits. The results indicate that the proposed model has much better performance for solving complex multi-objective spatial optimization allocation problems and is a promising method for generating land use alternatives for further consideration in spatial decision-making.

  14. Efficient parallel linear scaling construction of the density matrix for Born-Oppenheimer molecular dynamics.

    PubMed

    Mniszewski, S M; Cawkwell, M J; Wall, M E; Mohd-Yusof, J; Bock, N; Germann, T C; Niklasson, A M N

    2015-10-13

    We present an algorithm for the calculation of the density matrix that for insulators scales linearly with system size and parallelizes efficiently on multicore, shared memory platforms with small and controllable numerical errors. The algorithm is based on an implementation of the second-order spectral projection (SP2) algorithm [Niklasson, A. M. N., Phys. Rev. B 2002, 66, 155115] in sparse matrix algebra with the ELLPACK-R data format. We illustrate the performance of the algorithm within self-consistent tight binding theory by total energy calculations of gas phase poly(ethylene) molecules and periodic liquid water systems containing up to 15,000 atoms on up to 16 CPU cores. We consider algorithm-specific performance aspects, such as local vs nonlocal memory access and the degree of matrix sparsity. Comparisons to sparse matrix algebra implementations using off-the-shelf libraries on multicore CPUs, graphics processing units (GPUs), and the Intel many integrated core (MIC) architecture are also presented. The accuracy and stability of the algorithm are illustrated with long duration Born-Oppenheimer molecular dynamics simulations of 1000 water molecules and a 303 atom Trp cage protein solvated by 2682 water molecules.
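
    The SP2 recursion itself is short enough to state directly; the dense NumPy sketch below shows the branch selection on the trace, while the paper's linear scaling comes from running the same recursion in thresholded sparse (ELLPACK-R) algebra, which is omitted here.

        import numpy as np

        def sp2_density_matrix(H, n_occ, tol=1e-10, max_iter=100):
            eigs = np.linalg.eigvalsh(H)            # spectral bounds; cheaper
            e_min, e_max = eigs[0], eigs[-1]        # estimates also suffice
            X = (e_max * np.eye(len(H)) - H) / (e_max - e_min)
            for _ in range(max_iter):
                X2 = X @ X
                # X^2 lowers the trace, 2X - X^2 raises it; take the branch
                # that steers trace(X) toward the occupation number n_occ.
                if abs(np.trace(X2) - n_occ) < abs(np.trace(2 * X - X2) - n_occ):
                    X = X2
                else:
                    X = 2 * X - X2
                if abs(np.trace(X) - n_occ) < tol:
                    break
            return X                                # idempotent at convergence

        H = np.random.rand(6, 6); H = (H + H.T) / 2    # toy symmetric Hamiltonian
        D = sp2_density_matrix(H, n_occ=3)
        print(np.trace(D))                             # -> approximately 3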

  15. Hierarchical layered and semantic-based image segmentation using ergodicity map

    NASA Astrophysics Data System (ADS)

    Yadegar, Jacob; Liu, Xiaoqing

    2010-04-01

    Image segmentation plays a foundational role in image understanding and computer vision. Although great strides have been made and progress achieved on automatic/semi-automatic image segmentation algorithms, designing a generic, robust, and efficient image segmentation algorithm is still challenging. Human vision is still far superior compared to computer vision, especially in interpreting semantic meanings/objects in images. We present a hierarchical/layered semantic image segmentation algorithm that can automatically and efficiently segment images into hierarchical layered/multi-scaled semantic regions/objects with contextual topological relationships. The proposed algorithm bridges the gap between high-level semantics and low-level visual features/cues (such as color, intensity, edge, etc.) through utilizing a layered/hierarchical ergodicity map, where ergodicity is computed based on a space filling fractal concept and used as a region dissimilarity measurement. The algorithm applies a highly scalable, efficient, and adaptive Peano-Cesaro triangulation/tiling technique to decompose the given image into a set of similar/homogenous regions based on low-level visual cues in a top-down manner. The layered/hierarchical ergodicity map is built through a bottom-up region dissimilarity analysis. The recursive fractal sweep associated with the Peano-Cesaro triangulation provides efficient local multi-resolution refinement to any level of detail. The generated binary decomposition tree also provides efficient neighbor retrieval mechanisms for contextual topological object/region relationship generation. Experiments have been conducted within the maritime image environment where the segmented layered semantic objects include the basic level objects (i.e. sky/land/water) and deeper level objects in the sky/land/water surfaces. Experimental results demonstrate the proposed algorithm has the capability to robustly and efficiently segment images into layered semantic objects/regions with contextual topological relationships.

  16. Adapting Wave-front Algorithms to Efficiently Utilize Systems with Deep Communication Hierarchies

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kerbyson, Darren J.; Lang, Michael; Pakin, Scott

    2011-09-30

    Large-scale systems increasingly exhibit a differential between intra-chip and inter-chip communication performance, especially in hybrid systems using accelerators. Processor cores on the same socket are able to communicate at lower latencies, and with higher bandwidths, than cores on different sockets either within the same node or between nodes. A key challenge is to efficiently use this communication hierarchy and hence optimize performance. We consider here the class of applications that contains wavefront processing. In these applications data can only be processed after their upstream neighbors have been processed. Similar dependencies result between processors in which communication is required to pass boundary data downstream and whose cost is typically impacted by the slowest communication channel in use. In this work we develop a novel hierarchical wave-front approach that reduces the use of slower communications in the hierarchy but at the cost of additional steps in the parallel computation and higher use of on-chip communications. This tradeoff is explored using a performance model. An implementation using the Reverse-acceleration programming model on the petascale Roadrunner system demonstrates a 27% performance improvement at full system-scale on a kernel application. The approach is generally applicable to large-scale multi-core and accelerated systems where a differential in system communication performance exists.
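
    The dependency structure that defines this class of applications is easy to see in a serial sketch: cell (i, j) needs its upstream neighbors (i-1, j) and (i, j-1), so cells on the same anti-diagonal are mutually independent and each diagonal may be processed in parallel once the previous one is complete. The code below is an illustrative toy, not the paper's hierarchical scheme.

        import numpy as np

        def wavefront_sweep(n, m):
            grid = np.zeros((n, m))
            grid[0, :] = 1.0                    # boundary data entering the sweep
            grid[:, 0] = 1.0
            for d in range(2, n + m - 1):       # anti-diagonals in dependency order
                for i in range(max(1, d - m + 1), min(n, d)):
                    j = d - i
                    # (i, j) depends only on already-finished diagonals
                    grid[i, j] = 0.5 * (grid[i - 1, j] + grid[i, j - 1])
            return grid

        print(wavefront_sweep(5, 5))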

  17. Feature selection method based on multi-fractal dimension and harmony search algorithm and its application

    NASA Astrophysics Data System (ADS)

    Zhang, Chen; Ni, Zhiwei; Ni, Liping; Tang, Na

    2016-10-01

    Feature selection is an important method of data preprocessing in data mining. In this paper, a novel feature selection method based on multi-fractal dimension and harmony search algorithm is proposed. Multi-fractal dimension is adopted as the evaluation criterion of feature subset, which can determine the number of selected features. An improved harmony search algorithm is used as the search strategy to improve the efficiency of feature selection. The performance of the proposed method is compared with that of other feature selection algorithms on UCI data-sets. Besides, the proposed method is also used to predict the daily average concentration of PM2.5 in China. Experimental results show that the proposed method can obtain competitive results in terms of both prediction accuracy and the number of selected features.

  18. Newmark local time stepping on high-performance computing architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rietmann, Max, E-mail: max.rietmann@erdw.ethz.ch; Institute of Geophysics, ETH Zurich; Grote, Marcus, E-mail: marcus.grote@unibas.ch

    In multi-scale complex media, finite element meshes often require areas of local refinement, creating small elements that can dramatically reduce the global time-step for wave-propagation problems due to the CFL condition. Local time stepping (LTS) algorithms allow an explicit time-stepping scheme to adapt the time-step to the element size, allowing near-optimal time-steps everywhere in the mesh. We develop an efficient multilevel LTS-Newmark scheme and implement it in a widely used continuous finite element seismic wave-propagation package. In particular, we extend the standard LTS formulation with adaptations to continuous finite element methods that can be implemented very efficiently with very strong element-size contrasts (more than 100x). Capable of running on large CPU and GPU clusters, we present both synthetic validation examples and large scale, realistic application examples to demonstrate the performance and applicability of the method and implementation on thousands of CPU cores and hundreds of GPUs.

  19. An auxiliary graph based dynamic traffic grooming algorithm in spatial division multiplexing enabled elastic optical networks with multi-core fibers

    NASA Astrophysics Data System (ADS)

    Zhao, Yongli; Tian, Rui; Yu, Xiaosong; Zhang, Jiawei; Zhang, Jie

    2017-03-01

    A proper traffic grooming strategy in dynamic optical networks can improve the utilization of bandwidth resources. An auxiliary graph (AG) is designed to solve the traffic grooming problem under a dynamic traffic scenario in spatial division multiplexing enabled elastic optical networks (SDM-EON) with multi-core fibers. Five traffic grooming policies, achieved by adjusting the edge weights of the AG, are proposed and evaluated through simulation: maximal electrical grooming (MEG), maximal optical grooming (MOG), maximal SDM grooming (MSG), minimize virtual hops (MVH), and minimize physical hops (MPH). Numerical results show that each traffic grooming policy has its own features: among them, the MPH policy achieves the lowest bandwidth blocking ratio, MEG saves the most transponders, and MSG uses the fewest cores per request.

  20. New Scheduling Algorithms for Agile All-Photonic Networks

    NASA Astrophysics Data System (ADS)

    Mehri, Mohammad Saleh; Ghaffarpour Rahbar, Akbar

    2017-12-01

    An optical overlaid star network is a class of agile all-photonic network that consists of one or more core node(s) at the center of the star network and a number of edge nodes around the core node. In this architecture, a core node may use a scheduling algorithm for the transmission of traffic through the network. A core node is responsible for scheduling optical packets that arrive from edge nodes and switching them toward their destinations. Nowadays, most edge nodes use the virtual output queue (VOQ) architecture for buffering client packets to achieve high throughput. This paper presents two efficient scheduling algorithms called discretionary iterative matching (DIM) and adaptive DIM. These schedulers find maximum matchings in a small number of iterations, provide high throughput and incur low delay. The number of arbiters in these schedulers and the number of messages exchanged between inputs and outputs of a core node are reduced. We show that DIM and adaptive DIM can provide better performance than iterative round-robin matching with SLIP (iSLIP), where "SLIP" refers to sliding a short distance to select one of the requested connections according to the scheduling algorithm.

  1. Communication: An efficient approach to compute state-specific nuclear gradients for a generic state-averaged multi-configuration self consistent field wavefunction.

    PubMed

    Granovsky, Alexander A

    2015-12-21

    We present a new, very efficient semi-numerical approach for the computation of state-specific nuclear gradients of a generic state-averaged multi-configuration self consistent field wavefunction. Our approach eliminates the costly coupled-perturbed multi-configuration Hartree-Fock step as well as the associated integral transformation stage. The details of the implementation within the Firefly quantum chemistry package are discussed and several sample applications are given. The new approach is routinely applicable to geometry optimization of molecular systems with 1000+ basis functions using a standalone multi-core workstation.

  2. Communication: An efficient approach to compute state-specific nuclear gradients for a generic state-averaged multi-configuration self consistent field wavefunction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Granovsky, Alexander A., E-mail: alex.granovsky@gmail.com

    We present a new, very efficient semi-numerical approach for the computation of state-specific nuclear gradients of a generic state-averaged multi-configuration self consistent field wavefunction. Our approach eliminates the costly coupled-perturbed multi-configuration Hartree-Fock step as well as the associated integral transformation stage. The details of the implementation within the Firefly quantum chemistry package are discussed and several sample applications are given. The new approach is routinely applicable to geometry optimization of molecular systems with 1000+ basis functions using a standalone multi-core workstation.

  3. MetAlign 3.0: performance enhancement by efficient use of advances in computer hardware.

    PubMed

    Lommen, Arjen; Kools, Harrie J

    2012-08-01

    A new, multi-threaded version of the GC-MS and LC-MS data processing software metAlign has been developed which is able to utilize multiple cores on one PC. This new version was tested on three different multi-core PCs with different operating systems. The performance of noise reduction, baseline correction and peak-picking was 8- to 19-fold faster than the previous version running on a single-core machine from 2008. The alignment was 5- to 10-fold faster. Factors influencing the performance enhancement are discussed. Our observations show that performance scales with the increase in processor core counts we currently see in consumer PC hardware development.

  4. Multi-Source Multi-Target Dictionary Learning for Prediction of Cognitive Decline

    PubMed Central

    Zhang, Jie; Li, Qingyang; Caselli, Richard J.; Thompson, Paul M.; Ye, Jieping; Wang, Yalin

    2017-01-01

    Alzheimer's Disease (AD) is the most common type of dementia. Identifying correct biomarkers may determine pre-symptomatic AD subjects and enable early intervention. Recently, multi-task sparse feature learning has been successfully applied to many computer vision and biomedical informatics problems. It aims to improve the generalization performance by exploiting the shared features among different tasks. However, most of the existing algorithms are formulated as a supervised learning scheme, whose drawback is either an insufficient number of features or missing label information. To address these challenges, we formulate an unsupervised framework for multi-task sparse feature learning based on a novel dictionary learning algorithm. To solve the unsupervised learning problem, we propose a two-stage Multi-Source Multi-Target Dictionary Learning (MMDL) algorithm. In stage 1, we propose a multi-source dictionary learning method to utilize the common and individual sparse features in different time slots. In stage 2, supported by a rigorous theoretical analysis, we develop a multi-task learning method to solve the missing label problem. Empirical studies on an N = 3970 longitudinal brain image data set, which involves 2 sources and 5 targets, demonstrate the improved prediction accuracy and speed of MMDL in comparison with other state-of-the-art algorithms. PMID:28943731

  5. Techniques Analysis of the Interference Suppression Algorithm in Broadband Aeronautical Multi-carrier Communication System

    NASA Astrophysics Data System (ADS)

    Li, Dong-xia; Ye, Qian-wen

    An out-of-band radiation suppression algorithm must be employed in a broadband aeronautical communication system so as not to interfere with the operation of existing systems in the aviation L-band. Following a brief introduction of the broadband aeronautical multi-carrier communication (B-AMC) system model, several sidelobe suppression techniques for orthogonal frequency division multiplexing (OFDM) systems are presented and analyzed in order to find a suitable algorithm for the B-AMC system. Simulation results show that raised-cosine windowing can effectively suppress the out-of-band radiation of the B-AMC system.
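
    The raised-cosine windowing referred to above amounts to tapering the edges of each OFDM symbol in the time domain so that successive symbols no longer abut with rectangular transitions. A minimal NumPy sketch, with illustrative parameters rather than B-AMC's actual numerology, is given below.

        import numpy as np

        N, cp, roll = 64, 16, 8                 # FFT size, cyclic prefix, taper length
        symbols = np.exp(2j * np.pi * np.random.randint(4, size=N) / 4)  # QPSK
        time_sig = np.fft.ifft(symbols)         # OFDM modulation
        tx = np.concatenate([time_sig[-cp:], time_sig, time_sig[:roll]])

        # Raised-cosine ramps on the first and last `roll` samples; in a
        # transmitted stream, consecutive symbols overlap over these ramps,
        # which smooths the edges and suppresses spectral sidelobes.
        ramp = 0.5 * (1 - np.cos(np.pi * np.arange(roll) / roll))
        window = np.ones(len(tx))
        window[:roll], window[-roll:] = ramp, ramp[::-1]
        tx_windowed = tx * window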

  6. Superiorization-based multi-energy CT image reconstruction

    PubMed Central

    Yang, Q; Cong, W; Wang, G

    2017-01-01

    The recently-developed superiorization approach is efficient and robust for solving various constrained optimization problems. This methodology can be applied to multi-energy CT image reconstruction with the regularization in terms of the prior rank, intensity and sparsity model (PRISM). In this paper, we propose a superiorized version of the simultaneous algebraic reconstruction technique (SART) based on the PRISM model. Then, we compare the proposed superiorized algorithm with the Split-Bregman algorithm in numerical experiments. The results show that both the Superiorized-SART and the Split-Bregman algorithms generate good results with weak noise and reduced artefacts. PMID:28983142

  7. Application Performance Analysis and Efficient Execution on Systems with multi-core CPUs, GPUs and MICs: A Case Study with Microscopy Image Analysis

    PubMed Central

    Teodoro, George; Kurc, Tahsin; Andrade, Guilherme; Kong, Jun; Ferreira, Renato; Saltz, Joel

    2015-01-01

    We carry out a comparative performance study of multi-core CPUs, GPUs and Intel Xeon Phi (Many Integrated Core-MIC) with a microscopy image analysis application. We experimentally evaluate the performance of computing devices on core operations of the application. We correlate the observed performance with the characteristics of computing devices and data access patterns, computation complexities, and parallelization forms of the operations. The results show a significant variability in the performance of operations with respect to the device used. The performances of operations with regular data access are comparable or sometimes better on a MIC than that on a GPU. GPUs are more efficient than MICs for operations that access data irregularly, because of the lower bandwidth of the MIC for random data accesses. We propose new performance-aware scheduling strategies that consider variabilities in operation speedups. Our scheduling strategies significantly improve application performance compared to classic strategies in hybrid configurations. PMID:28239253

  8. Multi-Agent Patrolling under Uncertainty and Threats.

    PubMed

    Chen, Shaofei; Wu, Feng; Shen, Lincheng; Chen, Jing; Ramchurn, Sarvapali D

    2015-01-01

    We investigate a multi-agent patrolling problem where information is distributed alongside threats in environments with uncertainties. Specifically, the information and threat at each location are independently modelled as multi-state Markov chains, whose states are not observed until the location is visited by an agent. While agents will obtain information at a location, they may also suffer damage from the threat at that location. Therefore, the goal of the agents is to gather as much information as possible while mitigating the damage incurred. To address this challenge, we formulate the single-agent patrolling problem as a Partially Observable Markov Decision Process (POMDP) and propose a computationally efficient algorithm to solve this model. Building upon this, to compute patrols for multiple agents, the single-agent algorithm is extended for each agent with the aim of maximising its marginal contribution to the team. We empirically evaluate our algorithm on multi-agent patrolling problems and show that it outperforms a baseline algorithm by up to 44% for 10 agents and by 21% for 15 agents in large domains.

  9. Automated and Assistive Tools for Accelerated Code migration of Scientific Computing on to Heterogeneous MultiCore Systems

    DTIC Science & Technology

    2017-04-13

    ... ported to OmpSs: a basic algorithm on image processing applications, a mini application representative of an ocean modelling code, a parallel benchmark, and a communication-avoiding version of the QR algorithm. Further, several improvements to the OmpSs model were ... movement; and a port of the dynamic load balancing library to OmpSs. Finally, several updates to the tools infrastructure were accomplished.

  10. Fuzzy-Logic Based Distributed Energy-Efficient Clustering Algorithm for Wireless Sensor Networks.

    PubMed

    Zhang, Ying; Wang, Jun; Han, Dezhi; Wu, Huafeng; Zhou, Rundong

    2017-07-03

    Due to its high energy efficiency and scalability, the clustering routing algorithm has been widely used in wireless sensor networks (WSNs). In order to gather information more efficiently, each sensor node transmits data to the Cluster Head (CH) to which it belongs, by multi-hop communication. However, multi-hop communication within a cluster brings the problem of excessive energy consumption at the relay nodes closer to the CH: their energy is consumed more quickly than that of farther nodes, which negatively influences load balance across the whole network. Therefore, we propose an energy-efficient distributed clustering algorithm based on a fuzzy approach with non-uniform distribution (EEDCF). During CH election, we take nodes' energies, nodes' degrees and neighbor nodes' residual energies into consideration as the input parameters. In addition, we adopt the Takagi-Sugeno-Kang (TSK) fuzzy model instead of the traditional method as our inference system to make the quantitative analysis more reasonable. In our scheme, each sensor node calculates its probability of being selected as CH with the help of the fuzzy inference system in a distributed way. The experimental results indicate that the EEDCF algorithm is better than some current representative methods in terms of data transmission, energy consumption and network lifetime.

  11. MultiMiTar: a novel multi objective optimization based miRNA-target prediction method.

    PubMed

    Mitra, Ramkrishna; Bandyopadhyay, Sanghamitra

    2011-01-01

    Machine learning based miRNA-target prediction algorithms often fail to obtain a balanced prediction accuracy in terms of both sensitivity and specificity due to the lack of a gold standard of negative examples, of miRNA-targeting-site context-specific relevant features, and of an efficient feature selection process. Moreover, all the sequence, structure and machine learning based algorithms are unable to distribute the true positive predictions preferentially at the top of the ranked list; hence the algorithms become unreliable to biologists. In addition, these algorithms fail to obtain a considerable combination of precision and recall for the target transcripts that are translationally repressed at the protein level. In this article, we introduce an efficient miRNA-target prediction system, MultiMiTar, a Support Vector Machine (SVM) based classifier integrated with a multi-objective metaheuristic based feature selection technique. The robust performance of the proposed method is mainly the result of using high quality negative examples and the selection of biologically relevant miRNA-targeting-site context-specific features. The features are selected by using a novel feature selection technique, AMOSA-SVM, that integrates the multi-objective optimization technique Archived Multi-Objective Simulated Annealing (AMOSA) and SVM. MultiMiTar is found to achieve a much higher Matthews correlation coefficient (MCC) of 0.583 and average class-wise accuracy (ACA) of 0.8 compared to the other target prediction methods on a completely independent test data set. The MCC and ACA values of these algorithms range from -0.269 to 0.155 and from 0.321 to 0.582, respectively. Moreover, MultiMiTar shows a more balanced result in terms of precision and sensitivity (recall) for the translationally repressed data set compared to all the other existing methods. An important aspect is that the true positive predictions are distributed preferentially at the top of the ranked list, which makes MultiMiTar reliable for biologists. MultiMiTar is available as an online tool at www.isical.ac.in/~bioinfo_miu/multimitar.htm, and the software can be downloaded from www.isical.ac.in/~bioinfo_miu/multimitar-download.htm.

  12. GPU-completeness: theory and implications

    NASA Astrophysics Data System (ADS)

    Lin, I.-Jong

    2011-01-01

    This paper formalizes a major insight into a class of algorithms that relates parallelism and performance. The purpose of this paper is to define a class of algorithms that trades off parallelism for quality of result (e.g. visual quality, compression rate), and we propose a method for algorithmic classification based on NP-Completeness techniques, applied toward parallel acceleration. We define this class of algorithms as "GPU-Complete" and postulate the properties necessary for admission into the class. We also formally relate this algorithmic space to the space of imaging algorithms. The concept is based upon our experience in the print production area, where GPUs (Graphics Processing Units) have shown a substantial cost/performance advantage within the context of HP-delivered enterprise services and commercial printing infrastructure. While CPUs and GPUs are converging in their underlying hardware and functional blocks, their system behaviors are clearly distinct in many ways: memory system design, programming paradigms, and massively parallel SIMD architecture. There are applications that are clearly suited to each architecture: for the CPU, language compilation, word processing, operating systems, and other applications that are highly sequential in nature; for the GPU, video rendering, particle simulation, pixel color conversion, and other problems clearly amenable to massive parallelization. While GPUs are establishing themselves as a second computing architecture distinct from CPUs, their end-to-end system cost/performance advantage in certain parts of computation informs the structure of algorithms and their efficient parallel implementations. While GPUs are merely one type of architecture for parallelization, we show that their introduction into the design space of printing systems demonstrates the trade-offs against competing multi-core, FPGA, and ASIC architectures. While each architecture has its own optimal application, we believe that the selection of architecture can be defined in terms of properties of GPU-Completeness. For a well-defined subset of algorithms, GPU-Completeness is intended to connect parallelism, algorithms and efficient architectures in a unified framework, showing that multiple layers of parallel implementation are guided by the same underlying trade-off.

  13. Multiple feature fusion via covariance matrix for visual tracking

    NASA Astrophysics Data System (ADS)

    Jin, Zefenfen; Hou, Zhiqiang; Yu, Wangsheng; Wang, Xin; Sun, Hui

    2018-04-01

    Aiming at the problem of complicated dynamic scenes in visual target tracking, a multi-feature fusion tracking algorithm based on the covariance matrix is proposed to improve the robustness of tracking. In the framework of a quantum genetic algorithm, this paper uses the region covariance descriptor to fuse color, edge and texture features, and uses a fast covariance intersection algorithm to update the model. The low dimension of the region covariance descriptor, the fast convergence and strong global optimization ability of the quantum genetic algorithm, and the fast computation of the covariance intersection algorithm together improve the computational efficiency of the fusion, matching and updating processes, so that the algorithm achieves fast and effective multi-feature fusion tracking. Experiments show that the proposed algorithm not only achieves fast and robust tracking but also effectively handles interference from occlusion, rotation, deformation, motion blur and so on.
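
    The region covariance descriptor at the heart of this fusion is simple to compute: stack a per-pixel feature vector for the region and take its covariance, giving a compact d x d model regardless of region size. The sketch below uses position, intensity and gradients as stand-in features; the paper fuses color, edge and texture channels.

        import numpy as np

        def region_covariance(patch):
            h, w = patch.shape
            ys, xs = np.mgrid[0:h, 0:w]                 # pixel coordinates
            gy, gx = np.gradient(patch.astype(float))   # edge-like responses
            feats = np.stack([xs.ravel(), ys.ravel(), patch.ravel(),
                              gx.ravel(), gy.ravel()])  # d x (h*w) features
            return np.cov(feats)                        # d x d descriptor

        patch = np.random.rand(32, 32)
        print(region_covariance(patch).shape)           # -> (5, 5)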

  14. Sampling Approaches for Multi-Domain Internet Performance Measurement Infrastructures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Calyam, Prasad

    2014-09-15

    The next-generation of high-performance networks being developed in DOE communities are critical for supporting current and emerging data-intensive science applications. The goal of this project is to investigate multi-domain network status sampling techniques and tools to measure/analyze performance, and thereby provide "network awareness" to end-users and network operators in DOE communities. We leverage the infrastructure and datasets available through perfSONAR, which is a multi-domain measurement framework that has been widely deployed in high-performance computing and networking communities; the DOE community is a core developer and the largest adopter of perfSONAR. Our investigations include development of semantic scheduling algorithms, measurement federation policies, and tools to sample multi-domain and multi-layer network status within perfSONAR deployments. We validate our algorithms and policies with end-to-end measurement analysis tools for various monitoring objectives such as network weather forecasting, anomaly detection, and fault-diagnosis. In addition, we develop a multi-domain architecture for an enterprise-specific perfSONAR deployment that can implement monitoring-objective based sampling and that adheres to any domain-specific measurement policies.

  15. Efficient Boundary Extraction of BSP Solids Based on Clipping Operations.

    PubMed

    Wang, Charlie C L; Manocha, Dinesh

    2013-01-01

    We present an efficient algorithm to extract the manifold surface that approximates the boundary of a solid represented by a Binary Space Partition (BSP) tree. Our polygonization algorithm repeatedly performs clipping operations on volumetric cells that correspond to a spatial convex partition and computes the boundary by traversing the connected cells. We use point-based representations along with finite-precision arithmetic to improve the efficiency and generate the B-rep approximation of a BSP solid. The core of our polygonization method is a novel clipping algorithm that uses a set of logical operations to make it resistant to degeneracies resulting from limited precision of floating-point arithmetic. The overall BSP to B-rep conversion algorithm can accurately generate boundaries with sharp and small features, and is faster than prior methods. At the end of this paper, we use this algorithm for a few geometric processing applications including Boolean operations, model repair, and mesh reconstruction.
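
    The flavor of the clipping core can be conveyed by its 2D analogue: Sutherland-Hodgman clipping of a polygon against one half-plane, shown below. The paper's actual algorithm operates on polyhedra and replaces fragile floating-point predicates with logical operations; this sketch shows only the basic geometric step.

        def clip_halfplane(poly, n, d, eps=1e-12):
            # Keep the part of the polygon with n . p <= d.
            out = []
            for i, p in enumerate(poly):
                q = poly[(i + 1) % len(poly)]
                sp = n[0] * p[0] + n[1] * p[1] - d   # signed "outside" values
                sq = n[0] * q[0] + n[1] * q[1] - d
                if sp <= eps:
                    out.append(p)                    # p lies inside: keep it
                if (sp > eps) != (sq > eps):         # the edge crosses the plane
                    t = sp / (sp - sq)               # intersection parameter
                    out.append((p[0] + t * (q[0] - p[0]),
                                p[1] + t * (q[1] - p[1])))
            return out

        square = [(0, 0), (2, 0), (2, 2), (0, 2)]
        print(clip_halfplane(square, n=(1, 0), d=1))  # keeps the x <= 1 half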

  16. Multi-Objective Reinforcement Learning-based Deep Neural Networks for Cognitive Space Communications

    NASA Technical Reports Server (NTRS)

    Ferreria, Paulo; Paffenroth, Randy; Wyglinski, Alexander M.; Hackett, Timothy; Bilen, Sven; Reinhart, Richard; Mortensen, Dale

    2017-01-01

    Future communication subsystems of space exploration missions can potentially benefit from software-defined radios (SDRs) controlled by machine learning algorithms. In this paper, we propose a novel hybrid radio resource allocation management control algorithm that integrates multi-objective reinforcement learning and deep artificial neural networks. The objective is to efficiently manage communications system resources by monitoring performance functions with common dependent variables that result in conflicting goals. The uncertainty in the performance of thousands of different possible combinations of radio parameters makes the trade-off between exploration and exploitation in reinforcement learning (RL) much more challenging for future critical space-based missions. Thus, the system should spend as little time as possible on exploring actions, and whenever it explores an action, it should perform at acceptable levels most of the time. The proposed approach enables on-line learning by interactions with the environment and restricts poor resource allocation performance through virtual environment exploration. Improvements in the multiobjective performance can be achieved via transmitter parameter adaptation on a packet-basis, with poorly predicted performance promptly resulting in rejected decisions. Simulations presented in this work considered the DVB-S2 standard adaptive transmitter parameters and additional ones expected to be present in future adaptive radio systems. Performance results are provided by analysis of the proposed hybrid algorithm when operating across a satellite communication channel from Earth to GEO orbit during clear sky conditions. The proposed approach constitutes part of the core cognitive engine proof-of-concept to be delivered to the NASA Glenn Research Center SCaN Testbed located onboard the International Space Station.

  17. Multi-Objective Reinforcement Learning-Based Deep Neural Networks for Cognitive Space Communications

    NASA Technical Reports Server (NTRS)

    Ferreria, Paulo Victor R.; Paffenroth, Randy; Wyglinski, Alexander M.; Hackett, Timothy M.; Bilen, Sven G.; Reinhart, Richard C.; Mortensen, Dale J.

    2017-01-01

    Future communication subsystems of space exploration missions can potentially benefit from software-defined radios (SDRs) controlled by machine learning algorithms. In this paper, we propose a novel hybrid radio resource allocation management control algorithm that integrates multi-objective reinforcement learning and deep artificial neural networks. The objective is to efficiently manage communications system resources by monitoring performance functions with common dependent variables that result in conflicting goals. The uncertainty in the performance of thousands of different possible combinations of radio parameters makes the trade-off between exploration and exploitation in reinforcement learning (RL) much more challenging for future critical space-based missions. Thus, the system should spend as little time as possible on exploring actions, and whenever it explores an action, it should perform at acceptable levels most of the time. The proposed approach enables on-line learning by interactions with the environment and restricts poor resource allocation performance through virtual environment exploration. Improvements in the multiobjective performance can be achieved via transmitter parameter adaptation on a packet-basis, with poorly predicted performance promptly resulting in rejected decisions. Simulations presented in this work considered the DVB-S2 standard adaptive transmitter parameters and additional ones expected to be present in future adaptive radio systems. Performance results are provided by analysis of the proposed hybrid algorithm when operating across a satellite communication channel from Earth to GEO orbit during clear sky conditions. The proposed approach constitutes part of the core cognitive engine proof-of-concept to be delivered to the NASA Glenn Research Center SCaN Testbed located onboard the International Space Station.

  18. Can magneto-plasmonic nanohybrids efficiently combine photothermia with magnetic hyperthermia?

    NASA Astrophysics Data System (ADS)

    Espinosa, Ana; Bugnet, Mathieu; Radtke, Guillaume; Neveu, Sophie; Botton, Gianluigi A.; Wilhelm, Claire; Abou-Hassan, Ali

    2015-11-01

    Multifunctional hybrid-design nanomaterials appear to be a promising route to meet the current therapeutic needs required for efficient cancer treatment. Herein, two efficient heat nano-generators were combined into a multifunctional single nanohybrid (a multi-core iron oxide nanoparticle optimized for magnetic hyperthermia, and a gold branched shell with tunable plasmonic properties in the NIR region, for photothermal therapy) which impressively enhanced heat generation, in suspension or in vivo in tumours, opening up exciting new therapeutic perspectives.

  19. GPU Accelerated Chemical Similarity Calculation for Compound Library Comparison

    PubMed Central

    Ma, Chao; Wang, Lirong; Xie, Xiang-Qun

    2012-01-01

    Chemical similarity calculation plays an important role in compound library design, virtual screening, and "lead" optimization. In this manuscript, we present a novel GPU-accelerated algorithm for all-vs-all Tanimoto matrix calculation and nearest neighbor search. By taking advantage of the many-core GPU architecture and CUDA parallel programming technology, the algorithm is up to 39 times faster than existing commercial software running on CPUs. Because of the utilization of intrinsic GPU instructions, this approach is nearly 10 times faster than the existing GPU-accelerated sparse vector algorithm when Unity fingerprints are used for the Tanimoto calculation. The GPU program that implements this new method takes about 20 minutes to complete the calculation of Tanimoto coefficients between 32M PubChem compounds and 10K Active Probes compounds, i.e., 324G Tanimoto coefficients, on a 128-CUDA-core GPU. PMID:21692447
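
    On the CPU, the same all-vs-all Tanimoto computation can be phrased as dense linear algebra over boolean fingerprints, as sketched below; the paper's GPU kernel performs the equivalent arithmetic with population-count instructions on packed words. Fingerprints here are random placeholders.

        import numpy as np

        def tanimoto_matrix(A, B):
            # A: (n, bits), B: (m, bits) boolean fingerprint arrays.
            inter = A.astype(np.int32) @ B.T.astype(np.int32)   # |a AND b|
            a_bits = A.sum(axis=1)[:, None]                     # |a|
            b_bits = B.sum(axis=1)[None, :]                     # |b|
            return inter / (a_bits + b_bits - inter)            # T(a, b)

        A = np.random.rand(100, 1024) < 0.1     # 100 query fingerprints
        B = np.random.rand(50, 1024) < 0.1      # 50 library fingerprints
        T = tanimoto_matrix(A, B)
        nearest = T.argmax(axis=1)              # nearest neighbor per query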

  20. Dual Super-Systolic Core for Real-Time Reconstructive Algorithms of High-Resolution Radar/SAR Imaging Systems

    PubMed Central

    Atoche, Alejandro Castillo; Castillo, Javier Vázquez

    2012-01-01

    A high-speed dual super-systolic core for reconstructive signal processing (SP) operations consists of a double parallel systolic array (SA) machine in which each processing element of the array is also conceptualized as another SA in a bit-level fashion. In this study, we addressed the design of a high-speed dual super-systolic array (SSA) core for the enhancement/reconstruction of remote sensing (RS) imaging of radar/synthetic aperture radar (SAR) sensor systems. The selected reconstructive SP algorithms are efficiently transformed in their parallel representation and then, they are mapped into an efficient high performance embedded computing (HPEC) architecture in reconfigurable Xilinx field programmable gate array (FPGA) platforms. As an implementation test case, the proposed approach was aggregated in a HW/SW co-design scheme in order to solve the nonlinear ill-posed inverse problem of nonparametric estimation of the power spatial spectrum pattern (SSP) from a remotely sensed scene. We show how such dual SSA core, drastically reduces the computational load of complex RS regularization techniques achieving the required real-time operational mode. PMID:22736964

  1. Optimized scheme in coal-fired boiler combustion based on information entropy and modified K-prototypes algorithm

    NASA Astrophysics Data System (ADS)

    Gu, Hui; Zhu, Hongxia; Cui, Yanfeng; Si, Fengqi; Xue, Rui; Xi, Han; Zhang, Jiayu

    2018-06-01

    An integrated combustion optimization scheme is proposed that jointly considers the constraints on coal-fired boiler combustion efficiency and outlet NOx emissions. Continuous attribute discretization and reduction techniques are handled as optimization preparation by the E-Cluster and C_RED methods, in which the segmentation numbers do not need to be provided in advance and can adapt continuously to the data characteristics. In order to obtain multi-objective results with a clustering method for mixed data, a modified K-prototypes algorithm is then proposed. The algorithm proceeds in two stages: a K-prototypes pass for self-adaptation of the cluster number, and clustering for multi-objective optimization. Field tests were carried out at a 660 MW coal-fired boiler to provide real data as a case study for controllable attribute discretization and reduction in the boiler system, and optimization parameters were obtained under the multi-objective rule [max η_b, min y_NOx].

  2. The accurate particle tracer code

    NASA Astrophysics Data System (ADS)

    Wang, Yulei; Liu, Jian; Qin, Hong; Yu, Zhi; Yao, Yicun

    2017-11-01

    The Accurate Particle Tracer (APT) code is designed for systematic large-scale applications of geometric algorithms for particle dynamical simulations. Based on a large variety of advanced geometric algorithms, APT possesses long-term numerical accuracy and stability, which are critical for solving multi-scale and nonlinear problems. To provide a flexible and convenient I/O interface, the Lua and HDF5 libraries are used. Following a three-step procedure, users can efficiently extend the libraries of electromagnetic configurations, external non-electromagnetic forces, particle pushers, and initialization approaches by use of the extendible module. APT has been used in simulations of key physical problems, such as runaway electrons in tokamaks and energetic particles in the Van Allen belt. As an important realization, the APT-SW version has been successfully deployed on the world's fastest computer, the Sunway TaihuLight supercomputer, by supporting the master-slave architecture of the Sunway many-core processors. Large-scale simulations of a runaway beam under ITER tokamak parameters reveal that the magnetic ripple field can disperse the pitch-angle distribution significantly and, at the same time, improve the confinement of the energetic runaway beam.

  3. A Stream Tilling Approach to Surface Area Estimation for Large Scale Spatial Data in a Shared Memory System

    NASA Astrophysics Data System (ADS)

    Liu, Jiping; Kang, Xiaochen; Dong, Chun; Xu, Shenghua

    2017-12-01

    Surface area estimation is a widely used tool for resource evaluation in the physical world. When processing large scale spatial data, the input/output (I/O) can easily become the bottleneck in parallelizing the algorithm due to limited physical memory resources and the very slow disk transfer rate. In this paper, we propose a stream tilling approach to surface area estimation that first decomposes a spatial data set into tiles with topological expansions. With these tiles, the one-to-one mapping relationship between the input and the computing process is broken. Then, we realize a streaming framework for the scheduling of the I/O processes and computing units. Herein, each computing unit encapsulates a copy of the estimation algorithm, and multiple asynchronous computing units can work individually in parallel. Finally, our experiment demonstrates that stream tilling estimation can efficiently alleviate the heavy pressure from I/O-bound work, and the measured speedup after optimization greatly outperforms directly parallelized versions in shared memory systems with multi-core processors.

  4. GPU accelerated dynamic functional connectivity analysis for functional MRI data.

    PubMed

    Akgün, Devrim; Sakoğlu, Ünal; Esquivel, Johnny; Adinoff, Bryon; Mete, Mutlu

    2015-07-01

    Recent advances in multi-core processors and graphics card based computational technologies have paved the way for an improved and dynamic utilization of parallel computing techniques. Numerous applications have been implemented for the acceleration of computationally-intensive problems in various computational science fields including bioinformatics, in which big data problems are prevalent. In neuroimaging, dynamic functional connectivity (DFC) analysis is a computationally demanding method used to investigate dynamic functional interactions among different brain regions or networks identified with functional magnetic resonance imaging (fMRI) data. In this study, we implemented and analyzed a parallel DFC algorithm based on thread-based and block-based approaches. The thread-based approach was designed to parallelize DFC computations and was implemented in both Open Multi-Processing (OpenMP) and Compute Unified Device Architecture (CUDA) programming platforms. Another approach developed in this study to better utilize the CUDA architecture is the block-based approach, where parallelization involves smaller parts of fMRI time-courses obtained by sliding windows. Experimental results showed that the proposed parallel design solutions enabled by the GPUs significantly reduce the computation time for DFC analysis. The multicore implementation using OpenMP on an 8-core processor provides up to 7.7× speed-up, and the GPU implementation using CUDA yielded substantial accelerations ranging from 18.5× to 157× speed-up once thread-based and block-based approaches were combined. The proposed solutions show that multi-core processor and CUDA-supported GPU implementations accelerate DFC analyses significantly, making them more practical for multi-subject studies with more dynamic analyses.
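
    The computation being parallelized is, at its core, a sliding-window correlation: one connectivity matrix per window position. The serial NumPy sketch below shows that baseline; the paper's contribution is distributing exactly these windows (blocks) and their correlations across OpenMP threads and CUDA blocks.

        import numpy as np

        def dfc(timecourses, win=30, step=1):
            # timecourses: (n_regions, n_timepoints) array of fMRI signals.
            n_regions, n_t = timecourses.shape
            mats = [np.corrcoef(timecourses[:, s:s + win])
                    for s in range(0, n_t - win + 1, step)]
            return np.stack(mats)   # (n_windows, n_regions, n_regions)

        tc = np.random.randn(10, 200)           # 10 regions, 200 time points
        print(dfc(tc).shape)                    # -> (171, 10, 10)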

  5. An artificial bee colony algorithm for locating the critical slip surface in slope stability analysis

    NASA Astrophysics Data System (ADS)

    Kang, Fei; Li, Junjie; Ma, Zhenyue

    2013-02-01

    Determination of the critical slip surface with the minimum factor of safety of a slope is a difficult constrained global optimization problem. In this article, an artificial bee colony algorithm with a multi-slice adjustment method is proposed for locating the critical slip surfaces of soil slopes, and the Spencer method is employed to calculate the factor of safety. Six benchmark examples are presented to illustrate the reliability and efficiency of the proposed technique, and it is also compared with some well-known or recent algorithms for the problem. The results show that the new algorithm is promising in terms of accuracy and efficiency.

  6. MC64-ClustalWP2: A Highly-Parallel Hybrid Strategy to Align Multiple Sequences in Many-Core Architectures

    PubMed Central

    Díaz, David; Esteban, Francisco J.; Hernández, Pilar; Caballero, Juan Antonio; Guevara, Antonio

    2014-01-01

    We have developed MC64-ClustalWP2 as a new implementation of the Clustal W algorithm, integrating a novel parallelization strategy and significantly increasing the performance when aligning long sequences in architectures with many cores. It must be stressed that in such a process, the detailed analysis of both the software and hardware features and peculiarities is of paramount importance to reveal key points to exploit and optimize the full potential of parallelism in many-core CPU systems. The new parallelization approach has focused on the most time-consuming stages of this algorithm. In particular, the so-called progressive alignment has drastically improved the performance, due to a fine-grained approach where the forward and backward loops were unrolled and parallelized. Another key approach has been the implementation of the new algorithm in a hybrid-computing system, integrating both an Intel Xeon multi-core CPU and a Tilera Tile64 many-core card. A comparison with other Clustal W implementations reveals the high performance of the new algorithm and strategy in many-core CPU architectures, in a scenario where the sequences to align are relatively long (more than 10 kb) and, hence, many-core GPU hardware cannot be used. Thus, MC64-ClustalWP2 runs multiple alignments more than 18x faster than the original Clustal W algorithm, and more than 7x faster than the best x86 parallel implementation to date, and is publicly available through a web service. Besides, these developments have been deployed in cost-effective personal computers and should be useful for life-science researchers, including the identification of identities and differences for mutation/polymorphism analyses, biodiversity and evolutionary studies, and the development of molecular markers for paternity testing, germplasm management and protection, to assist breeding, illegal traffic control, fraud prevention and the protection of intellectual property (identification/traceability), including the protected designation of origin, among other applications. PMID:24710354

  7. Acceleration of the Particle Swarm Optimization for Peierls–Nabarro modeling of dislocations in conventional and high-entropy alloys

    DOE PAGES

    Pei, Zongrui; Max-Planck-Institut für Eisenforschung, Düsseldorf; Eisenbach, Markus

    2017-02-06

    Dislocations are among the most important defects in determining the mechanical properties of both conventional alloys and high-entropy alloys. The Peierls-Nabarro model supplies an efficient pathway to their geometries and mobility. The difficulty in solving the integro-differential Peierls-Nabarro equation lies in effectively avoiding the local minima in the energy landscape of a dislocation core. Among the available methods for optimizing dislocation core structures, we choose Particle Swarm Optimization, an algorithm that simulates the social behavior of organisms. By employing more particles (a bigger swarm) and more iterative steps (allowing them to explore for a longer time), local minima can be effectively avoided, at the price of higher computational cost. The advantage of this algorithm is that it is readily parallelized on modern high-performance computing architectures. We demonstrate that the performance of our parallelized algorithm scales linearly with the number of employed cores.
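
    The linear scaling reported above stems from the fact that swarm fitness evaluations are independent of one another. A generic sketch (not the authors' code) of PSO with parallel fitness evaluation follows; the objective, constants, and pool size are stand-ins for illustration.

      # Sketch: PSO whose fitness evaluations are farmed out to worker processes.
      from multiprocessing import Pool
      import numpy as np

      def fitness(x):                          # stand-in for a dislocation-core energy
          return float(np.sum(x ** 2))

      def pso(dim=8, n_particles=64, iters=100, workers=8, w=0.7, c1=1.5, c2=1.5):
          rng = np.random.default_rng(1)
          X = rng.uniform(-5, 5, (n_particles, dim))
          V = np.zeros_like(X)
          with Pool(workers) as pool:
              P, pf = X.copy(), np.array(pool.map(fitness, X))   # personal bests
              g = P[pf.argmin()].copy()                          # global best
              for _ in range(iters):
                  r1, r2 = rng.random(X.shape), rng.random(X.shape)
                  V = w * V + c1 * r1 * (P - X) + c2 * r2 * (g - X)
                  X = X + V
                  f = np.array(pool.map(fitness, X))             # parallel evaluation
                  better = f < pf
                  P[better], pf[better] = X[better], f[better]
                  g = P[pf.argmin()].copy()
          return g, pf.min()

      if __name__ == "__main__":
          print(pso())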

  8. Deterministic Design Optimization of Structures in OpenMDAO Framework

    NASA Technical Reports Server (NTRS)

    Coroneos, Rula M.; Pai, Shantaram S.

    2012-01-01

    Nonlinear programming algorithms play an important role in structural design optimization. Several such algorithms have been implemented in the OpenMDAO framework developed at NASA Glenn Research Center (GRC). OpenMDAO is an open source engineering analysis framework, written in Python, for analyzing and solving Multi-Disciplinary Analysis and Optimization (MDAO) problems. It provides a number of solvers and optimizers, referred to as components and drivers, which users can leverage to build new tools and processes quickly and efficiently. Users may download, use, modify, and distribute the OpenMDAO software at no cost. This paper summarizes the process involved in analyzing and optimizing structural components by utilizing the framework's structural solvers and several gradient-based optimizers along with a multi-objective genetic algorithm. For comparison purposes, the same structural components were analyzed and optimized using CometBoards, a code developed at NASA GRC. The reliability and efficiency of the OpenMDAO framework are compared and reported.

  9. Fast optimal wavefront reconstruction for multi-conjugate adaptive optics using the Fourier domain preconditioned conjugate gradient algorithm.

    PubMed

    Vogel, Curtis R; Yang, Qiang

    2006-08-21

    We present two different implementations of the Fourier domain preconditioned conjugate gradient algorithm (FD-PCG) to efficiently solve the large structured linear systems that arise in optimal volume turbulence estimation, or tomography, for multi-conjugate adaptive optics (MCAO). We describe how to deal with several critical technical issues, including the cone coordinate transformation problem and sensor subaperture grid spacing. We also extend the FD-PCG approach to handle the deformable mirror fitting problem for MCAO.
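
    A generic sketch of the idea behind FD-PCG may be useful: a conjugate gradient solver whose preconditioner is applied in the Fourier domain. For illustration the system itself is circulant, so the FFT preconditioner happens to be exact; the MCAO-specific turbulence operators are not shown, and all names here are illustrative.

      # Sketch: conjugate gradient with a Fourier-domain preconditioner.
      import numpy as np

      def fd_pcg(apply_A, b, eigs, iters=50, tol=1e-8):
          """Solve A x = b; eigs are FFT eigenvalues of a circulant approximation of A."""
          precond = lambda r: np.real(np.fft.ifft(np.fft.fft(r) / eigs))
          x = np.zeros_like(b)
          r = b - apply_A(x)
          z = precond(r)
          p = z.copy()
          rz = r @ z
          for _ in range(iters):
              Ap = apply_A(p)
              alpha = rz / (p @ Ap)
              x += alpha * p
              r -= alpha * Ap
              if np.linalg.norm(r) < tol:
                  break
              z = precond(r)
              rz_new = r @ z
              p = z + (rz_new / rz) * p
              rz = rz_new
          return x

      # Toy example: A itself circulant, so the preconditioner is exact.
      n = 64
      c = np.zeros(n); c[0], c[1], c[-1] = 2.5, -1.0, -1.0   # SPD circulant stencil
      eigs = np.fft.fft(c)
      A = lambda x: np.real(np.fft.ifft(np.fft.fft(x) * eigs))
      b = np.random.default_rng(2).standard_normal(n)
      print(np.linalg.norm(A(fd_pcg(A, b, eigs)) - b))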

  10. A robust and efficient polyhedron subdivision and intersection algorithm for three-dimensional MMALE remapping

    NASA Astrophysics Data System (ADS)

    Chen, Xiang; Zhang, Xiong; Jia, Zupeng

    2017-06-01

    The Multi-Material Arbitrary Lagrangian Eulerian (MMALE) method is an effective way to simulate multi-material flow with severe surface deformation. Compared with the traditional Arbitrary Lagrangian Eulerian (ALE) method, the MMALE method allows for multiple materials in a single cell, which overcomes the difficulties in the grid refinement process. In recent decades, much research has been conducted on the Lagrangian, rezoning and surface reconstruction phases, but less attention has been paid to the multi-material remapping phase, especially for three-dimensional problems, owing to two complex geometric problems: polyhedron subdivision and polyhedron intersection. In this paper, we propose a 'Clipping and Projecting' algorithm for polyhedron intersection whose basic idea comes from the commonly used method by Grandy (1999) [29] and Jia et al. (2013) [34]. Our new algorithm solves the geometric problem by an incremental modification of the topology based on segment-plane intersections. A comparison with Jia et al. (2013) [34] shows that our new method improves efficiency by 55% to 65% when calculating polyhedron intersections. Moreover, the instability caused by geometric degeneracy is thoroughly avoided because geometric integrity is preserved in the new algorithm. We also focus on the polyhedron subdivision process and describe an algorithm that automatically and precisely handles the various situations, including convex, non-convex and multiple subdivisions. Numerical studies indicate that with our polyhedron subdivision and intersection algorithm, the volume conservation of the remapping phase is exactly preserved in the MMALE simulation.

  11. Multi scales based sparse matrix spectral clustering image segmentation

    NASA Astrophysics Data System (ADS)

    Liu, Zhongmin; Chen, Zhicai; Li, Zhanming; Hu, Wenjin

    2018-04-01

    In image segmentation, spectral clustering algorithms have to adopt an appropriate scaling parameter to calculate the similarity matrix between pixels, which can have a great impact on the clustering result. Moreover, when the number of data instances is large, the computational complexity and memory use of the algorithm increase greatly. To solve these two problems, we propose a new spectral clustering image segmentation algorithm based on multiple scales and a sparse matrix. We first devise a new feature extraction method, then extract the features of the image at different scales, and finally use the feature information to construct a sparse similarity matrix, which improves operation efficiency. Compared with the traditional spectral clustering algorithm, image segmentation experiments show that our algorithm achieves better accuracy and robustness.
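
    A minimal sketch of the sparse-similarity idea (a kNN affinity matrix fed into normalized-Laplacian spectral clustering) follows, assuming scipy and scikit-learn are available; the paper's multi-scale feature extraction step is omitted, and parameter names are illustrative.

      # Sketch: spectral clustering with a sparse (kNN) similarity matrix.
      import numpy as np
      import scipy.sparse as sp
      from scipy.sparse.linalg import eigsh
      from sklearn.neighbors import kneighbors_graph
      from sklearn.cluster import KMeans

      def sparse_spectral(X, k_clusters=3, k_nn=10, sigma=1.0):
          W = kneighbors_graph(X, k_nn, mode="distance", include_self=False)
          W.data = np.exp(-W.data ** 2 / (2 * sigma ** 2))     # Gaussian affinity
          W = 0.5 * (W + W.T)                                  # symmetrize
          d = np.asarray(W.sum(axis=1)).ravel()
          D_inv_sqrt = sp.diags(1.0 / np.sqrt(d))
          L = sp.identity(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt
          vals, vecs = eigsh(L, k=k_clusters, which="SM")      # smallest eigenpairs
          U = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
          return KMeans(k_clusters, n_init=10).fit_predict(U)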

  12. Efficient selection of tagging single-nucleotide polymorphisms in multiple populations.

    PubMed

    Howie, Bryan N; Carlson, Christopher S; Rieder, Mark J; Nickerson, Deborah A

    2006-08-01

    Common genetic polymorphism may explain a portion of the heritable risk for common diseases, so considerable effort has been devoted to finding and typing common single-nucleotide polymorphisms (SNPs) in the human genome. Many SNPs show correlated genotypes, or linkage disequilibrium (LD), suggesting that only a subset of all SNPs (known as tagging SNPs, or tagSNPs) need to be genotyped for disease association studies. Based on the genetic differences that exist among human populations, most tagSNP sets are defined in a single population and applied only in populations that are closely related. To improve the efficiency of multi-population analyses, we have developed an algorithm called MultiPop-TagSelect that finds a near-minimal union of population-specific tagSNP sets across an arbitrary number of populations. We present this approach as an extension of LD-select, a tagSNP selection method that uses a greedy algorithm to group SNPs into bins based on their pairwise association patterns, although the MultiPop-TagSelect algorithm could be used with any SNP tagging approach that allows choices between nearly equivalent SNPs. We evaluate the algorithm by considering tagSNP selection in candidate-gene resequencing data and lower density whole-chromosome data. Our analysis reveals that an exhaustive search is often intractable, while the developed algorithm can quickly and reliably find near-optimal solutions even for difficult tagSNP selection problems. Using populations of African, Asian, and European ancestry, we also show that an optimal multi-population set of tagSNPs can be substantially smaller (up to 44%) than a typical set obtained through independent or sequential selection.
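
    The greedy binning step that LD-select contributes, and on which MultiPop-TagSelect builds, can be sketched as follows for a single population; the matrix name and threshold are illustrative, and the multi-population union search is not shown.

      # Sketch of greedy LD-based binning (the LD-select step): repeatedly take
      # the SNP that tags the most others at r^2 >= threshold, bin them together,
      # and record it as that bin's tagSNP.
      import numpy as np

      def greedy_tag_bins(r2, threshold=0.8):
          """r2: symmetric matrix of pairwise r^2 values between SNPs."""
          n = r2.shape[0]
          unbinned = set(range(n))
          bins = []
          while unbinned:
              best = max(unbinned,
                         key=lambda i: sum(r2[i, j] >= threshold
                                           for j in unbinned if j != i))
              members = {j for j in unbinned if r2[best, j] >= threshold} | {best}
              bins.append((best, members))   # (tagSNP, SNPs it covers)
              unbinned -= members
          return bins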

  13. A matrix-algebraic formulation of distributed-memory maximal cardinality matching algorithms in bipartite graphs

    DOE PAGES

    Azad, Ariful; Buluç, Aydın

    2016-05-16

    We describe parallel algorithms for computing maximal cardinality matching in a bipartite graph on distributed-memory systems. Unlike traditional algorithms that match one vertex at a time, our algorithms process many unmatched vertices simultaneously using a matrix-algebraic formulation of maximal matching. This generic matrix-algebraic framework is used to develop three efficient maximal matching algorithms with minimal changes. The newly developed algorithms have two benefits over existing graph-based algorithms. First, unlike existing parallel algorithms, the cardinality of matching obtained by the new algorithms stays constant with increasing processor counts, which is important for predictable and reproducible performance. Second, relying on bulk-synchronous matrix operations, these algorithms expose a higher degree of parallelism on distributed-memory platforms than existing graph-based algorithms. We report high-performance implementations of three maximal matching algorithms using hybrid OpenMP-MPI and evaluate the performance of these algorithms using more than 35 real and randomly generated graphs. On real instances, our algorithms achieve up to 200× speedup on 2048 cores of a Cray XC30 supercomputer. Even higher speedups are obtained on larger synthetically generated graphs, where our algorithms show good scaling on up to 16,384 cores.

  14. Exploiting multi-scale parallelism for large scale numerical modelling of laser wakefield accelerators

    NASA Astrophysics Data System (ADS)

    Fonseca, R. A.; Vieira, J.; Fiuza, F.; Davidson, A.; Tsung, F. S.; Mori, W. B.; Silva, L. O.

    2013-12-01

    A new generation of laser wakefield accelerators (LWFA), supported by the extreme accelerating fields generated in the interaction of PW-class lasers and underdense targets, promises the production of high quality electron beams in short distances for multiple applications. Achieving this goal will rely heavily on numerical modelling to further understand the underlying physics and identify optimal regimes, but large scale modelling of these scenarios is computationally heavy and requires the efficient use of state-of-the-art petascale supercomputing systems. We discuss the main difficulties involved in running these simulations and the new developments implemented in the OSIRIS framework to address these issues, ranging from multi-dimensional dynamic load balancing and hybrid distributed/shared memory parallelism to the vectorization of the PIC algorithm. We present the results of the OASCR Joule Metric program on the issue of large scale modelling of LWFA, demonstrating speedups of over 1 order of magnitude on the same hardware. Finally, scalability to over ~10^6 cores and sustained performance of over ~2 PFlops is demonstrated, opening the way for large scale modelling of LWFA scenarios.

  15. Sensor-Based Vibration Signal Feature Extraction Using an Improved Composite Dictionary Matching Pursuit Algorithm

    PubMed Central

    Cui, Lingli; Wu, Na; Wang, Wenjing; Kang, Chenhui

    2014-01-01

    This paper presents a new method for a composite dictionary matching pursuit algorithm, which is applied to vibration sensor signal feature extraction and fault diagnosis of a gearbox. Three advantages are highlighted in the new method. First, the composite dictionary in the algorithm has been changed from multi-atom matching to single-atom matching. Compared to non-composite dictionary single-atom matching, the original composite dictionary multi-atom matching pursuit (CD-MaMP) algorithm can achieve noise reduction in the reconstruction stage, but it cannot dramatically reduce the computational cost and improve the efficiency in the decomposition stage. Therefore, the optimized composite dictionary single-atom matching algorithm (CD-SaMP) is proposed. Second, the termination condition of iteration based on the attenuation coefficient is put forward to improve the sparsity and efficiency of the algorithm, which adjusts the parameters of the termination condition constantly in the process of decomposition to avoid noise. Third, composite dictionaries are enriched with the modulation dictionary, which is one of the important structural characteristics of gear fault signals. Meanwhile, the termination condition of iteration settings, sub-feature dictionary selections and operation efficiency between CD-MaMP and CD-SaMP are discussed, aiming at gear simulation vibration signals with noise. The simulation sensor-based vibration signal results show that the termination condition of iteration based on the attenuation coefficient enhances decomposition sparsity greatly and achieves a good effect of noise reduction. Furthermore, the modulation dictionary achieves a better matching effect compared to the Fourier dictionary, and CD-SaMP has a great advantage of sparsity and efficiency compared with the CD-MaMP. The sensor-based vibration signals measured from practical engineering gearbox analyses have further shown that the CD-SaMP decomposition and reconstruction algorithm is feasible and effective. PMID:25207870
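
    For orientation, the basic single-atom matching pursuit loop, with a stopping rule driven by how fast the residual attenuates (in the spirit of the attenuation-based termination described above), can be sketched as follows; the dictionary and constants are illustrative, not the authors' implementation.

      # Sketch: single-atom matching pursuit with an attenuation-based stop.
      import numpy as np

      def matching_pursuit(x, D, max_iter=100, atten=0.999):
          """D: dictionary with unit-norm atoms as columns."""
          residual = x.astype(float).copy()
          coeffs = np.zeros(D.shape[1])
          prev = np.linalg.norm(residual)
          for _ in range(max_iter):
              corr = D.T @ residual
              k = np.argmax(np.abs(corr))          # best-matching single atom
              coeffs[k] += corr[k]
              residual -= corr[k] * D[:, k]
              cur = np.linalg.norm(residual)
              if cur > atten * prev:               # residual no longer attenuating
                  break
              prev = cur
          return coeffs, residual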

  16. Sensor-based vibration signal feature extraction using an improved composite dictionary matching pursuit algorithm.

    PubMed

    Cui, Lingli; Wu, Na; Wang, Wenjing; Kang, Chenhui

    2014-09-09

    This paper presents a new method for a composite dictionary matching pursuit algorithm, which is applied to vibration sensor signal feature extraction and fault diagnosis of a gearbox. Three advantages are highlighted in the new method. First, the composite dictionary in the algorithm has been changed from multi-atom matching to single-atom matching. Compared to non-composite dictionary single-atom matching, the original composite dictionary multi-atom matching pursuit (CD-MaMP) algorithm can achieve noise reduction in the reconstruction stage, but it cannot dramatically reduce the computational cost and improve the efficiency in the decomposition stage. Therefore, the optimized composite dictionary single-atom matching algorithm (CD-SaMP) is proposed. Second, the termination condition of iteration based on the attenuation coefficient is put forward to improve the sparsity and efficiency of the algorithm, which adjusts the parameters of the termination condition constantly in the process of decomposition to avoid noise. Third, composite dictionaries are enriched with the modulation dictionary, which is one of the important structural characteristics of gear fault signals. Meanwhile, the termination condition of iteration settings, sub-feature dictionary selections and operation efficiency between CD-MaMP and CD-SaMP are discussed, aiming at gear simulation vibration signals with noise. The simulation sensor-based vibration signal results show that the termination condition of iteration based on the attenuation coefficient enhances decomposition sparsity greatly and achieves a good effect of noise reduction. Furthermore, the modulation dictionary achieves a better matching effect compared to the Fourier dictionary, and CD-SaMP has a great advantage of sparsity and efficiency compared with the CD-MaMP. The sensor-based vibration signals measured from practical engineering gearbox analyses have further shown that the CD-SaMP decomposition and reconstruction algorithm is feasible and effective.

  17. Design and multi-physics optimization of rotary MRF brakes

    NASA Astrophysics Data System (ADS)

    Topcu, Okan; Taşcıoğlu, Yiğit; Konukseven, Erhan İlhan

    2018-03-01

    Particle swarm optimization (PSO) is a popular method for solving optimization problems. However, the calculations for each particle become excessive when the number of particles and the complexity of the problem increase, and the execution speed then becomes too slow to reach the optimized solution. Thus, this paper proposes an automated design and optimization method for rotary MRF brakes and similar multi-physics problems. A modified PSO algorithm is developed for solving multi-physics engineering optimization problems. The difference between the proposed method and conventional PSO is that the original single population is split into several subpopulations according to a division of labor. The distribution of tasks and the transfer of information to the next party are inspired by the behaviors of a hunting party. Simulation results show that the proposed modified PSO algorithm can overcome the heavy computational burden of multi-physics problems while improving accuracy. Wire type, MR fluid type, magnetic core material, and ideal current inputs have been determined by the optimization process. To the best of the authors' knowledge, this multi-physics approach is novel for optimizing rotary MRF brakes, and the developed PSO algorithm is capable of solving other multi-physics engineering optimization problems. The proposed method has shown better performance than conventional PSO and has provided small, lightweight, high-impedance rotary MRF brake designs.

  18. Onboard autonomous mission re-planning for multi-satellite system

    NASA Astrophysics Data System (ADS)

    Zheng, Zixuan; Guo, Jian; Gill, Eberhard

    2018-04-01

    This paper presents an onboard autonomous mission re-planning system for a Multi-Satellite System (MSS) to perform onboard re-planning in disruptive situations. The proposed re-planning system can deal with different potential emergency situations. This paper uses a Multi-Objective Hybrid Dynamic Mutation Genetic Algorithm (MO-HDM GA) combined with re-planning techniques as the core algorithm. The Cyclically Re-planning Method (CRM) and the Near Real-time Re-planning Method (NRRM) are developed to meet different mission requirements. Simulation results show that both methods can provide feasible re-planning sequences under unforeseen situations. The comparisons illustrate that the CRM is on average 20% faster than the NRRM in computation time, whereas the NRRM allows more raw data to be observed and transmitted than the CRM within the same period. The usability of this onboard re-planning system is not limited to multi-satellite systems; other mission planning and re-planning problems involving autonomous multiple vehicles with similar demands are also applicable.

  19. Multi-jagged: A scalable parallel spatial partitioning algorithm

    DOE PAGES

    Deveci, Mehmet; Rajamanickam, Sivasankaran; Devine, Karen D.; ...

    2015-03-18

    Geometric partitioning is fast and effective for load-balancing dynamic applications, particularly those requiring geometric locality of data (particle methods, crash simulations). We present, to our knowledge, the first parallel implementation of a multidimensional-jagged geometric partitioner. In contrast to the traditional recursive coordinate bisection algorithm (RCB), which recursively bisects subdomains perpendicular to their longest dimension until the desired number of parts is obtained, our algorithm does recursive multi-section with a given number of parts in each dimension. By computing multiple cut lines concurrently and intelligently deciding when to migrate data while computing the partition, we minimize data movement compared to efficient implementations of recursive bisection. We demonstrate the algorithm's scalability and quality relative to the RCB implementation in Zoltan on both real and synthetic datasets. Our experiments show that the proposed algorithm performs and scales better than RCB in terms of run-time without degrading the load balance. Lastly, our implementation partitions 24 billion points into 65,536 parts within a few seconds and exhibits near-perfect weak scaling up to 6K cores.
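
    The contrast with recursive bisection can be sketched in a few lines: multi-section cuts each dimension into several parts at once. The serial illustration below omits the paper's concurrent cut computation and data migration, and the function name and part counts are assumptions for the example.

      # Sketch: multi-section partitioning; each level splits one coordinate
      # into several balanced parts at once rather than bisecting repeatedly.
      import numpy as np

      def multi_jagged(points, parts_per_dim, depth=0):
          """points: (n, d) array; parts_per_dim: e.g. (4, 4) for 16 parts."""
          if depth == len(parts_per_dim):
              return [points]
          p = parts_per_dim[depth]
          order = np.argsort(points[:, depth])
          chunks = np.array_split(points[order], p)   # p cuts at once ("jagged")
          out = []
          for c in chunks:
              out.extend(multi_jagged(c, parts_per_dim, depth + 1))
          return out

      parts = multi_jagged(np.random.default_rng(3).random((10000, 2)), (4, 4))
      print(len(parts), [len(c) for c in parts[:4]])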

  20. Broadband Gerchberg-Saxton algorithm for freeform diffractive spectral filter design.

    PubMed

    Vorndran, Shelby; Russo, Juan M; Wu, Yuechen; Pelaez, Silvana Ayala; Kostuk, Raymond K

    2015-11-30

    A multi-wavelength expansion of the Gerchberg-Saxton (GS) algorithm is developed to design and optimize a surface relief Diffractive Optical Element (DOE). The DOE simultaneously diffracts distinct wavelength bands into separate target regions. A description of the algorithm is provided, and parameters that affect filter performance are examined. Performance is based on the spectral power collected within specified regions on a receiver plane. The modified GS algorithm is used to design spectrum splitting optics for CdSe and Si photovoltaic (PV) cells. The DOE has an average optical efficiency of 87.5% over the spectral bands of interest (400-710 nm and 710-1100 nm). Simulated PV conversion efficiency is 37.7%, which is 29.3% higher than the efficiency of the better-performing PV cell without spectrum splitting optics.
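
    For reference, the classic single-wavelength Gerchberg-Saxton iteration that the broadband method extends can be sketched as below; the broadband version would couple several such amplitude constraints to one surface profile, which is not shown here, and the grid and target shapes are illustrative.

      # Sketch: classic Gerchberg-Saxton phase retrieval between the DOE plane
      # (known amplitude, free phase) and the target plane (desired amplitude).
      import numpy as np

      def gerchberg_saxton(src_amp, tgt_amp, iters=100, rng=None):
          rng = rng or np.random.default_rng(4)
          phase = rng.uniform(0, 2 * np.pi, src_amp.shape)
          for _ in range(iters):
              far = np.fft.fft2(src_amp * np.exp(1j * phase))
              far = tgt_amp * np.exp(1j * np.angle(far))     # impose target amplitude
              near = np.fft.ifft2(far)
              phase = np.angle(near)                         # keep source amplitude
          return phase                                       # DOE phase profile

      src = np.ones((128, 128))
      tgt = np.zeros((128, 128)); tgt[32:96, 32:96] = 1.0    # toy target region
      phi = gerchberg_saxton(src, tgt / np.linalg.norm(tgt) * np.linalg.norm(src))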

  1. A high-performance spatial database based approach for pathology imaging algorithm evaluation

    PubMed Central

    Wang, Fusheng; Kong, Jun; Gao, Jingjing; Cooper, Lee A.D.; Kurc, Tahsin; Zhou, Zhengwen; Adler, David; Vergara-Niedermayr, Cristobal; Katigbak, Bryan; Brat, Daniel J.; Saltz, Joel H.

    2013-01-01

    Background: Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparison with human annotations, combine results from multiple algorithms for performance improvement, and facilitate algorithm sensitivity studies. The sizes of images and image analysis results in pathology image analysis pose significant challenges in algorithm evaluation. We present an efficient parallel spatial database approach to model, normalize, manage, and query large volumes of analytical image result data. This provides an efficient platform for algorithm evaluation. Our experiments with a set of brain tumor images demonstrate the application, scalability, and effectiveness of the platform. Context: The paper describes an approach and platform for evaluation of pathology image analysis algorithms. The platform facilitates algorithm evaluation through a high-performance database built on the Pathology Analytic Imaging Standards (PAIS) data model. Aims: (1) Develop a framework to support algorithm evaluation by modeling and managing analytical results and human annotations from pathology images; (2) Create a robust data normalization tool for converting, validating, and fixing spatial data from algorithm or human annotations; (3) Develop a set of queries to support data sampling and result comparisons; (4) Achieve high performance computation capacity via a parallel data management infrastructure, parallel data loading and spatial indexing optimizations in this infrastructure. Materials and Methods: We have considered two scenarios for algorithm evaluation: (1) algorithm comparison where multiple result sets from different methods are compared and consolidated; and (2) algorithm validation where algorithm results are compared with human annotations. We have developed a spatial normalization toolkit to validate and normalize spatial boundaries produced by image analysis algorithms or human annotations. The validated data were formatted based on the PAIS data model and loaded into a spatial database. To support efficient data loading, we have implemented a parallel data loading tool that takes advantage of multi-core CPUs to accelerate data injection. The spatial database manages both geometric shapes and image features or classifications, and enables spatial sampling, result comparison, and result aggregation through expressive structured query language (SQL) queries with spatial extensions. To provide scalable and efficient query support, we have employed a shared nothing parallel database architecture, which distributes data homogenously across multiple database partitions to take advantage of parallel computation power and implements spatial indexing to achieve high I/O throughput. Results: Our work proposes a high performance, parallel spatial database platform for algorithm validation and comparison. This platform was evaluated by storing, managing, and comparing analysis results from a set of brain tumor whole slide images. The tools we develop are open source and available to download. Conclusions: Pathology image algorithm validation and comparison are essential to iterative algorithm development and refinement. One critical component is the support for queries involving spatial predicates and comparisons. In our work, we develop an efficient data model and parallel database approach to model, normalize, manage and query large volumes of analytical image result data. 
Our experiments demonstrate that the data partitioning strategy and the grid-based indexing result in good data distribution across database nodes and reduce I/O overhead in spatial join queries through parallel retrieval of relevant data and quick subsetting of datasets. The set of tools in the framework provide a full pipeline to normalize, load, manage and query analytical results for algorithm evaluation. PMID:23599905

  2. Multi-Sensor Registration of Earth Remotely Sensed Imagery

    NASA Technical Reports Server (NTRS)

    LeMoigne, Jacqueline; Cole-Rhodes, Arlene; Eastman, Roger; Johnson, Kisha; Morisette, Jeffrey; Netanyahu, Nathan S.; Stone, Harold S.; Zavorin, Ilya; Zukor, Dorothy (Technical Monitor)

    2001-01-01

    Assuming that approximate registration is given within a few pixels by a systematic correction system, we develop automatic image registration methods for multi-sensor data with the goal of achieving sub-pixel accuracy. Automatic image registration is usually defined by three steps: feature extraction, feature matching, and data resampling or fusion. Our previous work focused on image correlation methods based on the use of different features. In this paper, we study different feature matching techniques and present five algorithms where the features are either original gray levels or wavelet-like features, and the feature matching is based on gradient descent optimization, statistical robust matching, and mutual information. These algorithms are tested and compared on several multi-sensor datasets covering one of the EOS Core Sites, the Konza Prairie in Kansas, from four different sensors: IKONOS (4m), Landsat-7/ETM+ (30m), MODIS (500m), and SeaWiFS (1000m).

  3. A new artefacts resistant method for automatic lineament extraction using Multi-Hillshade Hierarchic Clustering (MHHC)

    NASA Astrophysics Data System (ADS)

    Šilhavý, Jakub; Minár, Jozef; Mentlík, Pavel; Sládek, Ján

    2016-07-01

    This paper presents a new method of automatic lineament extraction which includes removal of the 'artefacts effect' associated with raster-based analysis. The core of the proposed Multi-Hillshade Hierarchic Clustering (MHHC) method incorporates a set of variously illuminated and rotated hillshades in combination with hierarchic clustering of derived 'protolineaments'. The algorithm also includes classification into positive and negative lineaments. MHHC was tested in two different territories in the Bohemian Forest and the Central Western Carpathians. An original vector-based algorithm was developed for comparing the proximity of individual lineaments. Its use confirms the compatibility of manual and automatic extraction and their similar relationships to structural data in the study areas.

  4. Implementation and evaluation of ILLIAC 4 algorithms for multispectral image processing

    NASA Technical Reports Server (NTRS)

    Swain, P. H.

    1974-01-01

    Data concerning a multidisciplinary and multi-organizational effort to implement multispectral data analysis algorithms on a revolutionary computer, the Illiac 4, are reported. The effectiveness and efficiency of implementing the digital multispectral data analysis techniques for producing useful land use classifications from satellite collected data were demonstrated.

  5. A New Algorithm Using the Non-Dominated Tree to Improve Non-Dominated Sorting.

    PubMed

    Gustavsson, Patrik; Syberfeldt, Anna

    2018-01-01

    Non-dominated sorting is a technique often used in evolutionary algorithms to determine the quality of solutions in a population. The most common algorithm is the Fast Non-dominated Sort (FNS). This algorithm, however, has the drawback that its performance deteriorates when the population size grows. The same drawback applies to other non-dominated sorting algorithms, such as the Efficient Non-dominated Sort with Binary Strategy (ENS-BS). An algorithm suggested to overcome this drawback is the Divide-and-Conquer Non-dominated Sort (DCNS), which works well on a limited number of objectives but deteriorates when the number of objectives grows. This article presents a new, more efficient algorithm called the Efficient Non-dominated Sort with Non-Dominated Tree (ENS-NDT). ENS-NDT is an extension of the ENS-BS algorithm and uses a novel Non-Dominated Tree (NDTree) to speed up the non-dominated sorting. ENS-NDT is able to handle large population sizes and a large number of objectives more efficiently than existing algorithms for non-dominated sorting. In the article, it is shown that with ENS-NDT the runtime of multi-objective optimization algorithms such as the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) can be substantially reduced.
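
    For context, the baseline Fast Non-dominated Sort against which ENS-NDT is compared can be sketched as follows (minimization of all objectives assumed); the NDTree itself is more involved and is not reproduced here.

      # Sketch: the classic Fast Non-dominated Sort (FNS) baseline.
      import numpy as np

      def dominates(a, b):
          return np.all(a <= b) and np.any(a < b)

      def fast_nondominated_sort(F):
          """F: (n, m) objective matrix; returns a list of fronts (index lists)."""
          n = len(F)
          S = [[] for _ in range(n)]           # solutions each i dominates
          cnt = np.zeros(n, dtype=int)         # how many solutions dominate i
          fronts = [[]]
          for i in range(n):
              for j in range(n):
                  if i == j:
                      continue
                  if dominates(F[i], F[j]):
                      S[i].append(j)
                  elif dominates(F[j], F[i]):
                      cnt[i] += 1
              if cnt[i] == 0:
                  fronts[0].append(i)
          k = 0
          while fronts[k]:
              nxt = []
              for i in fronts[k]:
                  for j in S[i]:
                      cnt[j] -= 1
                      if cnt[j] == 0:
                          nxt.append(j)
              fronts.append(nxt)
              k += 1
          return fronts[:-1]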

  6. An Evolutionary Optimization of the Refueling Simulation for a CANDU Reactor

    NASA Astrophysics Data System (ADS)

    Do, Q. B.; Choi, H.; Roh, G. H.

    2006-10-01

    This paper presents a multi-cycle and multi-objective optimization method for the refueling simulation of a 713 MWe Canada deuterium uranium (CANDU-6) reactor based on a genetic algorithm, an elitism strategy and a heuristic rule. The proposed algorithm searches for the optimal refueling patterns for a single cycle that maximize the average discharge burnup, minimize the maximum channel power and minimize the change in the zone controller unit water fills while satisfying the most important safety-related neutronic parameters of the reactor core. The heuristic rule generates an initial population of individuals very close to a feasible solution, which reduces the computing time of the optimization process. The multi-cycle optimization is carried out based on a single-cycle refueling simulation. The proposed approach was verified by a refueling simulation of a natural uranium CANDU-6 reactor for an operation period of 6 months at an equilibrium state and compared with the experience-based automatic refueling simulation and the generalized perturbation theory. The comparison shows that the simulation results are consistent with each other and that the proposed approach is a reasonable optimization method for refueling simulation, controlling all the safety-related parameters of the reactor core throughout the simulation.

  7. Importance of multi-modal approaches to effectively identify cataract cases from electronic health records.

    PubMed

    Peissig, Peggy L; Rasmussen, Luke V; Berg, Richard L; Linneman, James G; McCarty, Catherine A; Waudby, Carol; Chen, Lin; Denny, Joshua C; Wilke, Russell A; Pathak, Jyotishman; Carrell, David; Kho, Abel N; Starren, Justin B

    2012-01-01

    There is increasing interest in using electronic health records (EHRs) to identify subjects for genomic association studies, due in part to the availability of large amounts of clinical data and the expected cost efficiencies of subject identification. We describe the construction and validation of an EHR-based algorithm to identify subjects with age-related cataracts. We used a multi-modal strategy consisting of structured database querying, natural language processing on free-text documents, and optical character recognition on scanned clinical images to identify cataract subjects and related cataract attributes. Extensive validation on 3657 subjects compared the multi-modal results to manual chart review. The algorithm was also implemented at participating electronic MEdical Records and GEnomics (eMERGE) institutions. An EHR-based cataract phenotyping algorithm was successfully developed and validated, resulting in positive predictive values (PPVs) >95%. The multi-modal approach increased the identification of cataract subject attributes by a factor of three compared to single-mode approaches while maintaining high PPV. Components of the cataract algorithm were successfully deployed at three other institutions with similar accuracy. A multi-modal strategy incorporating optical character recognition and natural language processing may increase the number of cases identified while maintaining similar PPVs. Such algorithms, however, require that the needed information be embedded within clinical documents. We have demonstrated that algorithms to identify and characterize cataracts can be developed utilizing data collected via the EHR. These algorithms provide a high level of accuracy even when implemented across multiple EHRs and institutional boundaries.

  8. Investigation of Large Scale Cortical Models on Clustered Multi-Core Processors

    DTIC Science & Technology

    2013-02-01

    [Fragment; the full abstract was not recovered from the source.] The surviving text describes a radial basis function (RBF) network implemented on a GPU platform, with first-layer weights W and a bias-node weight ww, and a Cholesky decomposition algorithm used to invert the normal-equations matrix G^T G.

  9. Accelerated simulation of stochastic particle removal processes in particle-resolved aerosol models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Curtis, J.H.; Michelotti, M.D.; Riemer, N.

    2016-10-01

    Stochastic particle-resolved methods have proven useful for simulating multi-dimensional systems such as composition-resolved aerosol size distributions. While particle-resolved methods have substantial benefits for highly detailed simulations, these techniques suffer from high computational cost, motivating efforts to improve their algorithmic efficiency. Here we formulate an algorithm for accelerating particle removal processes by aggregating particles of similar size into bins. We present the Binned Algorithm for particle removal processes and analyze its performance with application to the atmospherically relevant process of aerosol dry deposition. We show that the Binned Algorithm can dramatically improve the efficiency of particle removals, particularly for low removal rates, and that computational cost is reduced without introducing additional error. In simulations of aerosol particle removal by dry deposition in atmospherically relevant conditions, we demonstrate an approximately 50-fold increase in algorithm efficiency.

  10. Cellular automata-based modelling and simulation of biofilm structure on multi-core computers.

    PubMed

    Skoneczny, Szymon

    2015-01-01

    The article presents a mathematical model of biofilm growth for aerobic biodegradation of a toxic carbonaceous substrate. Modelling of biofilm growth has fundamental significance in numerous processes of biotechnology and in the mathematical modelling of bioreactors. A process following double-substrate kinetics with substrate inhibition proceeding in a biofilm has not previously been modelled by means of cellular automata. Each process in the proposed model, i.e. diffusion of substrates, uptake of substrates, growth and decay of microorganisms and biofilm detachment, is simulated in a discrete manner. It was shown that for a flat biofilm of constant thickness, the results of the presented model agree with those of a continuous model. The primary outcome of the study was to propose a mathematical model of biofilm growth; however, considerable focus was also placed on the development of efficient algorithms for its solution. Two parallel algorithms were created, differing in the way computations are distributed. Computer programs were created using the OpenMP Application Programming Interface for the C++ programming language. Simulations of biofilm growth were performed on three high-performance computers, and the speed-up coefficients of the computer programs were compared. Both algorithms enabled a significant reduction of computation time, which is important, inter alia, in the modelling and simulation of bioreactor dynamics.

  11. Performance impact of mutation operators of a subpopulation-based genetic algorithm for multi-robot task allocation problems.

    PubMed

    Liu, Chun; Kroll, Andreas

    2016-01-01

    Multi-robot task allocation determines the task sequence and distribution for a group of robots in multi-robot systems; it is a constrained combinatorial optimization problem that becomes more complex in the case of cooperative tasks, which introduce additional spatial and temporal constraints. To solve multi-robot task allocation problems with cooperative tasks efficiently, a subpopulation-based genetic algorithm, a crossover-free genetic algorithm employing mutation operators and elitism selection in each subpopulation, is developed in this paper. Moreover, the impact of mutation operators (swap, insertion, inversion, displacement, and their various combinations) is analyzed when solving several industrial plant inspection problems. The experimental results show that: (1) the proposed genetic algorithm can obtain better solutions than the tested binary tournament genetic algorithm with partially mapped crossover; (2) inversion mutation performs better than the other tested mutation operators when solving problems without cooperative tasks, and the swap-inversion combination performs better than the other tested mutation operators/combinations when solving problems with cooperative tasks. As it is difficult to produce all desired effects with a single mutation operator, using multiple mutation operators (including both inversion and swap) is suggested when solving similar combinatorial optimization problems.
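
    The four mutation operators compared in the study act on task-sequence permutations and can be sketched as follows; the encodings and any cooperative-task handling are simplified away, and these are generic textbook forms rather than the authors' exact implementations.

      # Sketch: the four permutation mutation operators compared above; each
      # takes a task sequence (a list) and returns a mutated copy.
      import random

      def swap(seq):
          s = seq[:]; i, j = random.sample(range(len(s)), 2)
          s[i], s[j] = s[j], s[i]
          return s

      def insertion(seq):
          s = seq[:]; i, j = random.sample(range(len(s)), 2)
          s.insert(j, s.pop(i))              # move one element to a new position
          return s

      def inversion(seq):
          s = seq[:]; i, j = sorted(random.sample(range(len(s)), 2))
          s[i:j + 1] = reversed(s[i:j + 1])  # reverse a sub-sequence
          return s

      def displacement(seq):
          s = seq[:]; i, j = sorted(random.sample(range(len(s)), 2))
          block = s[i:j + 1]; del s[i:j + 1]
          k = random.randrange(len(s) + 1)   # reinsert the block elsewhere
          return s[:k] + block + s[k:]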

  12. Protein complex prediction for large protein protein interaction networks with the Core&Peel method.

    PubMed

    Pellegrini, Marco; Baglioni, Miriam; Geraci, Filippo

    2016-11-08

    Biological networks play an increasingly important role in the exploration of functional modularity and cellular organization at a systemic level. Quite often the first tools used to analyze these networks are clustering algorithms. We concentrate here on the specific task of predicting protein complexes (PC) in large protein-protein interaction networks (PPIN). Currently, many state-of-the-art algorithms work well for networks of small or moderate size. However, their performance on much larger networks, which are becoming increasingly common in modern proteome-wide studies, needs to be re-assessed. We present a new fast algorithm for clustering large sparse networks: Core&Peel, which runs essentially in time and storage O(a(G)m+n) for a network G of n nodes and m arcs, where a(G) is the arboricity of G (which is roughly proportional to the maximum average degree of any induced subgraph in G). We evaluated Core&Peel on five PPI networks of large size and one of medium size, from both yeast and Homo sapiens, comparing its performance against those of ten state-of-the-art methods. We demonstrate that Core&Peel consistently outperforms the ten competitors in its ability to identify known protein complexes and in the functional coherence of its predictions. Our method is remarkably robust, being quite insensitive to the injection of random interactions. Core&Peel is also empirically efficient, attaining the second best running time over large networks among the tested algorithms. Our algorithm Core&Peel pushes forward the state-of-the-art in PPIN clustering, providing an algorithmic solution with polynomial running time that attains experimentally demonstrated good output quality and speed on challenging large real networks.

  13. Genetic Algorithms Applied to Multi-Objective Aerodynamic Shape Optimization

    NASA Technical Reports Server (NTRS)

    Holst, Terry L.

    2004-01-01

    A genetic algorithm approach suitable for solving multi-objective optimization problems is described and evaluated using a series of aerodynamic shape optimization problems. Several new features including two variations of a binning selection algorithm and a gene-space transformation procedure are included. The genetic algorithm is suitable for finding pareto optimal solutions in search spaces that are defined by any number of genes and that contain any number of local extrema. A new masking array capability is included allowing any gene or gene subset to be eliminated as decision variables from the design space. This allows determination of the effect of a single gene or gene subset on the pareto optimal solution. Results indicate that the genetic algorithm optimization approach is flexible in application and reliable. The binning selection algorithms generally provide pareto front quality enhancements and moderate convergence efficiency improvements for most of the problems solved.

  14. Genetic Algorithms Applied to Multi-Objective Aerodynamic Shape Optimization

    NASA Technical Reports Server (NTRS)

    Holst, Terry L.

    2005-01-01

    A genetic algorithm approach suitable for solving multi-objective problems is described and evaluated using a series of aerodynamic shape optimization problems. Several new features including two variations of a binning selection algorithm and a gene-space transformation procedure are included. The genetic algorithm is suitable for finding Pareto optimal solutions in search spaces that are defined by any number of genes and that contain any number of local extrema. A new masking array capability is included allowing any gene or gene subset to be eliminated as decision variables from the design space. This allows determination of the effect of a single gene or gene subset on the Pareto optimal solution. Results indicate that the genetic algorithm optimization approach is flexible in application and reliable. The binning selection algorithms generally provide Pareto front quality enhancements and moderate convergence efficiency improvements for most of the problems solved.

  15. CMS event processing multi-core efficiency status

    NASA Astrophysics Data System (ADS)

    Jones, C. D.; CMS Collaboration

    2017-10-01

    In 2015, CMS was the first LHC experiment to begin using a multi-threaded framework for doing event processing. This new framework utilizes Intel's Threading Building Blocks (TBB) library to manage concurrency via a task-based processing model. During the 2015 LHC run period, CMS only ran reconstruction jobs using multiple threads, because only those jobs were sufficiently thread-efficient. Recent work now allows simulation and digitization to be thread-efficient. In addition, during 2015 the multi-threaded framework could run events in parallel but could only use one thread per event. Work done in 2016 now allows multiple threads to be used while processing one event. In this presentation we show how these recent changes have improved CMS's overall threading and memory efficiency, and we discuss work to be done to further increase those efficiencies.

  16. PGA/MOEAD: a preference-guided evolutionary algorithm for multi-objective decision-making problems with interval-valued fuzzy preferences

    NASA Astrophysics Data System (ADS)

    Luo, Bin; Lin, Lin; Zhong, ShiSheng

    2018-02-01

    In this research, we propose a preference-guided optimisation algorithm for multi-criteria decision-making (MCDM) problems with interval-valued fuzzy preferences. The interval-valued fuzzy preferences are first decomposed into a series of precise and evenly distributed preference-vectors (reference directions) over the objectives to be optimised, on the basis of a uniform design strategy. The preference information is then further incorporated into the preference-vectors based on the boundary intersection approach, and the MCDM problem with interval-valued fuzzy preferences is reformulated into a series of single-objective optimisation sub-problems (each sub-problem corresponding to one decomposed preference-vector). Finally, a preference-guided optimisation algorithm based on MOEA/D (multi-objective evolutionary algorithm based on decomposition) is proposed to solve the sub-problems in a single run. The proposed algorithm incorporates the preference-vectors within the optimisation process to guide the search towards a more promising subset of the efficient solutions matching the interval-valued fuzzy preferences. Numerous test instances and an engineering application are employed to validate the performance of the proposed algorithm, and the results demonstrate its effectiveness and feasibility.
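
    The decomposition step at the heart of MOEA/D can be illustrated with the widely used Tchebycheff scalarization, where each decomposed preference-vector defines one scalar sub-problem; the interval-valued fuzzy preference decomposition itself is not shown, and the vectors below are illustrative.

      # Sketch: MOEA/D-style decomposition via the Tchebycheff function; each
      # weight (preference) vector turns the multi-objective problem into one
      # scalar sub-problem.
      import numpy as np

      def tchebycheff(fx, weight, z_star):
          """fx: objective values; weight: preference vector; z_star: ideal point."""
          return np.max(weight * np.abs(fx - z_star))

      # A solution is better *for that sub-problem* if its scalarized value is lower:
      w = np.array([0.7, 0.3])               # one decomposed preference-vector
      z = np.array([0.0, 0.0])               # ideal point
      print(tchebycheff(np.array([0.2, 0.9]), w, z),
            tchebycheff(np.array([0.4, 0.4]), w, z))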

  17. A modified approach combining FNEA and watershed algorithms for segmenting remotely-sensed optical images

    NASA Astrophysics Data System (ADS)

    Liu, Likun

    2018-01-01

    In the field of remote sensing image processing, segmentation is a preliminary step for later analysis, semi-automatic human interpretation, and fully automatic machine recognition and learning. Since 2000, the object-oriented approach to remote sensing image processing has prevailed, and its core is the Fractal Net Evolution Approach (FNEA) multi-scale segmentation algorithm. This paper studies and improves that algorithm: existing segmentation algorithms are analyzed, and the watershed algorithm is selected as the optimal initialization. The FNEA algorithm is then modified by adjusting an area parameter and further combining it with a heterogeneity parameter. Several experiments show that the modified FNEA algorithm yields better segmentation results than both a traditional pixel-based method (an FCM algorithm based on neighborhood information) and the plain combination of FNEA and watershed.

  18. Regional-scale calculation of the LS factor using parallel processing

    NASA Astrophysics Data System (ADS)

    Liu, Kai; Tang, Guoan; Jiang, Ling; Zhu, A.-Xing; Yang, Jianyi; Song, Xiaodong

    2015-05-01

    With the increase of data resolution and the increasing application of USLE over large areas, the existing serial implementations of algorithms for computing the LS factor are becoming a bottleneck. In this paper, a parallel processing model based on the message passing interface (MPI) is presented for the calculation of the LS factor, so that massive datasets at a regional scale can be processed efficiently. The parallel model contains algorithms for calculating flow direction, flow accumulation, drainage network, slope, slope length and the LS factor. According to the existence of data dependence, the algorithms are divided into local algorithms and global algorithms. Parallel strategies are designed according to the characteristics of the algorithms, including a decomposition method for maintaining the integrity of the results, an optimized workflow that reduces the time spent exporting unnecessary intermediate data, and a buffer-communication-computation strategy for improving communication efficiency. Experiments on a multi-node system show that the proposed parallel model allows efficient calculation of the LS factor at a regional scale with a massive dataset.

  19. Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing

    PubMed Central

    Zhang, Fan; Li, Guojun; Li, Wei; Hu, Wei; Hu, Yuxin

    2016-01-01

    With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high performance computing (HPC) methods have been presented to accelerate SAR imaging, especially GPU based methods. In the classical GPU based imaging algorithm, the GPU is employed to accelerate image processing by massive parallel computing, and the CPU is only used to perform auxiliary work such as data input/output (IO). However, the computing capability of the CPU is thereby ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPUs/GPUs is proposed to achieve real-time SAR imaging. Through the proposed task partitioning and scheduling strategy, the whole image can be generated with deep collaborative multiple CPU/GPU computing. In the CPU parallel imaging part, the advanced vector extension (AVX) method is introduced into the multi-core CPU parallel method for higher efficiency. As for the GPU parallel imaging, the bottlenecks of memory limitation and frequent data transfers are removed, and various optimization strategies, such as streaming and parallel pipelining, are applied. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method improves the efficiency of SAR imaging by 270 times over a single-core CPU and achieves real-time imaging, with the imaging rate exceeding the raw data generation rate. PMID:27070606

  20. Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing.

    PubMed

    Zhang, Fan; Li, Guojun; Li, Wei; Hu, Wei; Hu, Yuxin

    2016-04-07

    With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high performance computing (HPC) methods have been presented to accelerate SAR imaging, especially GPU based methods. In the classical GPU based imaging algorithm, the GPU is employed to accelerate image processing by massive parallel computing, and the CPU is only used to perform auxiliary work such as data input/output (IO). However, the computing capability of the CPU is thereby ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPUs/GPUs is proposed to achieve real-time SAR imaging. Through the proposed task partitioning and scheduling strategy, the whole image can be generated with deep collaborative multiple CPU/GPU computing. In the CPU parallel imaging part, the advanced vector extension (AVX) method is introduced into the multi-core CPU parallel method for higher efficiency. As for the GPU parallel imaging, the bottlenecks of memory limitation and frequent data transfers are removed, and various optimization strategies, such as streaming and parallel pipelining, are applied. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method improves the efficiency of SAR imaging by 270 times over a single-core CPU and achieves real-time imaging, with the imaging rate exceeding the raw data generation rate.

  1. Crack Damage Detection Method via Multiple Visual Features and Efficient Multi-Task Learning Model.

    PubMed

    Wang, Baoxian; Zhao, Weigang; Gao, Po; Zhang, Yufeng; Wang, Zhe

    2018-06-02

    This paper proposes an effective and efficient model for concrete crack detection. The presented work consists of two modules: multi-view image feature extraction and multi-task crack region detection. Specifically, multiple visual features (such as texture, edge, etc.) of image regions are calculated, which can suppress various background noises (such as illumination, pockmark, stripe, blurring, etc.). With the computed multiple visual features, a novel crack region detector is advocated using a multi-task learning framework, which involves restraining the variability for different crack region features and emphasizing the separability between crack region features and complex background ones. Furthermore, the extreme learning machine is utilized to construct this multi-task learning model, thereby leading to high computing efficiency and good generalization. Experimental results of the practical concrete images demonstrate that the developed algorithm can achieve favorable crack detection performance compared with traditional crack detectors.

  2. Efficient Implementation of MrBayes on Multi-GPU

    PubMed Central

    Zhou, Jianfu; Liu, Xiaoguang; Wang, Gang

    2013-01-01

    MrBayes, using Metropolis-coupled Markov chain Monte Carlo (MCMCMC or (MC)3), is a popular program for Bayesian inference. As a leading method of using DNA data to infer phylogeny, the (MC)3 Bayesian algorithm and its improved and parallel versions are now not fast enough for biologists to analyze massive real-world DNA data. Recently, the graphics processing unit (GPU) has shown its power as a coprocessor (or rather, an accelerator) in many fields. This article describes an efficient implementation, a(MC)3 (aMCMCMC), of MrBayes (MC)3 on the compute unified device architecture. By dynamically adjusting the task granularity to adapt to input data size and hardware configuration, it makes full use of GPU cores with different data sets. An adaptive method is also developed to split and combine DNA sequences to make full use of a large number of GPU cards. Furthermore, a new "node-by-node" task scheduling strategy is developed to improve concurrency, and several optimizing methods are used to reduce extra overhead. Experimental results show that a(MC)3 achieves up to 63× speedup over serial MrBayes on a single machine with one GPU card, up to 170× speedup with four GPU cards, and up to 478× speedup with a 32-node GPU cluster. a(MC)3 is dramatically faster than all previous (MC)3 algorithms and scales well to large GPU clusters. PMID:23493260

  3. Efficient implementation of MrBayes on multi-GPU.

    PubMed

    Bao, Jie; Xia, Hongju; Zhou, Jianfu; Liu, Xiaoguang; Wang, Gang

    2013-06-01

    MrBayes, using Metropolis-coupled Markov chain Monte Carlo (MCMCMC or (MC)3), is a popular program for Bayesian inference. As a leading method of using DNA data to infer phylogeny, the (MC)3 Bayesian algorithm and its improved and parallel versions are now not fast enough for biologists to analyze massive real-world DNA data. Recently, the graphics processing unit (GPU) has shown its power as a coprocessor (or rather, an accelerator) in many fields. This article describes an efficient implementation, a(MC)3 (aMCMCMC), of MrBayes (MC)3 on the compute unified device architecture. By dynamically adjusting the task granularity to adapt to input data size and hardware configuration, it makes full use of GPU cores with different data sets. An adaptive method is also developed to split and combine DNA sequences to make full use of a large number of GPU cards. Furthermore, a new "node-by-node" task scheduling strategy is developed to improve concurrency, and several optimizing methods are used to reduce extra overhead. Experimental results show that a(MC)3 achieves up to 63× speedup over serial MrBayes on a single machine with one GPU card, up to 170× speedup with four GPU cards, and up to 478× speedup with a 32-node GPU cluster. a(MC)3 is dramatically faster than all previous (MC)3 algorithms and scales well to large GPU clusters.

  4. Convergence and Applications of a Gossip-Based Gauss-Newton Algorithm

    NASA Astrophysics Data System (ADS)

    Li, Xiao; Scaglione, Anna

    2013-11-01

    The Gauss-Newton algorithm is a popular and efficient centralized method for solving non-linear least squares problems. In this paper, we propose a multi-agent distributed version of this algorithm, named Gossip-based Gauss-Newton (GGN) algorithm, which can be applied in general problems with non-convex objectives. Furthermore, we analyze and present sufficient conditions for its convergence and show numerically that the GGN algorithm achieves performance comparable to the centralized algorithm, with graceful degradation in case of network failures. More importantly, the GGN algorithm provides significant performance gains compared to other distributed first order methods.
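
    For reference, the centralized Gauss-Newton iteration that GGN distributes via gossip can be sketched as follows; in GGN the residuals and Jacobian blocks would be held by different agents, which this single-machine sketch does not show, and the toy model below is illustrative.

      # Sketch: centralized Gauss-Newton for nonlinear least squares.
      import numpy as np

      def gauss_newton(residual, jacobian, x0, iters=20, tol=1e-10):
          x = np.asarray(x0, dtype=float)
          for _ in range(iters):
              r, J = residual(x), jacobian(x)
              step = np.linalg.solve(J.T @ J, J.T @ r)   # normal equations
              x = x - step
              if np.linalg.norm(step) < tol:
                  break
          return x

      # Toy example: fit y = exp(a*t) by least squares; recovers a = 0.8.
      t = np.linspace(0, 1, 20)
      y = np.exp(0.8 * t)
      res = lambda a: np.exp(a[0] * t) - y
      jac = lambda a: (t * np.exp(a[0] * t)).reshape(-1, 1)
      print(gauss_newton(res, jac, [0.1]))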

  5. Multi-thread parallel algorithm for reconstructing 3D large-scale porous structures

    NASA Astrophysics Data System (ADS)

    Ju, Yang; Huang, Yaohui; Zheng, Jiangtao; Qian, Xu; Xie, Heping; Zhao, Xi

    2017-04-01

    Geomaterials inherently contain many discontinuous, multi-scale, geometrically irregular pores, forming a complex porous structure that governs their mechanical and transport properties. The development of an efficient reconstruction method for representing porous structures can significantly contribute toward providing a better understanding of the governing effects of porous structures on the properties of porous materials. In order to improve the efficiency of reconstructing large-scale porous structures, a multi-thread parallel scheme was incorporated into the simulated annealing reconstruction method. In the method, four correlation functions, which include the two-point probability function, the linear-path functions for the pore phase and the solid phase, and the fractal system function for the solid phase, were employed for better reproduction of the complex well-connected porous structures. In addition, a random sphere packing method and a self-developed pre-conditioning method were incorporated to cast the initial reconstructed model and select independent interchanging pairs for parallel multi-thread calculation, respectively. The accuracy of the proposed algorithm was evaluated by examining the similarity between the reconstructed structure and a prototype in terms of their geometrical, topological, and mechanical properties. Comparisons of the reconstruction efficiency of porous models with various scales indicated that the parallel multi-thread scheme significantly shortened the execution time for reconstruction of a large-scale well-connected porous model compared to a sequential single-thread procedure.

  6. Reconstruction of calmodulin single-molecule FRET states, dye interactions, and CaMKII peptide binding by MultiNest and classic maximum entropy

    NASA Astrophysics Data System (ADS)

    DeVore, Matthew S.; Gull, Stephen F.; Johnson, Carey K.

    2013-08-01

    We analyzed single molecule FRET burst measurements using Bayesian nested sampling. The MultiNest algorithm produces accurate FRET efficiency distributions from single-molecule data. FRET efficiency distributions recovered by MultiNest and classic maximum entropy are compared for simulated data and for calmodulin labeled at residues 44 and 117. MultiNest compares favorably with maximum entropy analysis for simulated data, judged by the Bayesian evidence. FRET efficiency distributions recovered for calmodulin labeled with two different FRET dye pairs depended on the dye pair and changed upon Ca2+ binding. We also looked at the FRET efficiency distributions of calmodulin bound to the calcium/calmodulin dependent protein kinase II (CaMKII) binding domain. For both dye pairs, the FRET efficiency distribution collapsed to a single peak in the case of calmodulin bound to the CaMKII peptide. These measurements strongly suggest that consideration of dye-protein interactions is crucial in forming an accurate picture of protein conformations from FRET data.

  7. Reconstruction of Calmodulin Single-Molecule FRET States, Dye-Interactions, and CaMKII Peptide Binding by MultiNest and Classic Maximum Entropy

    PubMed Central

    DeVore, Matthew S.; Gull, Stephen F.; Johnson, Carey K.

    2013-01-01

    We analyze single molecule FRET burst measurements using Bayesian nested sampling. The MultiNest algorithm produces accurate FRET efficiency distributions from single-molecule data. FRET efficiency distributions recovered by MultiNest and classic maximum entropy are compared for simulated data and for calmodulin labeled at residues 44 and 117. MultiNest compares favorably with maximum entropy analysis for simulated data, judged by the Bayesian evidence. FRET efficiency distributions recovered for calmodulin labeled with two different FRET dye pairs depended on the dye pair and changed upon Ca2+ binding. We also looked at the FRET efficiency distributions of calmodulin bound to the calcium/calmodulin dependent protein kinase II (CaMKII) binding domain. For both dye pairs, the FRET efficiency distribution collapsed to a single peak in the case of calmodulin bound to the CaMKII peptide. These measurements strongly suggest that consideration of dye-protein interactions is crucial in forming an accurate picture of protein conformations from FRET data. PMID:24223465

  8. Reconstruction of Calmodulin Single-Molecule FRET States, Dye-Interactions, and CaMKII Peptide Binding by MultiNest and Classic Maximum Entropy.

    PubMed

    Devore, Matthew S; Gull, Stephen F; Johnson, Carey K

    2013-08-30

    We analyze single molecule FRET burst measurements using Bayesian nested sampling. The MultiNest algorithm produces accurate FRET efficiency distributions from single-molecule data. FRET efficiency distributions recovered by MultiNest and classic maximum entropy are compared for simulated data and for calmodulin labeled at residues 44 and 117. MultiNest compares favorably with maximum entropy analysis for simulated data, judged by the Bayesian evidence. FRET efficiency distributions recovered for calmodulin labeled with two different FRET dye pairs depended on the dye pair and changed upon Ca2+ binding. We also looked at the FRET efficiency distributions of calmodulin bound to the calcium/calmodulin dependent protein kinase II (CaMKII) binding domain. For both dye pairs, the FRET efficiency distribution collapsed to a single peak in the case of calmodulin bound to the CaMKII peptide. These measurements strongly suggest that consideration of dye-protein interactions is crucial in forming an accurate picture of protein conformations from FRET data.
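
    For orientation, the quantity these three records analyze is the per-burst FRET efficiency, commonly computed from donor/acceptor photon counts as E = nA / (nA + γ·nD). The Python sketch below builds such an efficiency histogram from simulated two-state bursts; the simulated data and parameter values are assumptions for demonstration, and the nested-sampling (MultiNest) and maximum-entropy analyses themselves are not reproduced.

    ```python
    import numpy as np

    def fret_efficiencies(n_donor, n_acceptor, gamma=1.0):
        """Per-burst FRET efficiency E = nA / (nA + gamma * nD)."""
        n_donor = np.asarray(n_donor, dtype=float)
        n_acceptor = np.asarray(n_acceptor, dtype=float)
        return n_acceptor / (n_acceptor + gamma * n_donor)

    # Simulate bursts from two conformational states (E ~ 0.3 and E ~ 0.8)
    # with Poisson-distributed burst sizes and binomial photon partitioning.
    rng = np.random.default_rng(1)
    true_E = rng.choice([0.3, 0.8], size=2000)
    n_photons = 20 + rng.poisson(60, size=2000)      # at least 20 photons per burst
    nA = rng.binomial(n_photons, true_E)
    nD = n_photons - nA
    E = fret_efficiencies(nD, nA)
    hist, edges = np.histogram(E, bins=40, range=(0.0, 1.0))
    # Two peaks near 0.3 and 0.8 emerge; shot noise broadens them, which is why
    # recovering the underlying distribution is an inference problem.
    ```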

  9. Multi-dimensional, fully implicit, exactly conserving electromagnetic particle-in-cell simulations in curvilinear geometry

    NASA Astrophysics Data System (ADS)

    Chen, Guangye; Chacon, Luis

    2015-11-01

    We discuss a new, conservative, fully implicit 2D3V Vlasov-Darwin particle-in-cell algorithm in curvilinear geometry for non-radiative, electromagnetic kinetic plasma simulations. Unlike standard explicit PIC schemes, fully implicit PIC algorithms are unconditionally stable and allow exact discrete energy and charge conservation. Here, we extend these algorithms to curvilinear geometry. The algorithm retains its exact conservation properties on curvilinear grids. The nonlinear iteration is effectively accelerated with a fluid preconditioner for weakly to modestly magnetized plasmas, which allows efficient use of large timesteps, O(√(m_i/m_e) c/v_Te) larger than the explicit CFL limit. In this presentation, we introduce the main algorithmic components of the approach and demonstrate the accuracy and efficiency properties of the algorithm with various numerical experiments in 1D (slow shock) and 2D (island coalescence).

  10. Parallel, stochastic measurement of molecular surface area.

    PubMed

    Juba, Derek; Varshney, Amitabh

    2008-08-01

    Biochemists often wish to compute surface areas of proteins. A variety of algorithms have been developed for this task, but they are designed for traditional single-processor architectures. The current trend in computer hardware is towards increasingly parallel architectures for which these algorithms are not well suited. We describe a parallel, stochastic algorithm for molecular surface area computation that maps well to the emerging multi-core architectures. Our algorithm is also progressive, providing a rough estimate of surface area immediately and refining this estimate as time goes on. Furthermore, the algorithm generates points on the molecular surface which can be used for point-based rendering. We demonstrate a GPU implementation of our algorithm and show that it compares favorably with several existing molecular surface computation programs, giving fast estimates of the molecular surface area with good accuracy.
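
    A minimal Python sketch of this kind of stochastic estimate is given below: sample directions uniformly on each probe-inflated atom sphere and count the fraction of sample points not buried inside any other sphere. The per-atom loop is embarrassingly parallel, which is what makes the approach map well to many-core hardware, and the estimate is progressive (more samples refine it). The toy two-atom input and parameter values are assumptions for illustration.

    ```python
    import numpy as np

    def sas_area(centers, radii, probe=1.4, samples=2000, rng=None):
        """Stochastic solvent-accessible surface area estimate."""
        rng = rng or np.random.default_rng(0)
        centers = np.asarray(centers, dtype=float)
        R = np.asarray(radii, dtype=float) + probe       # probe-inflated radii
        total = 0.0
        for i, (c, r) in enumerate(zip(centers, R)):
            v = rng.standard_normal((samples, 3))        # uniform directions on the sphere
            v /= np.linalg.norm(v, axis=1, keepdims=True)
            pts = c + r * v
            others = np.delete(np.arange(len(R)), i)
            # A point is exposed if it lies outside every other inflated sphere.
            d2 = ((pts[:, None, :] - centers[others]) ** 2).sum(axis=-1)
            exposed = (d2 >= R[others] ** 2).all(axis=1)
            total += 4.0 * np.pi * r * r * exposed.mean()
        return total

    # Two overlapping "atoms" with van der Waals radius 1.7 A.
    print(sas_area([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]], [1.7, 1.7]))
    ```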

  11. Efficient search, mapping, and optimization of multi-protein genetic systems in diverse bacteria

    PubMed Central

    Farasat, Iman; Kushwaha, Manish; Collens, Jason; Easterbrook, Michael; Guido, Matthew; Salis, Howard M

    2014-01-01

    Developing predictive models of multi-protein genetic systems to understand and optimize their behavior remains a combinatorial challenge, particularly when measurement throughput is limited. We developed a computational approach to build predictive models and identify optimal sequences and expression levels, while circumventing combinatorial explosion. Maximally informative genetic system variants were first designed by the RBS Library Calculator, an algorithm to design sequences for efficiently searching a multi-protein expression space across a > 10,000-fold range with tailored search parameters and well-predicted translation rates. We validated the algorithm's predictions by characterizing 646 genetic system variants, encoded in plasmids and genomes, expressed in six gram-positive and gram-negative bacterial hosts. We then combined the search algorithm with system-level kinetic modeling, requiring the construction and characterization of 73 variants to build a sequence-expression-activity map (SEAMAP) for a biosynthesis pathway. Using model predictions, we designed and characterized 47 additional pathway variants to navigate its activity space, find optimal expression regions with desired activity response curves, and relieve rate-limiting steps in metabolism. Creating sequence-expression-activity maps accelerates the optimization of many protein systems and allows previous measurements to quantitatively inform future designs. PMID:24952589

  12. Application of Multi-Objective Human Learning Optimization Method to Solve AC/DC Multi-Objective Optimal Power Flow Problem

    NASA Astrophysics Data System (ADS)

    Cao, Jia; Yan, Zheng; He, Guangyu

    2016-06-01

    This paper introduces an efficient algorithm, the multi-objective human learning optimization method (MOHLO), to solve the AC/DC multi-objective optimal power flow problem (MOPF). First, the model of AC/DC MOPF including wind farms is constructed, with three objective functions: operating cost, power loss, and pollutant emission. Combining the non-dominated sorting technique and the crowding distance index, the MOHLO method is derived, which involves an individual learning operator, a social learning operator, a random exploration learning operator, and adaptive strategies. Both the proposed MOHLO method and the non-dominated sorting genetic algorithm II (NSGA-II) are tested on an improved IEEE 30-bus AC/DC hybrid system. Simulation results show that the MOHLO method has excellent search efficiency and a powerful ability to find optimal solutions. Above all, the MOHLO method obtains a more complete Pareto front than the NSGA-II method. Choosing the optimal solution from the Pareto front, however, remains up to the decision makers, depending on whether they prioritize economics or energy saving and emission reduction.
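
    As an illustration of one building block named in this abstract, here is a short Python sketch of the NSGA-II-style crowding distance index used to keep solutions spread along a Pareto front; it is the generic textbook computation, not the authors' MOHLO code.

    ```python
    import numpy as np

    def crowding_distance(F):
        """Crowding distance for one non-dominated front F, shape (n_points, n_objectives).
        Boundary points get infinite distance so they are always preferred."""
        F = np.asarray(F, dtype=float)
        n, m = F.shape
        d = np.zeros(n)
        for k in range(m):
            order = np.argsort(F[:, k])
            d[order[0]] = d[order[-1]] = np.inf
            span = F[order[-1], k] - F[order[0], k]
            if span > 0:
                # Interior points: normalized gap between their two neighbors.
                d[order[1:-1]] += (F[order[2:], k] - F[order[:-2], k]) / span
        return d

    # Four solutions on a two-objective front (e.g., cost vs. emission).
    print(crowding_distance([[1, 5], [2, 3], [3, 2], [5, 1]]))
    ```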

  13. Multi-Constraint Multi-Variable Optimization of Source-Driven Nuclear Systems

    NASA Astrophysics Data System (ADS)

    Watkins, Edward Francis

    1995-01-01

    A novel approach to the search for optimal designs of source-driven nuclear systems is investigated. Such systems include radiation shields, fusion reactor blankets and various neutron spectrum-shaping assemblies. The novel approach involves replacing the steepest-descents optimization algorithm incorporated in the code SWAN with a significantly more general and efficient sequential quadratic programming optimization algorithm provided by the code NPSOL. The resulting SWAN/NPSOL code system can be applied to more general, multi-variable, multi-constraint shield optimization problems. The constraints it accounts for may include simple bounds on variables, linear constraints, and smooth nonlinear constraints. It may also be applied to unconstrained, bound-constrained and linearly constrained optimization. The shield optimization capabilities of the SWAN/NPSOL code system are tested and verified on a variety of optimization problems: dose minimization at constant cost, cost minimization at constant dose, and multiple-nonlinear-constraint optimization. Replacing the optimization part of SWAN with NPSOL is found to be feasible and substantially expands the complexity of optimization problems that can be handled efficiently.

  14. Design of Compressed Sensing Algorithm for Coal Mine IoT Moving Measurement Data Based on a Multi-Hop Network and Total Variation.

    PubMed

    Wang, Gang; Zhao, Zhikai; Ning, Yongjie

    2018-05-28

    With the application of the coal mine Internet of Things (IoT), mobile measurement devices such as intelligent mine lamps have greatly increased the volume of moving measurement data. How to transmit these large amounts of mobile measurement data effectively has become an urgent problem. This paper presents a compressed sensing algorithm for the large amount of coal mine IoT moving measurement data, based on a multi-hop network and total variation. Taking the gas data in mobile measurement data as an example, two network models for the transmission of gas data flow, namely single-hop and multi-hop transmission modes, are investigated in depth, and a gas data compressed sensing collection model is built on a multi-hop network. To utilize the sparse characteristics of gas data, the concept of total variation is introduced and a high-efficiency gas data compression and reconstruction method, Total Variation Sparsity based on Multi-Hop (TVS-MH), is proposed. According to the simulation results, the proposed method can acquire and transmit the moving measurement data flow from an underground distributed mobile network efficiently.
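
    The total-variation idea can be made concrete with a small Python sketch: slowly varying gas readings have small total variation (their first differences are sparse), so an approximation can be recovered from a reduced number of random linear measurements by penalizing TV. The toy subgradient solver below is only an illustrative stand-in for the paper's TVS-MH method; all names, sizes, and step parameters are assumptions.

    ```python
    import numpy as np

    def total_variation(x):
        """Anisotropic 1-D total variation: sum of absolute first differences."""
        return np.abs(np.diff(x)).sum()

    def tv_reconstruct(y, Phi, lam=0.1, steps=3000, lr=1e-3):
        """Approximately recover x from y = Phi @ x by subgradient descent on
        ||Phi x - y||^2 + lam * TV(x)."""
        x = Phi.T @ y                                # crude initial guess
        for _ in range(steps):
            grad = 2.0 * Phi.T @ (Phi @ x - y)       # data-fit gradient
            s = np.sign(np.diff(x))                  # TV subgradient pieces
            tv_sub = np.concatenate(([0.0], s)) - np.concatenate((s, [0.0]))
            x -= lr * (grad + lam * tv_sub)
        return x

    rng = np.random.default_rng(3)
    x_true = np.repeat([0.0, 1.0, 0.3, 0.8], 25)         # piecewise-constant "gas" signal, n=100
    Phi = rng.standard_normal((40, 100)) / np.sqrt(40)   # 40 compressed measurements
    x_hat = tv_reconstruct(Phi @ x_true, Phi)
    print(total_variation(x_true), round(total_variation(x_hat), 2))
    ```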

  15. Fuzzy multi-objective chance-constrained programming model for hazardous materials transportation

    NASA Astrophysics Data System (ADS)

    Du, Jiaoman; Yu, Lean; Li, Xiang

    2016-04-01

    Hazardous materials transportation is an important public safety issue. Based on the shortest path model, this paper presents a fuzzy multi-objective programming model that minimizes the transportation risk to life, travel time and fuel consumption. First, we present the risk model, travel time model and fuel consumption model. Furthermore, we formulate a chance-constrained programming model within the framework of credibility theory, in which the lengths of arcs in the transportation network are assumed to be fuzzy variables. A hybrid intelligent algorithm integrating fuzzy simulation and a genetic algorithm is designed to find a satisfactory solution. Finally, some numerical examples are given to demonstrate the efficiency of the proposed model and algorithm.

  16. Fast l₁-SPIRiT compressed sensing parallel imaging MRI: scalable parallel implementation and clinically feasible runtime.

    PubMed

    Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

    2012-06-01

    We present l₁-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative self-consistent parallel imaging (SPIRiT). Like many iterative magnetic resonance imaging reconstructions, l₁-SPIRiT's image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing l₁-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of l₁-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT spoiled gradient echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions.

  17. Fast ℓ1-SPIRiT Compressed Sensing Parallel Imaging MRI: Scalable Parallel Implementation and Clinically Feasible Runtime

    PubMed Central

    Murphy, Mark; Alley, Marcus; Demmel, James; Keutzer, Kurt; Vasanawala, Shreyas; Lustig, Michael

    2012-01-01

    We present ℓ1-SPIRiT, a simple algorithm for auto calibrating parallel imaging (acPI) and compressed sensing (CS) that permits an efficient implementation with clinically-feasible runtimes. We propose a CS objective function that minimizes cross-channel joint sparsity in the Wavelet domain. Our reconstruction minimizes this objective via iterative soft-thresholding, and integrates naturally with iterative Self-Consistent Parallel Imaging (SPIRiT). Like many iterative MRI reconstructions, ℓ1-SPIRiT’s image quality comes at a high computational cost. Excessively long runtimes are a barrier to the clinical use of any reconstruction approach, and thus we discuss our approach to efficiently parallelizing ℓ1-SPIRiT and to achieving clinically-feasible runtimes. We present parallelizations of ℓ1-SPIRiT for both multi-GPU systems and multi-core CPUs, and discuss the software optimization and parallelization decisions made in our implementation. The performance of these alternatives depends on the processor architecture, the size of the image matrix, and the number of parallel imaging channels. Fundamentally, achieving fast runtime requires the correct trade-off between cache usage and parallelization overheads. We demonstrate image quality via a case from our clinical experimentation, using a custom 3DFT Spoiled Gradient Echo (SPGR) sequence with up to 8× acceleration via Poisson-disc undersampling in the two phase-encoded directions. PMID:22345529
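
    The core numerical kernel named in both records, iterative soft-thresholding, is compact to state. Below is a hedged Python sketch of generic ISTA for an ℓ1-regularized least-squares problem; it omits the wavelet transform, the multi-channel SPIRiT consistency term, and all parallelization, so it is a conceptual reference rather than the authors' reconstruction.

    ```python
    import numpy as np

    def soft_threshold(z, t):
        """Proximal operator of t * ||.||_1 (element-wise shrinkage)."""
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def ista(A, y, lam, steps=500):
        """Iterative soft-thresholding for min_x 0.5*||Ax - y||^2 + lam*||x||_1."""
        L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
        x = np.zeros(A.shape[1])
        for _ in range(steps):
            x = soft_threshold(x - A.T @ (A @ x - y) / L, lam / L)
        return x

    # Recover a 3-sparse vector from 50 random measurements of a length-200 signal.
    rng = np.random.default_rng(4)
    A = rng.standard_normal((50, 200)) / np.sqrt(50)
    x_true = np.zeros(200)
    x_true[[5, 70, 140]] = [1.0, -2.0, 1.5]
    x_hat = ista(A, A @ x_true, lam=0.01)
    print(np.flatnonzero(np.abs(x_hat) > 0.1))   # ~ [5, 70, 140]
    ```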

  18. Large Scale Document Inversion using a Multi-threaded Computing System

    PubMed Central

    Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won

    2018-01-01

    Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general-purpose computing. We can utilize the GPU in computation as a massively parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays, huge volumes of data, such as digital libraries, social networking services, e-commerce product data, and reviews, are produced or collected every moment, with dramatic growth in size. Although the inverted index is a useful data structure for full-text search and document retrieval, a large number of documents requires a tremendous amount of time to index. The performance of document inversion can be improved by a multi-threaded, multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD) document inversion algorithm on the NVIDIA GPU/CUDA programming platform, utilizing the huge computational power of the GPU to develop high-performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstracts and e-commerce product reviews. CCS Concepts: Information systems ➝ Information retrieval; Computing methodologies ➝ Massively parallel and high-performance simulations. PMID:29861701

  19. Large Scale Document Inversion using a Multi-threaded Computing System.

    PubMed

    Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won

    2017-06-01

    Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general-purpose computing. We can utilize the GPU in computation as a massively parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays, huge volumes of data, such as digital libraries, social networking services, e-commerce product data, and reviews, are produced or collected every moment, with dramatic growth in size. Although the inverted index is a useful data structure for full-text search and document retrieval, a large number of documents requires a tremendous amount of time to index. The performance of document inversion can be improved by a multi-threaded, multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD) document inversion algorithm on the NVIDIA GPU/CUDA programming platform, utilizing the huge computational power of the GPU to develop high-performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstracts and e-commerce product reviews. CCS Concepts: Information systems ➝ Information retrieval; Computing methodologies ➝ Massively parallel and high-performance simulations.
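
    The serial heart of the approach, a hash-based inverted index, is easy to sketch in Python; the GPU version partitions documents across thousands of threads and merges per-thread postings, which is not shown here. The document strings and queries are illustrative assumptions.

    ```python
    from collections import defaultdict

    def build_inverted_index(docs):
        """Map each term to the sorted list of ids of documents containing it.
        Linear in the total number of tokens, via hashing."""
        index = defaultdict(set)
        for doc_id, text in enumerate(docs):
            for term in text.lower().split():
                index[term].add(doc_id)
        return {term: sorted(ids) for term, ids in index.items()}

    docs = ["parallel document inversion",
            "GPU document indexing",
            "parallel GPU computing"]
    idx = build_inverted_index(docs)
    print(idx["parallel"])                                 # [0, 2]
    print(sorted(set(idx["parallel"]) & set(idx["gpu"])))  # conjunctive query -> [2]
    ```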

  20. On-Line Temperature Estimation for Noisy Thermal Sensors Using a Smoothing Filter-Based Kalman Predictor

    PubMed Central

    Li, Zhi; Wei, Henglu; Zhou, Wei; Duan, Zhemin

    2018-01-01

    Dynamic thermal management (DTM) mechanisms utilize embedded thermal sensors to collect fine-grained temperature information for monitoring the real-time thermal behavior of multi-core processors. However, embedded thermal sensors are very susceptible to a variety of sources of noise, including environmental uncertainty and process variation. This causes discrepancies between actual temperatures and those observed by on-chip thermal sensors, which seriously affect the efficiency of DTM. In this paper, a smoothing filter-based Kalman prediction technique is proposed to accurately estimate temperatures from noisy sensor readings. For the multi-sensor estimation scenario, the spatial correlations among different sensor locations are exploited. On this basis, a multi-sensor synergistic calibration algorithm (MSSCA) is proposed to improve the simultaneous prediction accuracy of multiple sensors. Moreover, an infrared imaging-based temperature measurement technique is also proposed to capture the thermal traces of an Advanced Micro Devices (AMD) quad-core processor in real time. The acquired real temperature data are used to evaluate our prediction performance. Simulation shows that the proposed synergistic calibration scheme can reduce the root-mean-square error (RMSE) by 1.2 °C and increase the signal-to-noise ratio (SNR) by 15.8 dB (with a very small average runtime overhead) compared with assuming the thermal sensor readings to be ideal. Additionally, the average false alarm rate (FAR) of the corrected sensor temperature readings can be reduced by 28.6%. These results clearly demonstrate that if our approach is used to perform temperature estimation, the response mechanisms of DTM can be triggered to adjust the voltages, frequencies, and cooling fan speeds at more appropriate times. PMID:29393862

  1. On-Line Temperature Estimation for Noisy Thermal Sensors Using a Smoothing Filter-Based Kalman Predictor.

    PubMed

    Li, Xin; Ou, Xingtao; Li, Zhi; Wei, Henglu; Zhou, Wei; Duan, Zhemin

    2018-02-02

    Dynamic thermal management (DTM) mechanisms utilize embedded thermal sensors to collect fine-grained temperature information for monitoring the real-time thermal behavior of multi-core processors. However, embedded thermal sensors are very susceptible to a variety of sources of noise, including environmental uncertainty and process variation. This causes discrepancies between actual temperatures and those observed by on-chip thermal sensors, which seriously affect the efficiency of DTM. In this paper, a smoothing filter-based Kalman prediction technique is proposed to accurately estimate temperatures from noisy sensor readings. For the multi-sensor estimation scenario, the spatial correlations among different sensor locations are exploited. On this basis, a multi-sensor synergistic calibration algorithm (MSSCA) is proposed to improve the simultaneous prediction accuracy of multiple sensors. Moreover, an infrared imaging-based temperature measurement technique is also proposed to capture the thermal traces of an Advanced Micro Devices (AMD) quad-core processor in real time. The acquired real temperature data are used to evaluate our prediction performance. Simulation shows that the proposed synergistic calibration scheme can reduce the root-mean-square error (RMSE) by 1.2 °C and increase the signal-to-noise ratio (SNR) by 15.8 dB (with a very small average runtime overhead) compared with assuming the thermal sensor readings to be ideal. Additionally, the average false alarm rate (FAR) of the corrected sensor temperature readings can be reduced by 28.6%. These results clearly demonstrate that if our approach is used to perform temperature estimation, the response mechanisms of DTM can be triggered to adjust the voltages, frequencies, and cooling fan speeds at more appropriate times.
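
    To ground the estimation step these two records share, here is a minimal scalar Kalman filter in Python for a random-walk temperature model observed by a noisy on-chip sensor. The variances and the simulated die-temperature trace are assumptions for demonstration; the paper's smoothing-filter front end and the multi-sensor MSSCA calibration are not reproduced.

    ```python
    import numpy as np

    def kalman_1d(z, q=0.05, r=4.0, x0=40.0, p0=10.0):
        """Scalar Kalman filter for x_k = x_{k-1} + w_k, z_k = x_k + v_k,
        with process variance q and measurement variance r."""
        x, p = x0, p0
        estimates = []
        for zk in z:
            p = p + q                  # predict: uncertainty grows by q
            k = p / (p + r)            # Kalman gain
            x = x + k * (zk - x)       # update with the sensor reading
            p = (1.0 - k) * p
            estimates.append(x)
        return np.array(estimates)

    rng = np.random.default_rng(2)
    true_temp = 45.0 + 5.0 * np.sin(np.linspace(0.0, 3.0, 200))  # slowly varying die temperature
    readings = true_temp + rng.normal(0.0, 2.0, 200)             # noisy thermal sensor
    est = kalman_1d(readings)
    print(round(np.sqrt(np.mean((est - true_temp) ** 2)), 2),       # filtered RMSE ...
          round(np.sqrt(np.mean((readings - true_temp) ** 2)), 2))  # ... vs raw sensor RMSE
    ```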

  2. Memetic Algorithm-Based Multi-Objective Coverage Optimization for Wireless Sensor Networks

    PubMed Central

    Chen, Zhi; Li, Shuai; Yue, Wenjing

    2014-01-01

    Maintaining effective coverage and extending the network lifetime as much as possible have become two of the most critical issues in the coverage of WSNs. In this paper, we propose a multi-objective coverage optimization algorithm for WSNs, namely MOCADMA, which models the coverage control of WSNs as a multi-objective optimization problem. MOCADMA uses a memetic algorithm with a dynamic local search strategy to optimize the coverage of WSNs and achieve objectives such as high network coverage, effective node utilization and more residual energy. In MOCADMA, the alternative solutions are represented as chromosomes in matrix form, and the optimal solutions are selected through numerous iterations of the evolution process, including selection, crossover, mutation, local enhancement, and fitness evaluation. The experimental and evaluation results show that MOCADMA can maintain the sensing coverage well, achieve higher network coverage while improving energy efficiency and effectively prolonging the network lifetime, and significantly improve over some existing algorithms. PMID:25360579

  3. Memetic algorithm-based multi-objective coverage optimization for wireless sensor networks.

    PubMed

    Chen, Zhi; Li, Shuai; Yue, Wenjing

    2014-10-30

    Maintaining effective coverage and extending the network lifetime as much as possible have become two of the most critical issues in the coverage of WSNs. In this paper, we propose a multi-objective coverage optimization algorithm for WSNs, namely MOCADMA, which models the coverage control of WSNs as a multi-objective optimization problem. MOCADMA uses a memetic algorithm with a dynamic local search strategy to optimize the coverage of WSNs and achieve objectives such as high network coverage, effective node utilization and more residual energy. In MOCADMA, the alternative solutions are represented as chromosomes in matrix form, and the optimal solutions are selected through numerous iterations of the evolution process, including selection, crossover, mutation, local enhancement, and fitness evaluation. The experimental and evaluation results show that MOCADMA can maintain the sensing coverage well, achieve higher network coverage while improving energy efficiency and effectively prolonging the network lifetime, and significantly improve over some existing algorithms.

  4. Selection of core animals in the Algorithm for Proven and Young using a simulation model.

    PubMed

    Bradford, H L; Pocrnić, I; Fragomeni, B O; Lourenco, D A L; Misztal, I

    2017-12-01

    The Algorithm for Proven and Young (APY) enables the implementation of single-step genomic BLUP (ssGBLUP) in large, genotyped populations by separating genotyped animals into core and non-core subsets and creating a computationally efficient inverse for the genomic relationship matrix (G). As APY became the choice for large-scale genomic evaluations in BLUP-based methods, a common question is how to choose the animals in the core subset. We compared several core definitions to answer this question. Simulations comprised a moderately heritable trait for 95,010 animals and 50,000 genotypes for animals across five generations. Genotypes consisted of 25,500 SNP distributed across 15 chromosomes. Genotyping errors and missing pedigree were also mimicked. Core animals were defined based on individual generations, equal representation across generations, and at random. For a sufficiently large core size, core definitions had the same accuracies and biases, even if the core animals had imperfect genotypes. When genotyped animals had unknown parents, accuracy and bias were significantly better (p ≤ .05) for random and across generation core definitions. © 2017 The Authors. Journal of Animal Breeding and Genetics Published by Blackwell Verlag GmbH.

  5. Importance of multi-modal approaches to effectively identify cataract cases from electronic health records

    PubMed Central

    Rasmussen, Luke V; Berg, Richard L; Linneman, James G; McCarty, Catherine A; Waudby, Carol; Chen, Lin; Denny, Joshua C; Wilke, Russell A; Pathak, Jyotishman; Carrell, David; Kho, Abel N; Starren, Justin B

    2012-01-01

    Objective: There is increasing interest in using electronic health records (EHRs) to identify subjects for genomic association studies, due in part to the availability of large amounts of clinical data and the expected cost efficiencies of subject identification. We describe the construction and validation of an EHR-based algorithm to identify subjects with age-related cataracts. Materials and methods: We used a multi-modal strategy consisting of structured database querying, natural language processing on free-text documents, and optical character recognition on scanned clinical images to identify cataract subjects and related cataract attributes. Extensive validation on 3657 subjects compared the multi-modal results to manual chart review. The algorithm was also implemented at participating electronic MEdical Records and GEnomics (eMERGE) institutions. Results: An EHR-based cataract phenotyping algorithm was successfully developed and validated, resulting in positive predictive values (PPVs) >95%. The multi-modal approach increased the identification of cataract subject attributes by a factor of three compared to single-mode approaches while maintaining high PPV. Components of the cataract algorithm were successfully deployed at three other institutions with similar accuracy. Discussion: A multi-modal strategy incorporating optical character recognition and natural language processing may increase the number of cases identified while maintaining similar PPVs. Such algorithms, however, require that the needed information be embedded within clinical documents. Conclusion: We have demonstrated that algorithms to identify and characterize cataracts can be developed utilizing data collected via the EHR. These algorithms provide a high level of accuracy even when implemented across multiple EHRs and institutional boundaries. PMID:22319176

  6. [GNU Pattern: open source pattern hunter for biological sequences based on SPLASH algorithm].

    PubMed

    Xu, Ying; Li, Yi-xue; Kong, Xiang-yin

    2005-06-01

    This work aimed to construct a high-performance open source software engine based on the IBM SPLASH algorithm for later research on pattern discovery. GNU Pattern (Gpat), which efficiently implements the core part of the SPLASH algorithm, was developed using open source software. The full source code of Gpat is available for other researchers to modify under the GNU license. Gpat is a successful implementation of the SPLASH algorithm and can serve as a basic framework for future research on pattern recognition in biological sequences.

  7. Optimizing the Performance of Reactive Molecular Dynamics Simulations for Multi-core Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Aktulga, Hasan Metin; Coffman, Paul; Shan, Tzu-Ray

    2015-12-01

    Hybrid parallelism allows high performance computing applications to better leverage the increasing on-node parallelism of modern supercomputers. In this paper, we present a hybrid parallel implementation of the widely used LAMMPS/ReaxC package, where the construction of bonded and nonbonded lists and the evaluation of complex ReaxFF interactions are implemented efficiently using OpenMP parallelism. Additionally, the performance of the QEq charge equilibration scheme is examined and a dual-solver is implemented. We present the performance of the resulting ReaxC-OMP package on Mira, a state-of-the-art multi-core IBM BlueGene/Q supercomputer. For system sizes ranging from 32 thousand to 16.6 million particles, speedups in the range of 1.5-4.5x are observed using the new ReaxC-OMP software. Sustained performance improvements have been observed for up to 262,144 cores (1,048,576 processes) of Mira, with a weak scaling efficiency of 91.5% in larger simulations containing 16.6 million particles.

  8. Hop Optimization and Relay Node Selection in Multi-hop Wireless Ad-Hoc Networks

    NASA Astrophysics Data System (ADS)

    Li, Xiaohua(Edward)

    In this paper we propose an efficient approach to determining the optimal hops for multi-hop ad hoc wireless networks. Based on the assumption that nodes use successive interference cancellation (SIC) and maximal ratio combining (MRC) to deal with mutual interference and to utilize all the received signal energy, we show that the signal-to-interference-plus-noise ratio (SINR) of a node is determined only by the nodes before it, not the nodes after it, along a packet forwarding path. Based on this observation, we propose an iterative procedure to select the relay nodes and to calculate the path SINR as well as the capacity of an arbitrary multi-hop packet forwarding path. The complexity of the algorithm is extremely low, and it scales well with network size, so the algorithm is applicable in arbitrarily large networks. Simulations demonstrate its desirable performance. The algorithm can be helpful in analyzing the performance of multi-hop wireless networks.
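
    One simplified reading of this model can be sketched in Python: with ideal SIC, mutual interference is cancelled, and with MRC, node k combines the energy of all earlier transmissions of the packet, so its SINR depends only on the nodes before it on the path; the weakest hop then limits the end-to-end rate. The propagation constants and topology below are assumptions for illustration, not the paper's exact formulation.

    ```python
    import numpy as np

    def path_capacity(positions, tx_power=1.0, noise=1e-6, alpha=3.0):
        """Per-hop SINRs and bottleneck capacity of a forwarding path under an
        idealized SIC + MRC model with power-law path loss d**(-alpha)."""
        pos = np.asarray(positions, dtype=float)
        sinrs = []
        for k in range(1, len(pos)):
            # MRC over the transmissions of all earlier nodes; SIC leaves only noise.
            gain = sum(np.linalg.norm(pos[k] - pos[j]) ** (-alpha) for j in range(k))
            sinrs.append(tx_power * gain / noise)
        capacity = min(np.log2(1.0 + s) for s in sinrs)
        return capacity, sinrs

    cap, sinrs = path_capacity([[0, 0], [10, 0], [20, 0], [30, 0]])
    print(round(cap, 2), [round(s, 1) for s in sinrs])
    ```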

  9. Scalable Parallel Density-based Clustering and Applications

    NASA Astrophysics Data System (ADS)

    Patwary, Mostofa Ali

    2014-04-01

    Recently, density-based clustering algorithms (DBSCAN and OPTICS) have received significant attention from the scientific community due to their unique capability of discovering arbitrarily shaped clusters and eliminating noise data. These algorithms have several applications requiring high performance computing, including finding halos and subhalos (clusters) in massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelization of these algorithms is extremely challenging, as they exhibit an inherently sequential data access order and unbalanced workloads, resulting in low parallel efficiency. To break the sequential data access and achieve high parallelism, we develop new parallel algorithms for both DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups up to 27.5 on 40 cores on a shared memory architecture and up to 5,765 using 8,192 cores on a distributed memory architecture. In our experiments, we found that while achieving this scalability, our algorithms produce clustering results of comparable quality to the classical algorithms.
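
    The connected-components view mentioned above is easy to sketch serially: treat core points (those with at least min_pts neighbors within eps) as vertices, union core points within eps of each other in a disjoint-set structure, then attach border points to a neighboring core's cluster. The Python below illustrates the equivalence only; the paper's contribution is the scalable parallel version, which is not shown.

    ```python
    import numpy as np

    def dbscan_cc(X, eps, min_pts):
        """DBSCAN via connected components over the eps-neighbor graph of core points.
        Returns a label per point (a cluster root id, or -1 for noise)."""
        n = len(X)
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
        neigh = d2 <= eps * eps
        core = neigh.sum(axis=1) >= min_pts        # neighbor count includes the point itself
        parent = list(range(n))
        def find(a):
            while parent[a] != a:
                parent[a] = parent[parent[a]]      # path halving
                a = parent[a]
            return a
        for i in range(n):
            if core[i]:
                for j in range(i + 1, n):
                    if core[j] and neigh[i, j]:
                        parent[find(i)] = find(j)  # union the two core points
        labels = np.full(n, -1)
        for i in range(n):
            if core[i]:
                labels[i] = find(i)
            else:                                  # border point: join any neighboring core
                for j in range(n):
                    if core[j] and neigh[i, j]:
                        labels[i] = find(j)
                        break
        return labels

    rng = np.random.default_rng(6)
    X = np.concatenate([rng.normal(0, 0.2, (30, 2)), rng.normal(3, 0.2, (30, 2))])
    print(np.unique(dbscan_cc(X, eps=0.5, min_pts=5)))   # two cluster ids (plus -1 if any noise)
    ```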

  10. Optical interconnection network for parallel access to multi-rank memory in future computing systems.

    PubMed

    Wang, Kang; Gu, Huaxi; Yang, Yintang; Wang, Kun

    2015-08-10

    With the number of cores increasing, there is an emerging need for a high-bandwidth low-latency interconnection network, serving core-to-memory communication. In this paper, aiming at the goal of simultaneous access to multi-rank memory, we propose an optical interconnection network for core-to-memory communication. In the proposed network, the wavelength usage is delicately arranged so that cores can communicate with different ranks at the same time and broadcast for flow control can be achieved. A distributed memory controller architecture that works in a pipeline mode is also designed for efficient optical communication and transaction address processes. The scaling method and wavelength assignment for the proposed network are investigated. Compared with traditional electronic bus-based core-to-memory communication, the simulation results based on the PARSEC benchmark show that the bandwidth enhancement and latency reduction are apparent.

  11. Quantum algorithm for support matrix machines

    NASA Astrophysics Data System (ADS)

    Duan, Bojia; Yuan, Jiabin; Liu, Ying; Li, Dan

    2017-09-01

    We propose a quantum algorithm for support matrix machines (SMMs) that efficiently addresses an image classification problem by introducing a least-squares reformulation. This algorithm consists of two core subroutines: a quantum matrix inversion (Harrow-Hassidim-Lloyd, HHL) algorithm and a quantum singular value thresholding (QSVT) algorithm. The two algorithms can be implemented on a universal quantum computer with complexity O[log(npq)] and O[log(pq)], respectively, where n is the number of training data and p×q is the size of the feature space. By iterating the algorithms, we can find the parameters for the SMM classification model. Our analysis shows that both the HHL and QSVT algorithms achieve an exponential speedup over their classical counterparts.

  12. Classifying epileptic EEG signals with delay permutation entropy and Multi-Scale K-means.

    PubMed

    Zhu, Guohun; Li, Yan; Wen, Peng Paul; Wang, Shuaifang

    2015-01-01

    Most epileptic EEG classification algorithms are supervised and require large training datasets, which hinders their use in real-time applications. This chapter proposes an unsupervised Multi-Scale K-means (MSK-means) algorithm to distinguish epileptic EEG signals and identify epileptic zones. The random initialization of the K-means algorithm can lead to wrong clusters. Based on the characteristics of EEGs, the MSK-means algorithm initializes the coarse-scale centroid of a cluster with a suitable scale factor. In this chapter, the MSK-means algorithm is proved theoretically superior to the K-means algorithm in efficiency. In addition, three classifiers, the K-means, MSK-means and support vector machine (SVM), are used to identify seizures and localize the epileptogenic zone using delay permutation entropy features. The experimental results demonstrate that identifying seizures with the MSK-means algorithm and delay permutation entropy achieves 4.7% higher accuracy than the K-means, and 0.7% higher accuracy than the SVM.
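
    The initialization idea can be illustrated with a hedged Python sketch: derive starting centroids from a coarse-scale version of the data (block averages with a chosen scale factor) instead of random picks, then refine with standard Lloyd iterations. This is an interpretation of the multi-scale initialization for illustration only, not the authors' exact rule.

    ```python
    import numpy as np

    def coarse_init(X, k, scale=8):
        """Centroids from a coarse view of the data: average blocks of `scale`
        consecutive samples, then spread k centroids over the coarse range."""
        n = (len(X) // scale) * scale
        coarse = X[:n].reshape(-1, scale, X.shape[1]).mean(axis=1)
        return np.stack([np.percentile(coarse, q, axis=0)
                         for q in np.linspace(0, 100, k)])

    def kmeans(X, centroids, iters=50):
        """Standard Lloyd iterations from the given initial centroids."""
        for _ in range(iters):
            labels = np.argmin(((X[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
            for k in range(len(centroids)):
                if np.any(labels == k):
                    centroids[k] = X[labels == k].mean(axis=0)
        return labels, centroids

    rng = np.random.default_rng(5)
    X = np.concatenate([rng.normal(0, 1, (100, 3)), rng.normal(6, 1, (100, 3))])
    labels, C = kmeans(X, coarse_init(X, k=2))
    print(np.round(C))   # centroids near 0 and 6 in every dimension
    ```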

  13. Optimization of the p-xylene oxidation process by a multi-objective differential evolution algorithm with adaptive parameters co-derived with the population-based incremental learning algorithm

    NASA Astrophysics Data System (ADS)

    Guo, Zhan; Yan, Xuefeng

    2018-04-01

    Different operating conditions of p-xylene oxidation have different influences on the product, purified terephthalic acid. It is necessary to obtain the optimal combination of reaction conditions to ensure the quality of the products, cut down on consumption and increase revenues. A multi-objective differential evolution (MODE) algorithm co-evolved with the population-based incremental learning (PBIL) algorithm, called PBMODE, is proposed. The PBMODE algorithm was designed as a co-evolutionary system. Each individual has its own parameter individual, which is co-evolved by PBIL. PBIL uses statistical analysis to build a model based on the corresponding symbiotic individuals of the superior original individuals during the main evolutionary process. The results of simulations and statistical analysis indicate that the overall performance of the PBMODE algorithm is better than that of the compared algorithms and it can be used to optimize the operating conditions of the p-xylene oxidation process effectively and efficiently.

  14. Accelerating Pathology Image Data Cross-Comparison on CPU-GPU Hybrid Systems

    PubMed Central

    Wang, Kaibo; Huai, Yin; Lee, Rubao; Wang, Fusheng; Zhang, Xiaodong; Saltz, Joel H.

    2012-01-01

    As an important application of spatial databases in pathology imaging analysis, cross-comparing the spatial boundaries of a huge amount of segmented micro-anatomic objects demands extremely data- and compute-intensive operations, requiring high throughput at an affordable cost. However, the performance of spatial database systems has not been satisfactory since their implementations of spatial operations cannot fully utilize the power of modern parallel hardware. In this paper, we provide a customized software solution that exploits GPUs and multi-core CPUs to accelerate spatial cross-comparison in a cost-effective way. Our solution consists of an efficient GPU algorithm and a pipelined system framework with task migration support. Extensive experiments with real-world data sets demonstrate the effectiveness of our solution, which improves the performance of spatial cross-comparison by over 18 times compared with a parallelized spatial database approach. PMID:23355955

  15. Cache Locality Optimization for Recursive Programs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lifflander, Jonathan; Krishnamoorthy, Sriram

    We present an approach to optimize the cache locality for recursive programs by dynamically splicing--recursively interleaving--the execution of distinct function invocations. By utilizing data effect annotations, we identify concurrency and data reuse opportunities across function invocations and interleave them to reduce reuse distance. We present algorithms that efficiently track effects in recursive programs, detect interference and dependencies, and interleave execution of function invocations using user-level (non-kernel) lightweight threads. To enable multi-core execution, a program is parallelized using a nested fork/join programming model. Our cache optimization strategy is designed to work in the context of a random work-stealing scheduler. We present an implementation using the MIT Cilk framework that demonstrates significant improvements in sequential and parallel performance, competitive with a state-of-the-art compile-time optimizer for loop programs and a domain-specific optimizer for stencil programs.

  16. SAMI Automated Plug Plate Configuration

    NASA Astrophysics Data System (ADS)

    Lorente, N. P. F.; Farrell, T.; Goodwin, M.

    2013-10-01

    The Sydney-AAO Multi-object Integral field spectrograph (SAMI) is a prototype wide-field system at the Anglo-Australian Telescope (AAT) which uses a plug-plate to mount its 13×61-core imaging fibre bundles (hexabundles) in the optical path at the telescope's prime focus. In this paper we describe the process of determining the positions of the plug-plate holes, where plates contain three or more stacked observation configurations. The process, which until now has involved several separate steps and significant manual configuration and checking, is being automated to increase efficiency and reduce error. This is carried out by means of a thin Java controller layer which drives the configuration cycle. This layer controls the user interface and the C++ algorithm layer where the plate configuration and optimisation are carried out. Additionally, through the Aladin display package, it provides visualisation and facilitates user verification of the resulting plates.

  17. Decision support tool for used oil regeneration technologies assessment and selection.

    PubMed

    Khelifi, Olfa; Dalla Giovanna, Fabio; Vranes, Sanja; Lodolo, Andrea; Miertus, Stanislav

    2006-09-01

    Regeneration is the most efficient way of managing used oil. It saves money by preventing the costly cleanups and liabilities associated with mismanagement of used oil, it helps to protect the environment, and it produces a technically renewable resource with an indefinite recycling potential. There are a variety of processes and licensors currently offering ways to deal with used oils. Selecting a regeneration technology for used oil involves "cross-matching" key criteria. Therefore, the first prototype of spent oil regeneration (SPORE), a decision support tool, has been developed to help decision-makers assess the available technologies and select the preferred used oil regeneration options. The analysis is based on technical, economic and environmental criteria. These criteria are ranked to determine their relative importance for a particular used oil regeneration project. Multi-criteria decision analysis (MCDA), implemented with the PROMETHEE II algorithm, is the core of SPORE.
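
    Since PROMETHEE II is the ranking engine named here, a compact Python sketch follows: weighted pairwise preference comparisons yield positive (leaving) and negative (entering) flows, and alternatives are ranked by net flow. The simple 0/1 "usual" preference function and the toy technology scores are assumptions; SPORE's actual criteria and preference functions are not given in this abstract.

    ```python
    import numpy as np

    def promethee_ii(scores, weights, maximize):
        """Net outranking flows. scores: (n_alternatives, n_criteria);
        maximize[j] says whether criterion j is a benefit (else a cost)."""
        S = np.asarray(scores, dtype=float).copy()
        S[:, ~np.asarray(maximize)] *= -1.0          # turn costs into benefits
        n = len(S)
        pi = np.zeros((n, n))                        # aggregated preference of a over b
        for j, w in enumerate(weights):
            pi += w * (S[:, None, j] > S[None, :, j])
        phi_plus = pi.sum(axis=1) / (n - 1)          # how strongly a beats the rest
        phi_minus = pi.sum(axis=0) / (n - 1)         # how strongly the rest beat a
        return phi_plus - phi_minus                  # rank by descending net flow

    # Three hypothetical regeneration technologies scored on
    # cost (minimize), yield (maximize), emissions (minimize).
    scores = [[3.0, 0.80, 2.0],
              [2.0, 0.70, 3.0],
              [4.0, 0.90, 1.0]]
    print(promethee_ii(scores, weights=[0.4, 0.4, 0.2], maximize=[False, True, False]))
    ```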

  18. Multi-rendezvous low-thrust trajectory optimization using costate transforming and homotopic approach

    NASA Astrophysics Data System (ADS)

    Chen, Shiyu; Li, Haiyang; Baoyin, Hexi

    2018-06-01

    This paper investigates a method for optimizing multi-rendezvous low-thrust trajectories using indirect methods. An efficient technique, labeled costate transforming, is proposed to optimize multiple trajectory legs simultaneously rather than optimizing each trajectory leg individually. Complex inner-point constraints and a large number of free variables are a main challenge in optimizing multi-leg transfers via shooting algorithms. This difficulty is reduced by first optimizing each trajectory leg individually; the results may then be utilized as an initial guess in the simultaneous optimization of multiple trajectory legs. In this paper, the limitations of similar techniques in previous research are surpassed, and a homotopic approach is employed to improve the convergence efficiency of the shooting process in multi-rendezvous low-thrust trajectory optimization. Numerical examples demonstrate that the newly introduced techniques are valid and efficient.

  19. Parallel mutual information estimation for inferring gene regulatory networks on GPUs

    PubMed Central

    2011-01-01

    Background: Mutual information is a measure of similarity between two variables. It has been widely used in various application domains including computational biology, machine learning, statistics, image processing, and financial computing. Previously used simple histogram-based mutual information estimators lack precision compared to kernel-based methods. The recently introduced B-spline function based mutual information estimation method is competitive with the kernel-based methods in terms of quality but at a lower computational complexity. Results: We present a new approach to accelerate the B-spline function based mutual information estimation algorithm with commodity graphics hardware. To derive an efficient mapping onto this type of architecture, we have used the Compute Unified Device Architecture (CUDA) programming model to design and implement a new parallel algorithm. Our implementation, called CUDA-MI, can achieve speedups of up to 82 using double precision on a single GPU compared to a multi-threaded implementation on a quad-core CPU for large microarray datasets. We have used the results obtained by CUDA-MI to infer gene regulatory networks (GRNs) from microarray data. The comparisons to existing methods including ARACNE and TINGe show that CUDA-MI produces GRNs of higher quality in less time. Conclusions: CUDA-MI is publicly available open-source software, written in the CUDA and C++ programming languages. It obtains significant speedup over the multi-threaded CPU implementation by fully exploiting the compute capability of commonly used CUDA-enabled low-cost GPUs. PMID:21672264
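
    For orientation, the simple baseline the abstract contrasts with fits in a few lines of Python: bin the sample pairs into a joint histogram and sum p·log2(p/(px·py)). The B-spline estimator differs by spreading each sample smoothly over neighboring bins, and CUDA-MI parallelizes that computation on the GPU; neither refinement is shown here.

    ```python
    import numpy as np

    def histogram_mi(x, y, bins=16):
        """Histogram estimator of mutual information I(X;Y) in bits."""
        pxy, _, _ = np.histogram2d(x, y, bins=bins)
        pxy /= pxy.sum()                          # joint distribution
        px = pxy.sum(axis=1, keepdims=True)       # marginals
        py = pxy.sum(axis=0, keepdims=True)
        nz = pxy > 0                              # avoid log(0) on empty bins
        return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

    rng = np.random.default_rng(0)
    x = rng.standard_normal(5000)
    y = x + 0.5 * rng.standard_normal(5000)       # correlated pair -> positive MI
    print(round(histogram_mi(x, y), 2))
    print(round(histogram_mi(x, rng.standard_normal(5000)), 2))  # near 0 when independent
    ```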

  20. Scalable Probabilistic Inference for Global Seismic Monitoring

    NASA Astrophysics Data System (ADS)

    Arora, N. S.; Dear, T.; Russell, S.

    2011-12-01

    We describe a probabilistic generative model for seismic events, their transmission through the earth, and their detection (or mis-detection) at seismic stations. We also describe an inference algorithm that constructs the most probable event bulletin explaining the observed set of detections. The model and inference are called NET-VISA (network processing vertically integrated seismic analysis) and is designed to replace the current automated network processing at the IDC, the SEL3 bulletin. Our results (attached table) demonstrate that NET-VISA significantly outperforms SEL3 by reducing the missed events from 30.3% down to 12.5%. The difference is even more dramatic for smaller magnitude events. NET-VISA has no difficulty in locating nuclear explosions as well. The attached figure demonstrates the location predicted by NET-VISA versus other bulletins for the second DPRK event. Further evaluation on dense regional networks demonstrates that NET-VISA finds many events missed in the LEB bulletin, which is produced by the human analysts. Large aftershock sequences, as produced by the 2004 December Sumatra earthquake and the 2011 March Tohoku earthquake, can pose a significant load for automated processing, often delaying the IDC bulletins by weeks or months. Indeed these sequences can overload the serial NET-VISA inference as well. We describe an enhancement to NET-VISA to make it multi-threaded, and hence take full advantage of the processing power of multi-core and -cpu machines. Our experiments show that the new inference algorithm is able to achieve 80% efficiency in parallel speedup.

  1. Simple formalism for efficient derivatives and multi-determinant expansions in quantum Monte Carlo

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Filippi, Claudia, E-mail: c.filippi@utwente.nl; Assaraf, Roland, E-mail: assaraf@lct.jussieu.fr; Moroni, Saverio, E-mail: moroni@democritos.it

    2016-05-21

    We present a simple and general formalism to compute efficiently the derivatives of a multi-determinant Jastrow-Slater wave function, the local energy, the interatomic forces, and similar quantities needed in quantum Monte Carlo. Through a straightforward manipulation of matrices evaluated on the occupied and virtual orbitals, we obtain an efficiency equivalent to algorithmic differentiation in the computation of the interatomic forces and the optimization of the orbital parameters. Furthermore, for a large multi-determinant expansion, the significant computational gain afforded by a recently introduced table method is here extended to the local value of any one-body operator and to its derivatives, in both all-electron and pseudopotential calculations.

  2. Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms.

    PubMed

    Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel

    2014-01-01

    With more and more workflow systems adopting the cloud as their execution environment, it becomes increasingly challenging to efficiently manage various workflows, virtual machines (VMs) and workflow execution on VM instances. To make the system scalable and easy to extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond to continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies.
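
    As a flavor of what such a scheduler decides, here is a hedged Python sketch of one simple greedy rule: run each incoming workflow on the cheapest VM type that still meets its deadline. It is an illustrative stand-in for the paper's four heuristics, and the VM types, prices, and workflow model are invented for the example.

    ```python
    from dataclasses import dataclass

    @dataclass
    class VMType:
        name: str
        speed: float   # work units per hour
        price: float   # dollars per hour

    @dataclass
    class Workflow:
        work: float      # total work units
        deadline: float  # hours until the result is due

    def schedule(workflow, vm_types):
        """Cheapest-by-total-cost VM type that finishes before the deadline."""
        feasible = [v for v in vm_types if workflow.work / v.speed <= workflow.deadline]
        if not feasible:
            return None   # no single VM meets the deadline; the workflow would need splitting
        return min(feasible, key=lambda v: v.price * (workflow.work / v.speed))

    vms = [VMType("small", 1.0, 0.05), VMType("medium", 2.0, 0.12), VMType("large", 4.0, 0.30)]
    print(schedule(Workflow(work=6.0, deadline=2.0), vms).name)   # "large"
    print(schedule(Workflow(work=6.0, deadline=8.0), vms).name)   # "small": cheapest and in time
    ```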

  3. Simulation of co-phase error correction of optical multi-aperture imaging system based on stochastic parallel gradient decent algorithm

    NASA Astrophysics Data System (ADS)

    He, Xiaojun; Ma, Haotong; Luo, Chuanxin

    2016-10-01

    The optical multi-aperture imaging system is an effective way to enlarge the aperture and increase the resolution of a telescope optical system; the difficulty lies in detecting and correcting the co-phase error. This paper presents a method based on the stochastic parallel gradient descent algorithm (SPGD) to correct the co-phase error. Compared with current methods, the SPGD method avoids explicitly detecting the co-phase error. This paper analyzes the influence of piston error and tilt error on image quality in a double-aperture imaging system, introduces the basic principle of the SPGD algorithm, and discusses the influence of the SPGD algorithm's key parameters (the gain coefficient and the disturbance amplitude) on error control performance. The results show that SPGD can efficiently correct the co-phase error. The convergence speed of the SPGD algorithm improves as the gain coefficient and disturbance amplitude increase, but the stability of the algorithm is reduced. An adaptive gain coefficient can solve this problem appropriately. These results can provide a theoretical reference for co-phase error correction in multi-aperture imaging systems.
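
    The SPGD update itself is only a few lines; the hedged Python sketch below perturbs all control channels (e.g., piston and tilt actuators) with random ±δ in parallel, measures the resulting change in a scalar image-quality metric, and steps along the estimated gradient, so the co-phase error is never measured directly. The quadratic toy metric and the gain/disturbance settings are assumptions for demonstration.

    ```python
    import numpy as np

    def spgd(metric, u0, gain=0.5, delta=0.05, iters=2000, rng=None):
        """Stochastic parallel gradient descent maximizing `metric(u)`."""
        rng = rng or np.random.default_rng(0)
        u = np.array(u0, dtype=float)
        for _ in range(iters):
            du = delta * rng.choice([-1.0, 1.0], size=u.shape)  # parallel Bernoulli perturbation
            dJ = metric(u + du) - metric(u - du)                # two-sided metric difference
            u += gain * dJ * du                                 # step along the gradient estimate
        return u

    # Toy "sharpness" metric peaked where the piston/tilt controls hit the target.
    target = np.array([0.3, -0.7, 1.1])
    metric = lambda u: -np.sum((u - target) ** 2)
    print(np.round(spgd(metric, np.zeros(3)), 2))   # converges near target
    ```

    A larger gain or disturbance amplitude speeds convergence but, as the abstract notes, destabilizes the loop; an adaptive gain trades the two off.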

  4. Eavesdropping-aware routing and spectrum allocation based on multi-flow virtual concatenation for confidential information service in elastic optical networks

    NASA Astrophysics Data System (ADS)

    Bai, Wei; Yang, Hui; Yu, Ao; Xiao, Hongyun; He, Linkuan; Feng, Lei; Zhang, Jie

    2018-01-01

    The leakage of confidential information is one of the important issues in the network security area. Elastic Optical Networks (EON), a promising technology in the optical transport network, are under threat from eavesdropping attacks. There is a great demand to support confidential information services (CIS) and to design efficient security strategies against eavesdropping attacks. In this paper, we propose a solution to cope with eavesdropping attacks in routing and spectrum allocation. First, we introduce probability theory to describe the eavesdropping issue and achieve awareness of eavesdropping attacks. Then we propose an eavesdropping-aware routing and spectrum allocation (ES-RSA) algorithm to guarantee information security. To further improve security and network performance, we employ multi-flow virtual concatenation (MFVC) and propose an eavesdropping-aware MFVC-based secure routing and spectrum allocation (MES-RSA) algorithm. The presented simulation results show that both proposed RSA algorithms achieve greater security against eavesdropping attacks and that MES-RSA also improves network performance efficiently.

  5. Core Hunter 3: flexible core subset selection.

    PubMed

    De Beukelaer, Herman; Davenport, Guy F; Fack, Veerle

    2018-05-31

    Core collections provide genebank curators and plant breeders a way to reduce the size of their collections and populations, while minimizing the impact on genetic diversity and allele frequency. Many methods have been proposed to generate core collections, often using distance metrics to quantify the similarity of two accessions, based on genetic marker data or phenotypic traits. Core Hunter is a multi-purpose core subset selection tool that uses local search algorithms to generate subsets relying on one or more metrics, including several distance metrics and allelic richness. In version 3 of Core Hunter (CH3) we have incorporated two new, improved methods for summarizing distances to quantify the diversity or representativeness of the core collection. A comparison of CH3 and Core Hunter 2 (CH2) showed that these new metrics can be effectively optimized with less complex algorithms than those used in CH2. CH3 is more effective at maximizing the improved diversity metric than CH2, still ensures a high average and minimum distance, and is faster for large datasets. Using CH3, a simple stochastic hill-climber is able to find highly diverse core collections, and the more advanced parallel tempering algorithm further increases the quality of the core and reduces variability across independent samples. We also evaluate the ability of CH3 to simultaneously maximize diversity and either representativeness or allelic richness, and compare the results with those of the GDOpt and SimEli methods. CH3 can sample cores as representative as those of GDOpt, which was specifically designed for this purpose, and is able to construct cores that are simultaneously more diverse and either more representative or higher in allelic richness than those obtained by SimEli. In version 3, Core Hunter has been updated to include two new core subset selection metrics that construct cores for representativeness or diversity, with improved performance. It combines the strengths of other methods and outperforms them, as it (simultaneously) optimizes a variety of metrics. In addition, CH3 is an improvement over CH2, with the option to use genetic marker data or phenotypic traits, or both, and improved speed. Core Hunter 3 is freely available at http://www.corehunter.org .
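
    The simplest of the optimizers mentioned, a stochastic hill-climber, can be sketched in Python for the diversity objective: repeatedly swap one accession in the core with one outside it and keep the swap if the mean pairwise distance of the core improves. This is a generic analog of CH3's local search, with the distance matrix and core size invented for the example; Core Hunter's actual metrics and parallel tempering are not reproduced.

    ```python
    import numpy as np

    def hillclimb_core(D, core_size, steps=5000, rng=None):
        """Select a core subset maximizing mean pairwise distance by random swaps."""
        rng = rng or np.random.default_rng(0)
        n = len(D)
        core = list(rng.choice(n, core_size, replace=False))
        def diversity(idx):
            sub = D[np.ix_(idx, idx)]
            return sub.sum() / (len(idx) * (len(idx) - 1))   # mean pairwise distance
        best = diversity(core)
        for _ in range(steps):
            outside = [i for i in range(n) if i not in core]
            cand = core.copy()
            cand[rng.integers(core_size)] = outside[rng.integers(len(outside))]
            d = diversity(cand)
            if d > best:                                     # keep only improving swaps
                core, best = cand, d
        return sorted(core), best

    rng = np.random.default_rng(1)
    P = rng.random((40, 5))                                  # 40 accessions, 5 trait axes
    D = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
    core, score = hillclimb_core(D, core_size=10)
    print(core, round(score, 3))
    ```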

  6. Application of a hybrid MPI/OpenMP approach for parallel groundwater model calibration using multi-core computers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tang, Guoping; D'Azevedo, Ed F; Zhang, Fan

    2010-01-01

    Calibration of groundwater models involves hundreds to thousands of forward solutions, each of which may solve many transient coupled nonlinear partial differential equations, resulting in a computationally intensive problem. We describe a hybrid MPI/OpenMP approach to exploit two levels of parallelism in software and hardware to reduce calibration time on multi-core computers. HydroGeoChem 5.0 (HGC5) is parallelized using OpenMP for direct solutions for a reactive transport model application and a field-scale coupled flow and transport model application. In the reactive transport model, a single parallelizable loop is identified to account for over 97% of the total computational time using GPROF. Addition of a few lines of OpenMP compiler directives to the loop yields a speedup of about 10 on a 16-core compute node. For the field-scale model, parallelizable loops in 14 of 174 HGC5 subroutines that require 99% of the execution time are identified. As these loops are parallelized incrementally, the scalability is found to be limited by a loop where Cray PAT detects over 90% cache miss rates. With this loop rewritten, a speedup similar to the first application is achieved. The OpenMP-parallelized code can be run efficiently on multiple workstations in a network or multiple compute nodes on a cluster as slaves using parallel PEST to speed up model calibration. To run calibration on clusters as a single task, the Levenberg-Marquardt algorithm is added to HGC5 with the Jacobian calculation and lambda search parallelized using MPI. With this hybrid approach, 100-200 compute cores are used to reduce the calibration time from weeks to a few hours for these two applications. This approach is applicable to most existing groundwater model codes for many applications.
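
    The MPI-parallelized Jacobian calculation mentioned above can be approximated in miniature: each column of a finite-difference Jacobian is one independent forward run. The sketch below is a simplified analog using Python multiprocessing; the forward model is a hypothetical placeholder, not HGC5.

        from multiprocessing import Pool

        import numpy as np

        def forward_model(params):
            # Hypothetical stand-in for an expensive groundwater forward solution.
            return np.array([np.sum(params ** 2), np.prod(np.cos(params))])

        def perturbed_run(args):
            params, k, h = args
            p = params.copy()
            p[k] += h                       # perturb one parameter
            return forward_model(p)

        def parallel_jacobian(params, h=1e-6, workers=4):
            base = forward_model(params)
            jobs = [(params, k, h) for k in range(len(params))]
            with Pool(workers) as pool:
                runs = pool.map(perturbed_run, jobs)  # one forward run per column
            return np.column_stack([(r - base) / h for r in runs])

        if __name__ == "__main__":
            print(parallel_jacobian(np.array([0.5, 1.0, 2.0])))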

  7. Super-Nyquist shaping and processing technologies for high-spectral-efficiency optical systems

    NASA Astrophysics Data System (ADS)

    Jia, Zhensheng; Chien, Hung-Chang; Zhang, Junwen; Dong, Ze; Cai, Yi; Yu, Jianjun

    2013-12-01

    Implementations of super-Nyquist pulse generation, either digitally using a digital-to-analog converter (DAC) or with an optical filter at the transmitter side, are introduced. Three corresponding signal processing algorithms at the receiver are presented and compared for high-spectral-efficiency (SE) optical systems employing spectral prefiltering. These algorithms are designed to mitigate the inter-symbol interference (ISI) and inter-channel interference (ICI) impairments caused by the bandwidth constraint, and include a 1-tap constant modulus algorithm (CMA) with 3-tap maximum likelihood sequence estimation (MLSE), regular CMA and a digital filter with 2-tap MLSE, and a constant multi-modulus algorithm (CMMA) with 2-tap MLSE. The principles and prefiltering tolerance are given through numerical and experimental results.

  8. A novel model-based evolutionary algorithm for multi-objective deformable image registration with content mismatch and large deformations: benchmarking efficiency and quality

    NASA Astrophysics Data System (ADS)

    Bouter, Anton; Alderliesten, Tanja; Bosman, Peter A. N.

    2017-02-01

    Taking a multi-objective optimization approach to deformable image registration has recently gained attention, because such an approach removes the requirement of manually tuning the weights of all the involved objectives. Especially for problems that require large complex deformations, this is a non-trivial task. From the resulting Pareto set of solutions one can then much more insightfully select a registration outcome that is most suitable for the problem at hand. The multi-objective algorithms currently used as internal optimization engines are competent, but rather inefficient. In this paper we largely improve upon this by introducing a multi-objective real-valued adaptation of the recently introduced Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) for discrete optimization. In this work, GOMEA is tailored specifically to the problem of deformable image registration to obtain substantially improved efficiency. This improvement is achieved by exploiting a key strength of GOMEA: iteratively improving small parts of solutions, allowing the impact of such updates on the objectives at hand to be exploited faster through partial evaluations. We performed experiments on three registration problems: an artificial problem containing a disappearing structure, a pair of pre- and post-operative breast CT scans, and a pair of breast MRI scans acquired in prone and supine position. Results show that compared to the previously used evolutionary algorithm, GOMEA obtains a speed-up of up to a factor of 1600 on the tested registration problems while achieving registration outcomes of similar quality.

  9. Variable cycle control model for intersection based on multi-source information

    NASA Astrophysics Data System (ADS)

    Sun, Zhi-Yuan; Li, Yue; Qu, Wen-Cong; Chen, Yan-Yan

    2018-05-01

    In order to improve the efficiency of traffic control systems in the era of big data, a new variable cycle control model based on multi-source information is presented for intersections in this paper. Firstly, with consideration of multi-source information, a unified framework based on cyber-physical systems is proposed. Secondly, taking into account the variable length of cells, the hysteresis phenomenon of traffic flow and the characteristics of lane groups, a Lane group-based Cell Transmission Model is established to describe the physical properties of traffic flow under different traffic signal control schemes. Thirdly, the variable cycle control problem is abstracted into a bi-level programming model. The upper-level model is put forward for cycle length optimization considering traffic capacity and delay. The lower-level model is a dynamic signal control decision model based on fairness analysis. Then, a Hybrid Intelligent Optimization Algorithm is proposed to solve the model. Finally, a case study shows the efficiency and applicability of the proposed model and algorithm.

  10. An efficient multi-resolution GA approach to dental image alignment

    NASA Astrophysics Data System (ADS)

    Nassar, Diaa Eldin; Ogirala, Mythili; Adjeroh, Donald; Ammar, Hany

    2006-02-01

    Automating the process of postmortem identification of individuals using dental records is receiving increased attention in forensic science, especially with the large volume of victims encountered in mass disasters. Dental radiograph alignment is a key step required for automating the dental identification process. In this paper, we address the problem of dental radiograph alignment using a Multi-Resolution Genetic Algorithm (MR-GA) approach. We use location and orientation information of edge points as features; we assume that affine transformations suffice to restore geometric discrepancies between two images of a tooth; we efficiently search the 6D space of affine parameters using GA progressively across multi-resolution image versions; and we use a Hausdorff distance measure to compute the similarity between a reference tooth and a query tooth subject to a possible alignment transform. Testing results based on 52 teeth-pair images suggest that our algorithm converges to reasonable solutions in more than 85% of the test cases, with most of the error in the remaining cases due to excessive misalignments.
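
    The Hausdorff similarity measure used above is available directly in SciPy. A minimal sketch, assuming two hypothetical edge-point sets (the multi-resolution GA search itself is omitted):

        import numpy as np
        from scipy.spatial.distance import directed_hausdorff

        # Hypothetical edge points of a reference tooth and an aligned query tooth.
        reference = np.random.rand(200, 2)
        query = np.random.rand(180, 2)

        # Symmetric Hausdorff distance: the larger of the two directed distances.
        d = max(directed_hausdorff(reference, query)[0],
                directed_hausdorff(query, reference)[0])
        print(f"Hausdorff distance: {d:.4f}")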

  11. A theoretical framework for negotiating the path of emergency management multi-agency coordination.

    PubMed

    Curnin, Steven; Owen, Christine; Paton, Douglas; Brooks, Benjamin

    2015-03-01

    Multi-agency coordination represents a significant challenge in emergency management. The need for liaison officers working in strategic level emergency operations centres to play organizational boundary spanning roles within multi-agency coordination arrangements that are enacted in complex and dynamic emergency response scenarios creates significant research and practical challenges. The aim of the paper is to address a gap in the literature regarding the concept of multi-agency coordination from a human-environment interaction perspective. We present a theoretical framework for facilitating multi-agency coordination in emergency management that is grounded in human factors and ergonomics using the methodology of core-task analysis. As a result we believe the framework will enable liaison officers to cope more efficiently within the work domain. In addition, we provide suggestions for extending the theory of core-task analysis to an alternate high reliability environment. Copyright © 2014 Elsevier Ltd and The Ergonomics Society. All rights reserved.

  12. Compound Event Barrier Coverage in Wireless Sensor Networks under Multi-Constraint Conditions.

    PubMed

    Zhuang, Yaoming; Wu, Chengdong; Zhang, Yunzhou; Jia, Zixi

    2016-12-24

    It is important to monitor compound events via barrier coverage in wireless sensor networks (WSNs). Compound event barrier coverage (CEBC) is a novel coverage problem. Unlike traditional ones, the data for compound event barrier coverage come from different types of sensors. It is subject to multiple constraints under complex conditions in real-world applications. The main objective of this paper is to design an efficient algorithm for complex conditions that can combine the compound event confidence. Moreover, a multiplier method based on an active-set strategy (ASMP) is proposed to optimize the multiple constraints in compound event barrier coverage. The algorithm can calculate the coverage ratio efficiently and allocate sensor resources reasonably in compound event barrier coverage. The proposed algorithm can simplify complex problems to reduce the computational load of the network and improve network efficiency. The simulation results demonstrate that the proposed algorithm is more effective and efficient than existing methods, especially in the allocation of sensor resources.

  13. Compound Event Barrier Coverage in Wireless Sensor Networks under Multi-Constraint Conditions

    PubMed Central

    Zhuang, Yaoming; Wu, Chengdong; Zhang, Yunzhou; Jia, Zixi

    2016-01-01

    It is important to monitor compound events via barrier coverage in wireless sensor networks (WSNs). Compound event barrier coverage (CEBC) is a novel coverage problem. Unlike traditional ones, the data for compound event barrier coverage come from different types of sensors. It is subject to multiple constraints under complex conditions in real-world applications. The main objective of this paper is to design an efficient algorithm for complex conditions that can combine the compound event confidence. Moreover, a multiplier method based on an active-set strategy (ASMP) is proposed to optimize the multiple constraints in compound event barrier coverage. The algorithm can calculate the coverage ratio efficiently and allocate sensor resources reasonably in compound event barrier coverage. The proposed algorithm can simplify complex problems to reduce the computational load of the network and improve network efficiency. The simulation results demonstrate that the proposed algorithm is more effective and efficient than existing methods, especially in the allocation of sensor resources. PMID:28029118

  14. Effective channel estimation and efficient symbol detection for multi-input multi-output underwater acoustic communications

    NASA Astrophysics Data System (ADS)

    Ling, Jun

    Achieving reliable underwater acoustic communications (UAC) has long been recognized as a challenging problem owing to the scarce bandwidth available and the reverberant spread in both time and frequency domains. To pursue high data rates, we consider a multi-input multi-output (MIMO) UAC system, and our focus is placed on two main issues regarding a MIMO UAC system: (1) channel estimation, which involves the design of the training sequences and the development of a reliable channel estimation algorithm, and (2) symbol detection, which requires interference cancelation schemes due to simultaneous transmission from multiple transducers. To enhance channel estimation performance, we present a cyclic approach for designing training sequences with good auto- and cross-correlation properties, and a channel estimation algorithm called the iterative adaptive approach (IAA). Sparse channel estimates can be obtained by combining IAA with the Bayesian information criterion (BIC). Moreover, we present sparse learning via iterative minimization (SLIM) and demonstrate that SLIM gives similar performance to IAA but at a much lower computational cost. Furthermore, an extension of the SLIM algorithm is introduced to estimate the sparse and frequency modulated acoustic channels. The extended algorithm is referred to as generalization of SLIM (GoSLIM). Regarding symbol detection, a linear minimum mean-squared error based detection scheme, called RELAX-BLAST, which is a combination of vertical Bell Labs layered space-time (V-BLAST) algorithm and the cyclic principle of the RELAX algorithm, is presented and it is shown that RELAX-BLAST outperforms V-BLAST. We show that RELAX-BLAST can be implemented efficiently by making use of the conjugate gradient method and diagonalization properties of circulant matrices. This fast implementation approach requires only simple fast Fourier transform operations and facilitates parallel implementations. The effectiveness of the proposed MIMO schemes is verified by both computer simulations and experimental results obtained by analyzing the measurements acquired in multiple in-water experiments.

  15. Multi-phase classification by a least-squares support vector machine approach in tomography images of geological samples

    NASA Astrophysics Data System (ADS)

    Khan, Faisal; Enzmann, Frieder; Kersten, Michael

    2016-03-01

    Image processing of X-ray-computed polychromatic cone-beam micro-tomography (μXCT) data of geological samples mainly involves artefact reduction and phase segmentation. For the former, the main beam-hardening (BH) artefact is removed by applying a best-fit quadratic surface algorithm to a given image data set (reconstructed slice), which minimizes the BH offsets of the attenuation data points from that surface. A Matlab code for this approach is provided in the Appendix. The final BH-corrected image is extracted from the residual data, i.e. from the difference between the surface elevation values and the original grey-scale values. For the segmentation, we propose a novel least-squares support vector machine (LS-SVM) approach, an algorithm for pixel-based multi-phase classification. A receiver operating characteristic (ROC) analysis was performed on BH-corrected and uncorrected samples to show that BH correction is in fact an important prerequisite for accurate multi-phase classification. The combination of the two approaches was thus used to successfully classify three multi-phase rock core samples of varying complexity.
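
    The best-fit quadratic surface step can be sketched in a few lines. The following is a minimal illustration, assuming a 2D grey-scale slice held in a NumPy array; the Matlab code in the paper's Appendix remains the authoritative version.

        import numpy as np

        def remove_beam_hardening(img):
            # Fit z = a + b*x + c*y + d*x^2 + e*x*y + f*y^2 by least squares,
            # then return the residual as the BH-corrected image.
            ny, nx = img.shape
            y, x = np.mgrid[0:ny, 0:nx]
            X = np.column_stack([np.ones(img.size), x.ravel(), y.ravel(),
                                 x.ravel() ** 2, (x * y).ravel(), y.ravel() ** 2])
            coeffs, *_ = np.linalg.lstsq(X, img.ravel(), rcond=None)
            surface = (X @ coeffs).reshape(ny, nx)
            return img - surface

        corrected = remove_beam_hardening(np.random.rand(64, 64))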

  16. A Parallel Saturation Algorithm on Shared Memory Architectures

    NASA Technical Reports Server (NTRS)

    Ezekiel, Jonathan; Siminiceanu

    2007-01-01

    Symbolic state-space generators are notoriously hard to parallelize. However, the Saturation algorithm implemented in the SMART verification tool differs from other sequential symbolic state-space generators in that it exploits the locality of firing events in asynchronous system models. This paper explores whether event locality can be utilized to efficiently parallelize Saturation on shared-memory architectures. Conceptually, we propose to parallelize the firing of events within a decision diagram node, which is technically realized via a thread pool. We discuss the challenges involved in our parallel design and conduct experimental studies on its prototypical implementation. On a dual-processor dual-core PC, our studies show speed-ups for several example models, e.g., of up to 50% for a Kanban model, when compared to running our algorithm only on a single core.
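
    The thread-pool realization described above can be caricatured as follows. This is a minimal sketch, assuming a hypothetical list of independent event firings per decision diagram node; SMART's actual Saturation implementation is far more involved.

        from concurrent.futures import ThreadPoolExecutor

        def fire_event(node, event):
            # Hypothetical stand-in for firing one event within a node.
            return (node, event, hash((node, event)) % 97)

        def saturate_node(node, events, pool):
            # Fire all events local to this node concurrently via the pool.
            futures = [pool.submit(fire_event, node, e) for e in events]
            return [f.result() for f in futures]

        with ThreadPoolExecutor(max_workers=4) as pool:
            print(saturate_node("n0", ["e1", "e2", "e3"], pool))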

  17. Simultaneous optimization of loading pattern and burnable poison placement for PWRs

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Alim, F.; Ivanov, K.; Yilmaz, S.

    2006-07-01

    To solve the in-core fuel management optimization problem, GARCO-PSU (Genetic Algorithm Reactor Core Optimization - Pennsylvania State Univ.) is developed. This code is applicable to all types and geometries of PWR core structures with an unlimited number of fuel assembly (FA) types in the inventory. For this reason an innovative genetic algorithm is developed by modifying the classical representation of the genotype. In-core fuel management heuristic rules are introduced into GARCO. The core re-load design optimization has two parts, loading pattern (LP) optimization and burnable poison (BP) placement optimization. These parts depend on each other, but it is difficult to solve the combined problem due to its large size. Separating the problem into two parts provides a practical way to solve it. However, the result of this method does not reflect the true optimal solution. GARCO-PSU solves LP optimization and BP placement optimization simultaneously in an efficient manner. (authors)

  18. A multi-characteristic based algorithm for classifying vegetation in a plateau area: Qinghai Lake watershed, northwestern China

    NASA Astrophysics Data System (ADS)

    Ma, Weiwei; Gong, Cailan; Hu, Yong; Li, Long; Meng, Peng

    2015-10-01

    Remote sensing technology has been broadly recognized for its convenience and efficiency in mapping vegetation, particularly in high-altitude and inaccessible areas where in-situ observations are lacking. In this study, Landsat Thematic Mapper (TM) images and Chinese environmental mitigation satellite CCD sensor (HJ-1 CCD) images, both at 30 m spatial resolution, were employed for identifying and monitoring vegetation types in an area of western China, the Qinghai Lake Watershed (QHLW). A decision classification tree (DCT) algorithm using multiple characteristics, including seasonal TM/HJ-1 CCD time series data combined with a digital elevation model (DEM) dataset, and a supervised maximum likelihood classification (MLC) algorithm with a single-date TM image were applied to vegetation classification. The accuracy of the two algorithms was assessed using field observation data. Based on the produced vegetation classification maps, it was found that the DCT using multi-season data and geomorphologic parameters was superior to the MLC algorithm using a single-date image, improving the overall accuracy by 11.86% at the second class level and significantly reducing the "salt and pepper" noise. The DCT algorithm applied to TM/HJ-1 CCD time series data and geomorphologic parameters appeared to be a valuable and reliable tool for monitoring vegetation at the first class level (5 vegetation classes) and second class level (8 vegetation subclasses). The DCT algorithm using multiple characteristics might provide a theoretical basis and general approach to automatic extraction of vegetation types from remote sensing imagery over plateau areas.

  19. ESAM: Endocrine inspired Sensor Activation Mechanism for multi-target tracking in WSNs

    NASA Astrophysics Data System (ADS)

    Adil Mahdi, Omar; Wahab, Ainuddin Wahid Abdul; Idris, Mohd Yamani Idna; Znaid, Ammar Abu; Khan, Suleman; Al-Mayouf, Yusor Rafid Bahar

    2016-10-01

    Target tracking is a significant application of wireless sensor networks (WSNs) in which deployment of self-organizing and energy-efficient algorithms is required. Tracking accuracy increases as more sensor nodes are activated around the target, but more energy is consumed. Thus, in this study, we focus on limiting the number of sensors by forming an ad-hoc network that operates autonomously. This reduces energy consumption and prolongs the sensor network lifetime. In this paper, we propose a fully distributed algorithm, an Endocrine inspired Sensor Activation Mechanism for multi-target tracking (ESAM), which reflects the properties of a real-life sensor activation system based on the information circulation principle of the endocrine system of the human body. Sensor nodes in our network secrete different hormones according to certain rules. The hormone level enables the nodes to regulate an efficient sleep and wake-up cycle to reduce energy consumption. It is evident from the simulation results that the proposed ESAM in an autonomous sensor network exhibits stable performance without the need for commands from a central controller. Moreover, the proposed ESAM generates more efficient and persistent results compared to other algorithms for tracking an invading object.

  20. Concurrent computation of attribute filters on shared memory parallel machines.

    PubMed

    Wilkinson, Michael H F; Gao, Hui; Hesselink, Wim H; Jonker, Jan-Eppo; Meijster, Arnold

    2008-10-01

    Morphological attribute filters have not previously been parallelized, mainly because they are both global and non-separable. We propose a parallel algorithm that achieves efficient parallelism for a large class of attribute filters, including attribute openings, closings, thinnings and thickenings, based on Salembier's Max-Trees and Min-Trees. The image or volume is first partitioned into multiple slices. We then compute the Max-Tree of each slice using any sequential Max-Tree algorithm. Subsequently, the Max-Trees of the slices can be merged to obtain the Max-Tree of the image. A C implementation yielded good speed-ups on both a 16-processor MIPS 14000 parallel machine and a dual-core Opteron-based machine. It is shown that the speed-up of the parallel algorithm is a direct measure of the gain with respect to the sequential algorithm used. Furthermore, the concurrent algorithm shows a speed gain of up to 72 percent on a single-core processor, due to reduced cache thrashing.

  1. Advanced Dynamically Adaptive Algorithms for Stochastic Simulations on Extreme Scales

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xiu, Dongbin

    2017-03-03

    The focus of the project is the development of mathematical methods and high-performance computational tools for stochastic simulations, with a particular emphasis on computations on extreme scales. The core of the project revolves around the design of highly efficient and scalable numerical algorithms that can adaptively and accurately, in high dimensional spaces, resolve stochastic problems with limited smoothness, even containing discontinuities.

  2. Ion Structure Near a Core-Shell Dielectric Nanoparticle

    NASA Astrophysics Data System (ADS)

    Ma, Manman; Gan, Zecheng; Xu, Zhenli

    2017-02-01

    A generalized image charge formulation is proposed for the Green's function of a core-shell dielectric nanoparticle for which theoretical and simulation investigations are rarely reported due to the difficulty of resolving the dielectric heterogeneity. Based on the formulation, an efficient and accurate algorithm is developed for calculating electrostatic polarization charges of mobile ions, allowing us to study related physical systems using the Monte Carlo algorithm. The computer simulations show that a fine-tuning of the shell thickness or the ion-interface correlation strength can greatly alter electric double-layer structures and capacitances, owing to the complicated interplay between dielectric boundary effects and ion-interface correlations.

  3. An efficient non-dominated sorting method for evolutionary algorithms.

    PubMed

    Fang, Hongbing; Wang, Qian; Tu, Yi-Cheng; Horstemeyer, Mark F

    2008-01-01

    We present a new non-dominated sorting algorithm to generate the non-dominated fronts in multi-objective optimization with evolutionary algorithms, particularly the NSGA-II. The non-dominated sorting algorithm used by NSGA-II has a time complexity of O(MN^2) for generating non-dominated fronts in one generation (iteration) for a population size N and M objective functions. Since generating non-dominated fronts takes the majority of the total computational time (excluding the cost of fitness evaluations) of NSGA-II, making this algorithm faster will significantly improve the overall efficiency of NSGA-II and other genetic algorithms using non-dominated sorting. The new non-dominated sorting algorithm proposed in this study reduces the number of redundant comparisons existing in the algorithm of NSGA-II by recording the dominance information among solutions from their first comparisons. By utilizing a new data structure called the dominance tree and the divide-and-conquer mechanism, the new algorithm is faster than NSGA-II for different numbers of objective functions. Although the number of solution comparisons by the proposed algorithm is close to that of NSGA-II when the number of objectives becomes large, the total computational time shows that the proposed algorithm still has better efficiency because of the adoption of the dominance tree structure and the divide-and-conquer mechanism.
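
    For context, the baseline O(MN^2) fast non-dominated sort that the dominance-tree algorithm improves upon can be sketched as follows (minimization assumed; the dominance-tree variant itself is not reproduced here).

        def dominates(p, q):
            # p dominates q: no worse in every objective, strictly better in one.
            return (all(a <= b for a, b in zip(p, q))
                    and any(a < b for a, b in zip(p, q)))

        def fast_non_dominated_sort(points):
            n = len(points)
            dominated_by = [[] for _ in range(n)]  # indices each point dominates
            counts = [0] * n                       # number of dominators per point
            for i in range(n):
                for j in range(i + 1, n):
                    if dominates(points[i], points[j]):
                        dominated_by[i].append(j)
                        counts[j] += 1
                    elif dominates(points[j], points[i]):
                        dominated_by[j].append(i)
                        counts[i] += 1
            fronts = []
            front = [i for i in range(n) if counts[i] == 0]
            while front:
                fronts.append(front)
                nxt = []
                for i in front:
                    for j in dominated_by[i]:
                        counts[j] -= 1
                        if counts[j] == 0:
                            nxt.append(j)
                front = nxt
            return fronts

        print(fast_non_dominated_sort([(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]))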

  4. Computer-intensive simulation of solid-state NMR experiments using SIMPSON.

    PubMed

    Tošner, Zdeněk; Andersen, Rasmus; Stevensson, Baltzar; Edén, Mattias; Nielsen, Niels Chr; Vosegaard, Thomas

    2014-09-01

    Conducting large-scale solid-state NMR simulations requires fast computer software, potentially in combination with efficient computational resources, to complete within a reasonable time frame. Such simulations may involve large spin systems, multiple-parameter fitting of experimental spectra, or multiple-pulse experiment design using parameter scans, non-linear optimization, or optimal control procedures. To efficiently accommodate such simulations, we here present an improved version of the widely distributed open-source SIMPSON NMR simulation software package adapted to contemporary high-performance hardware setups. The software is optimized for fast performance on standard stand-alone computers, multi-core processors, and large clusters of identical nodes. We describe the novel features for fast computation, including internal matrix manipulations, propagator setups and acquisition strategies. For efficient calculation of powder averages, we implemented the interpolation method of Alderman, Solum, and Grant, as well as the recently introduced fast Wigner transform interpolation technique. The potential of the optimal control toolbox is greatly enhanced by higher-precision gradients in combination with the efficient optimization algorithm known as limited-memory Broyden-Fletcher-Goldfarb-Shanno. In addition, advanced parallelization can be used in all types of calculations, providing significant time reductions. SIMPSON thus reflects current knowledge in the field of numerical simulations of solid-state NMR experiments. The efficiency and novel features are demonstrated on representative simulations. Copyright © 2014 Elsevier Inc. All rights reserved.

  5. The accurate particle tracer code

    DOE PAGES

    Wang, Yulei; Liu, Jian; Qin, Hong; ...

    2017-07-20

    The Accurate Particle Tracer (APT) code is designed for systematic large-scale applications of geometric algorithms for particle dynamical simulations. Based on a large variety of advanced geometric algorithms, APT possesses long-term numerical accuracy and stability, which are critical for solving multi-scale and nonlinear problems. To provide a flexible and convenient I/O interface, the libraries of Lua and Hdf5 are used. Following a three-step procedure, users can efficiently extend the libraries of electromagnetic configurations, external non-electromagnetic forces, particle pushers, and initialization approaches by use of the extendible module. APT has been used in simulations of key physical problems, such as runaway electrons in tokamaks and energetic particles in the Van Allen belt. As an important realization, the APT-SW version has been successfully distributed on the world's fastest computer, the Sunway TaihuLight supercomputer, by supporting the master-slave architecture of Sunway many-core processors. Here, based on large-scale simulations of a runaway beam under parameters of the ITER tokamak, it is revealed that the magnetic ripple field can disperse the pitch-angle distribution significantly and at the same time improve the confinement of the energetic runaway beam.

  6. The accurate particle tracer code

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Yulei; Liu, Jian; Qin, Hong

    The Accurate Particle Tracer (APT) code is designed for systematic large-scale applications of geometric algorithms for particle dynamical simulations. Based on a large variety of advanced geometric algorithms, APT possesses long-term numerical accuracy and stability, which are critical for solving multi-scale and nonlinear problems. To provide a flexible and convenient I/O interface, the libraries of Lua and Hdf5 are used. Following a three-step procedure, users can efficiently extend the libraries of electromagnetic configurations, external non-electromagnetic forces, particle pushers, and initialization approaches by use of the extendible module. APT has been used in simulations of key physical problems, such as runaway electrons in tokamaks and energetic particles in the Van Allen belt. As an important realization, the APT-SW version has been successfully distributed on the world's fastest computer, the Sunway TaihuLight supercomputer, by supporting the master-slave architecture of Sunway many-core processors. Here, based on large-scale simulations of a runaway beam under parameters of the ITER tokamak, it is revealed that the magnetic ripple field can disperse the pitch-angle distribution significantly and at the same time improve the confinement of the energetic runaway beam.

  7. Optimizing a multi-product closed-loop supply chain using NSGA-II, MOSA, and MOPSO meta-heuristic algorithms

    NASA Astrophysics Data System (ADS)

    Babaveisi, Vahid; Paydar, Mohammad Mahdi; Safaei, Abdul Sattar

    2018-07-01

    This study discusses the solution methodology for a closed-loop supply chain (CLSC) network that includes the collection of used products as well as the distribution of new products. This supply chain is presented as representative of the problems that can be solved by the proposed meta-heuristic algorithms. A mathematical model is designed for a CLSC that involves three objective functions: maximizing the profit, minimizing the total risk and minimizing shortages of products. Since three objective functions are considered, a multi-objective solution methodology can be advantageous. Therefore, several approaches have been studied: an NSGA-II algorithm is first utilized, and then the results are validated using MOSA and MOPSO algorithms. Priority-based encoding, which is used in all the algorithms, is the core of the solution computations. To compare the performance of the meta-heuristics, random numerical instances are evaluated by four criteria: mean ideal distance, spread of non-dominated solutions, the number of Pareto solutions, and CPU time. In order to enhance the performance of the algorithms, the Taguchi method is used for parameter tuning. Finally, sensitivity analyses are performed and the computational results are presented based on the sensitivity analyses in parameter tuning.

  8. Optimizing a multi-product closed-loop supply chain using NSGA-II, MOSA, and MOPSO meta-heuristic algorithms

    NASA Astrophysics Data System (ADS)

    Babaveisi, Vahid; Paydar, Mohammad Mahdi; Safaei, Abdul Sattar

    2017-07-01

    This study discusses the solution methodology for a closed-loop supply chain (CLSC) network that includes the collection of used products as well as the distribution of new products. This supply chain is presented as representative of the problems that can be solved by the proposed meta-heuristic algorithms. A mathematical model is designed for a CLSC that involves three objective functions: maximizing the profit, minimizing the total risk and minimizing shortages of products. Since three objective functions are considered, a multi-objective solution methodology can be advantageous. Therefore, several approaches have been studied: an NSGA-II algorithm is first utilized, and then the results are validated using MOSA and MOPSO algorithms. Priority-based encoding, which is used in all the algorithms, is the core of the solution computations. To compare the performance of the meta-heuristics, random numerical instances are evaluated by four criteria: mean ideal distance, spread of non-dominated solutions, the number of Pareto solutions, and CPU time. In order to enhance the performance of the algorithms, the Taguchi method is used for parameter tuning. Finally, sensitivity analyses are performed and the computational results are presented based on the sensitivity analyses in parameter tuning.

  9. Development and validation of a two-dimensional fast-response flood estimation model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Judi, David R; Mcpherson, Timothy N; Burian, Steven J

    2009-01-01

    A finite difference formulation of the shallow water equations using an upwind differencing method was developed, maintaining computational efficiency and accuracy such that it can be used as a fast-response flood estimation tool. The model was validated using both laboratory-controlled experiments and an actual dam breach. Through the laboratory experiments, the model was shown to give good estimates of depth and velocity when compared to the measured data, as well as when compared to a more complex two-dimensional model. Additionally, the model was compared to high water mark data obtained from the failure of the Taum Sauk dam. The simulated inundation extent agreed well with the observed extent, with the most notable differences resulting from the inability to model sediment transport. The results of these validation studies show that a relatively simple numerical scheme used to solve the complete shallow water equations can be used to accurately estimate flood inundation. Future work will focus on further reducing the computation time needed to provide flood inundation estimates for fast-response analyses. This will be accomplished through the efficient use of multi-core, multi-processor computers coupled with an efficient domain-tracking algorithm, as well as an understanding of the impacts of grid resolution on model results.
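
    Upwind differencing itself is simple to demonstrate. The sketch below applies a first-order upwind scheme to 1D linear advection as a minimal stand-in; the model above solves the full 2D shallow water equations.

        import numpy as np

        def upwind_advect(u, velocity, dx, dt, steps):
            # First-order upwind scheme for u_t + a*u_x = 0 with a > 0.
            c = velocity * dt / dx             # CFL number; stability needs c <= 1
            for _ in range(steps):
                u[1:] -= c * (u[1:] - u[:-1])  # one-sided (upwind) difference
            return u

        x = np.linspace(0.0, 1.0, 101)
        u0 = np.exp(-200 * (x - 0.25) ** 2)    # initial Gaussian pulse
        u = upwind_advect(u0.copy(), velocity=1.0, dx=x[1] - x[0],
                          dt=0.005, steps=100)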

  10. Tunable arbitrary unitary transformer based on multiple sections of multicore fibers with phase control.

    PubMed

    Zhou, Junhe; Wu, Jianjie; Hu, Qinsong

    2018-02-05

    In this paper, we propose a novel tunable unitary transformer, which can achieve arbitrary discrete unitary transforms. The unitary transformer is composed of multiple sections of multi-core fibers with closely aligned coupled cores. Phase shifters are inserted before and after the sections to control the phases of the waves in the cores. A simple algorithm is proposed to find the optimal phase setup for the phase shifters to realize the desired unitary transforms. The proposed device is fiber based and is particularly suitable for the mode division multiplexing systems. A tunable mode MUX/DEMUX for a three-mode fiber is designed based on the proposed structure.

  11. The application of dynamic programming in production planning

    NASA Astrophysics Data System (ADS)

    Wu, Run

    2017-05-01

    Nowadays, with the popularity of computers, various industries and fields widely apply computer information technology, which brings about a huge demand for a variety of application software. In order to develop software that meets various needs at the most economical cost and with the best quality, programmers must design efficient algorithms. A superior algorithm not only solves the problem at hand, but also maximizes the benefits and incurs the smallest overhead. As one of the common algorithmic techniques, dynamic programming is used to solve problems with certain optimality properties. When solving problems with a large number of sub-problems that require repetitive calculations, the ordinary recursive method consumes exponential time, while a dynamic programming algorithm can reduce the time complexity to the polynomial level; from this we can conclude that dynamic programming is very efficient compared to other approaches, reducing computational complexity and enriching the computational results. In this paper, we expound the concept, basic elements, properties, core ideas, solution steps and difficulties of the dynamic programming algorithm and, in addition, establish a dynamic programming model of the production planning problem.
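
    A minimal worked example of the production-planning idea, assuming hypothetical demands, a setup-plus-linear production cost and a per-unit holding cost; memoization turns the exponential recursion into a polynomial-time dynamic program.

        from functools import lru_cache

        demand = [3, 2, 4, 1]          # hypothetical demand per period
        MAX_PROD = 5                   # production capacity per period
        HOLD_COST = 1.0                # cost of holding one unit for one period

        def prod_cost(q):
            return 0.0 if q == 0 else 10.0 + 2.0 * q   # setup + unit cost

        @lru_cache(maxsize=None)
        def min_cost(period, stock):
            # Cheapest way to meet all remaining demand, given current stock.
            if period == len(demand):
                return 0.0
            best = float("inf")
            for q in range(MAX_PROD + 1):
                left = stock + q - demand[period]
                if left < 0:
                    continue           # demand in this period must be met
                best = min(best, prod_cost(q) + HOLD_COST * left
                           + min_cost(period + 1, left))
            return best

        print(min_cost(0, 0))          # minimal total cost over all periods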

  12. A pluggable framework for parallel pairwise sequence search.

    PubMed

    Archuleta, Jeremy; Feng, Wu-chun; Tilevich, Eli

    2007-01-01

    The current and near future of the computing industry is one of multi-core and multi-processor technology. Most existing sequence-search tools have been designed with a focus on single-core, single-processor systems. This discrepancy between software design and hardware architecture substantially hinders sequence-search performance by not allowing full utilization of the hardware. This paper presents a novel framework that will aid the conversion of serial sequence-search tools into a parallel version that can take full advantage of the available hardware. The framework, which is based on a software architecture called mixin layers with refined roles, enables modules to be plugged into the framework with minimal effort. The inherent modular design improves maintenance and extensibility, thus opening up a plethora of opportunities for advanced algorithmic features to be developed and incorporated while routine maintenance of the codebase persists.

  13. A deconvolution extraction method for 2D multi-object fibre spectroscopy based on the regularized least-squares QR-factorization algorithm

    NASA Astrophysics Data System (ADS)

    Yu, Jian; Yin, Qian; Guo, Ping; Luo, A.-li

    2014-09-01

    This paper presents an efficient method for the extraction of astronomical spectra from two-dimensional (2D) multifibre spectrographs based on the regularized least-squares QR-factorization (LSQR) algorithm. We address two issues: we propose a modified Gaussian point spread function (PSF) for modelling the 2D PSF from multi-emission-line gas-discharge lamp images (arc images), and we develop an efficient deconvolution method to extract spectra in real circumstances. The proposed modified 2D Gaussian PSF model can fit various types of 2D PSFs, including different radial distortion angles and ellipticities. We adopt the regularized LSQR algorithm to solve the sparse linear equations constructed from the sparse convolution matrix, which we designate the deconvolution spectrum extraction method. Furthermore, we implement a parallelized LSQR algorithm based on graphics processing unit programming in the Compute Unified Device Architecture to accelerate the computational processing. Experimental results illustrate that the proposed extraction method can greatly reduce the computational cost and memory use of the deconvolution method and, consequently, increase its efficiency and practicability. In addition, the proposed extraction method has a stronger noise tolerance than other methods, such as the boxcar (aperture) extraction and profile extraction methods. Finally, we present an analysis of the sensitivity of the extraction results to the radius and full width at half-maximum of the 2D PSF.
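
    The regularized LSQR step can be sketched with SciPy's sparse solver. In this minimal illustration the banded matrix is a toy stand-in for the sparse PSF convolution matrix described above, and the damping parameter plays the role of the regularization.

        import numpy as np
        from scipy.sparse import diags
        from scipy.sparse.linalg import lsqr

        n = 200
        # Toy sparse "convolution" matrix: a narrow banded PSF blurring a spectrum.
        A = diags([0.25, 0.5, 0.25], [-1, 0, 1], shape=(n, n), format="csr")

        true_spectrum = np.zeros(n)
        true_spectrum[[40, 90, 150]] = [5.0, 3.0, 7.0]   # emission-line spikes
        observed = A @ true_spectrum + 0.01 * np.random.randn(n)

        # damp > 0 yields the Tikhonov-regularized least-squares solution.
        recovered = lsqr(A, observed, damp=0.05)[0]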

  14. Improved Gravitation Field Algorithm and Its Application in Hierarchical Clustering

    PubMed Central

    Zheng, Ming; Sun, Ying; Liu, Gui-xia; Zhou, You; Zhou, Chun-guang

    2012-01-01

    Background: The gravitation field algorithm (GFA) is a new optimization algorithm based on an imitation of natural phenomena. GFA can do well both in searching for the global minimum and for multiple minima in computational biology. But GFA needs to be improved to increase efficiency, and modified to apply to some discrete data problems in systems biology. Method: An improved GFA called IGFA is proposed in this paper. Two parts are improved in IGFA. The first one is the rule of random division, which is a reasonable strategy that shortens running time. The other one is the rotation factor, which can improve the accuracy of IGFA. And to apply IGFA to hierarchical clustering, the initialization part and the movement operator were modified. Results: Two kinds of experiments were used to test IGFA, and IGFA was applied to hierarchical clustering. The global minimum experiment was run with IGFA, GFA, GA (genetic algorithm) and SA (simulated annealing); the multi-minima experiment was run with IGFA and GFA. The results of the two experiments were compared with each other and proved the efficiency of IGFA. IGFA is better than GFA both in accuracy and running time. For hierarchical clustering, IGFA is used to optimize the smallest distance of gene pairs, and the results were compared with GA and SA, single-linkage clustering, and UPGMA. The efficiency of IGFA is proved. PMID:23173043

  15. Reliable and Efficient Parallel Processing Algorithms and Architectures for Modern Signal Processing. Ph.D. Thesis

    NASA Technical Reports Server (NTRS)

    Liu, Kuojuey Ray

    1990-01-01

    Least-squares (LS) estimation and spectral decomposition algorithms constitute the heart of modern signal processing and communication problems. Implementations of recursive LS and spectral decomposition algorithms on parallel processing architectures such as systolic arrays, with efficient fault-tolerant schemes, are the major concerns of this dissertation. There are four major results in this dissertation. First, we propose the systolic block Householder transformation with application to recursive least-squares minimization. It is successfully implemented on a systolic array with a two-level pipelined implementation at the vector level as well as at the word level. Second, a real-time algorithm-based concurrent error detection scheme based on the residual method is proposed for the QRD RLS systolic array. The fault diagnosis, order-degraded reconfiguration, and performance analysis are also considered. Third, the dynamic range, stability, error detection capability under finite-precision implementation, order-degraded performance, and residual estimation under faulty situations for the QRD RLS systolic array are studied in detail. Finally, we propose the use of multi-phase systolic algorithms for spectral decomposition based on the QR algorithm. Two systolic architectures, one based on a triangular array and another based on a rectangular array, are presented for the multi-phase operations with fault-tolerant considerations. Eigenvectors and singular vectors can be easily obtained by using the multi-phase operations. Performance issues are also considered.

  16. Electrosprayed Multi-Core Alginate Microcapsules as Novel Self-Healing Containers

    NASA Astrophysics Data System (ADS)

    Hia, Iee Lee; Pasbakhsh, Pooria; Chan, Eng-Seng; Chai, Siang-Piao

    2016-10-01

    Alginate microcapsules containing epoxy resin were developed through an electrospraying method and embedded into an epoxy matrix to produce a capsule-based self-healing composite system. These formaldehyde-free alginate/epoxy microcapsules were characterized via light microscopy, field emission scanning electron microscopy, Fourier transform infrared spectroscopy and thermogravimetric analysis. Results showed that epoxy resin was successfully encapsulated within the alginate matrix to form porous (multi-core) microcapsules with pore sizes ranging from 5-100 μm. The microcapsules had an average size of 320 ± 20 μm with a decomposition temperature of 220 °C. The loading capacity of these capsules was estimated to be 79%. Under the in situ healing test, impact specimens showed healing efficiency as high as 86% and the ability to heal up to 3 times due to the multi-core capsule structure and the high impact energy of the test, which triggered the release of epoxy, especially in the second and third healings. TDCB specimens showed one-time healing only, with a highest healing efficiency of 76%. The single healing event was attributed to the constant crack propagation rate of the TDCB fracture test. For the first time, a cost-effective, environmentally benign and sustainable capsule-based self-healing system with multiple healing capabilities and high healing performance was developed.

  17. Electrosprayed Multi-Core Alginate Microcapsules as Novel Self-Healing Containers.

    PubMed

    Hia, Iee Lee; Pasbakhsh, Pooria; Chan, Eng-Seng; Chai, Siang-Piao

    2016-10-03

    Alginate microcapsules containing epoxy resin were developed through an electrospraying method and embedded into an epoxy matrix to produce a capsule-based self-healing composite system. These formaldehyde-free alginate/epoxy microcapsules were characterized via light microscopy, field emission scanning electron microscopy, Fourier transform infrared spectroscopy and thermogravimetric analysis. Results showed that epoxy resin was successfully encapsulated within the alginate matrix to form porous (multi-core) microcapsules with pore sizes ranging from 5-100 μm. The microcapsules had an average size of 320 ± 20 μm with a decomposition temperature of 220 °C. The loading capacity of these capsules was estimated to be 79%. Under the in situ healing test, impact specimens showed healing efficiency as high as 86% and the ability to heal up to 3 times due to the multi-core capsule structure and the high impact energy of the test, which triggered the release of epoxy, especially in the second and third healings. TDCB specimens showed one-time healing only, with a highest healing efficiency of 76%. The single healing event was attributed to the constant crack propagation rate of the TDCB fracture test. For the first time, a cost-effective, environmentally benign and sustainable capsule-based self-healing system with multiple healing capabilities and high healing performance was developed.

  18. Electrosprayed Multi-Core Alginate Microcapsules as Novel Self-Healing Containers

    PubMed Central

    Hia, Iee Lee; Pasbakhsh, Pooria; Chan, Eng-Seng; Chai, Siang-Piao

    2016-01-01

    Alginate microcapsules containing epoxy resin were developed through an electrospraying method and embedded into an epoxy matrix to produce a capsule-based self-healing composite system. These formaldehyde-free alginate/epoxy microcapsules were characterized via light microscopy, field emission scanning electron microscopy, Fourier transform infrared spectroscopy and thermogravimetric analysis. Results showed that epoxy resin was successfully encapsulated within the alginate matrix to form porous (multi-core) microcapsules with pore sizes ranging from 5–100 μm. The microcapsules had an average size of 320 ± 20 μm with a decomposition temperature of 220 °C. The loading capacity of these capsules was estimated to be 79%. Under the in situ healing test, impact specimens showed healing efficiency as high as 86% and the ability to heal up to 3 times due to the multi-core capsule structure and the high impact energy of the test, which triggered the release of epoxy, especially in the second and third healings. TDCB specimens showed one-time healing only, with a highest healing efficiency of 76%. The single healing event was attributed to the constant crack propagation rate of the TDCB fracture test. For the first time, a cost-effective, environmentally benign and sustainable capsule-based self-healing system with multiple healing capabilities and high healing performance was developed. PMID:27694922

  19. Machine Learning-based Intelligent Formal Reasoning and Proving System

    NASA Astrophysics Data System (ADS)

    Chen, Shengqing; Huang, Xiaojian; Fang, Jiaze; Liang, Jia

    2018-03-01

    Reasoning systems can be used in many fields, and improving reasoning efficiency is the core of system design. Through a formal description of proofs and a rule-matching algorithm, and by introducing a machine learning algorithm, the intelligent formal reasoning and verification system achieves high efficiency. The experimental results show that the system can verify the correctness of propositional logic reasoning and reuse propositional logic reasoning results, so as to obtain the implicit knowledge in the knowledge base and provide a basic reasoning model for the construction of intelligent systems.

  20. Automatic detection of multi-level acetowhite regions in RGB color images of the uterine cervix

    NASA Astrophysics Data System (ADS)

    Lange, Holger

    2005-04-01

    Uterine cervical cancer is the second most common cancer among women worldwide. Colposcopy is a diagnostic method used to detect cancer precursors and cancer of the uterine cervix, whereby a physician (colposcopist) visually inspects the metaplastic epithelium on the cervix for certain distinctly abnormal morphologic features. A contrast agent, a 3-5% acetic acid solution, is used, causing abnormal and metaplastic epithelia to turn white. The colposcopist considers diagnostic features such as the acetowhite, blood vessel structure, and lesion margin to derive a clinical diagnosis. STI Medical Systems is developing a Computer-Aided-Diagnosis (CAD) system for colposcopy -- ColpoCAD, a complex image analysis system that at its core assesses the same visual features as used by colposcopists. The acetowhite feature has been identified as one of the most important individual predictors of lesion severity. Here, we present the details and preliminary results of a multi-level acetowhite region detection algorithm for RGB color images of the cervix, including the detection of the anatomic features: cervix, os and columnar region, which are used for the acetowhite region detection. The RGB images are assumed to be glare free, either obtained by cross-polarized image acquisition or glare removal pre-processing. The basic approach of the algorithm is to extract a feature image from the RGB image that provides a good acetowhite to cervix background ratio, to segment the feature image using novel pixel grouping and multi-stage region-growing algorithms that provide region segmentations with different levels of detail, to extract the acetowhite regions from the region segmentations using a novel region selection algorithm, and then finally to extract the multi-levels from the acetowhite regions using multiple thresholds. The performance of the algorithm is demonstrated using human subject data.

  1. Design and Optimization Method of a Two-Disk Rotor System

    NASA Astrophysics Data System (ADS)

    Huang, Jingjing; Zheng, Longxi; Mei, Qing

    2016-04-01

    An integrated analytical method based on the multidisciplinary optimization software Isight and the general finite element software ANSYS was proposed in this paper. Firstly, a two-disk rotor system was established, and the modes, harmonic response and transient response under acceleration conditions were analyzed with ANSYS. The dynamic characteristics of the two-disk rotor system were obtained. On this basis, the two-disk rotor model was integrated into the multidisciplinary design optimization software Isight. According to the design of experiments (DOE) and the dynamic characteristics, the optimization variables, optimization objectives and constraints were confirmed. After that, the multi-objective design optimization of the transient process was carried out with three different global optimization algorithms: Evolutionary Optimization Algorithm, Multi-Island Genetic Algorithm and Pointer Automatic Optimizer. The optimum position of the two-disk rotor system was obtained under the specified constraints. Meanwhile, the accuracy and number of calculations of the different optimization algorithms were compared. The optimization results indicated that the rotor vibration reached the minimum value and that the design efficiency and quality were improved by the multidisciplinary design optimization while meeting the design requirements, which provides a reference for improving the design efficiency and reliability of aero-engine rotors.

  2. Genetic algorithms used for the optimization of light-emitting diodes and solar thermal collectors

    NASA Astrophysics Data System (ADS)

    Mayer, Alexandre; Bay, Annick; Gaouyat, Lucie; Nicolay, Delphine; Carletti, Timoteo; Deparis, Olivier

    2014-09-01

    We present a genetic algorithm (GA) we developed for the optimization of light-emitting diodes (LED) and solar thermal collectors. The surface of a LED can be covered by periodic structures whose geometrical and material parameters must be adjusted in order to maximize the extraction of light. The optimization of these parameters by the GA enabled us to get a light-extraction efficiency η of 11.0% from a GaN LED (for comparison, the flat material has a light-extraction efficiency η of only 3.7%). The solar thermal collector we considered consists of a waffle-shaped Al substrate with NiCrOx and SnO2 conformal coatings. We must in this case maximize the solar absorption α while minimizing the thermal emissivity ɛ in the infrared. A multi-objective genetic algorithm has to be implemented in this case in order to determine optimal geometrical parameters. The parameters we obtained using the multi-objective GA enable α~97.8% and ɛ~4.8%, which improves results achieved previously when considering a flat substrate. These two applications demonstrate the interest of genetic algorithms for addressing complex problems in physics.

  3. An efficient spectral method for the simulation of dynamos in Cartesian geometry and its implementation on massively parallel computers

    NASA Astrophysics Data System (ADS)

    Stellmach, Stephan; Hansen, Ulrich

    2008-05-01

    Numerical simulations of the process of convection and magnetic field generation in planetary cores still fail to reach geophysically realistic control parameter values. Future progress in this field depends crucially on efficient numerical algorithms which are able to take advantage of the newest generation of parallel computers. Desirable features of simulation algorithms include (1) spectral accuracy, (2) an operation count per time step that is small and roughly proportional to the number of grid points, (3) memory requirements that scale linear with resolution, (4) an implicit treatment of all linear terms including the Coriolis force, (5) the ability to treat all kinds of common boundary conditions, and (6) reasonable efficiency on massively parallel machines with tens of thousands of processors. So far, algorithms for fully self-consistent dynamo simulations in spherical shells do not achieve all these criteria simultaneously, resulting in strong restrictions on the possible resolutions. In this paper, we demonstrate that local dynamo models in which the process of convection and magnetic field generation is only simulated for a small part of a planetary core in Cartesian geometry can achieve the above goal. We propose an algorithm that fulfills the first five of the above criteria and demonstrate that a model implementation of our method on an IBM Blue Gene/L system scales impressively well for up to O(104) processors. This allows for numerical simulations at rather extreme parameter values.

  4. Informationally Efficient Multi-User Communication

    DTIC Science & Technology

    2010-01-01

    ...DSM algorithms, the Optimal Spectrum Balancing (OSB) algorithm and the Iterative Spectrum Balancing (ISB) algorithm, were proposed to solve the problem of maximization of a weighted rate-sum across all users [CYM06, YL06]. OSB has an exponential complexity in the number of users; ISB only has a... the duality gap min_{λ1,λ2} D(λ1, λ2) − max_{P1,P2} f(P1, P2) is not zero. Fig. 3.3 summarizes the three key steps of a dual method, the OSB algorithm...

  5. Workflow as a Service in the Cloud: Architecture and Scheduling Algorithms

    PubMed Central

    Wang, Jianwu; Korambath, Prakashan; Altintas, Ilkay; Davis, Jim; Crawl, Daniel

    2017-01-01

    With more and more workflow systems adopting the cloud as their execution environment, it becomes increasingly challenging to efficiently manage various workflows, virtual machines (VMs) and workflow executions on VM instances. To make the system scalable and easy to extend, we design a Workflow as a Service (WFaaS) architecture with independent services. A core part of the architecture is how to efficiently respond to continuous workflow requests from users and schedule their executions in the cloud. Based on different targets, we propose four heuristic workflow scheduling algorithms for the WFaaS architecture, and analyze the differences and best usages of the algorithms in terms of performance, cost and the price/performance ratio via experimental studies. PMID:29399237
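
    As a concrete illustration of this family of heuristics, here is a hedged Python sketch of one plausible scheduling rule: place a workflow on the cheapest VM type whose predicted runtime still meets the deadline. The VM types, prices, and runtime model are invented for the example and are not taken from the paper.

      from dataclasses import dataclass

      @dataclass
      class VMType:
          name: str
          speed: float      # relative compute speed
          price: float      # cost per hour

      VM_TYPES = [VMType("small", 1.0, 0.05), VMType("medium", 2.0, 0.12),
                  VMType("large", 4.0, 0.30)]

      def schedule(workload_hours, deadline_hours):
          """Cheapest VM type whose predicted runtime still meets the deadline."""
          feasible = [vm for vm in VM_TYPES
                      if workload_hours / vm.speed <= deadline_hours]
          if not feasible:
              return None                      # no single VM type meets the deadline
          return min(feasible, key=lambda vm: (workload_hours / vm.speed) * vm.price)

      print(schedule(workload_hours=8.0, deadline_hours=3.0))   # picks "large"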

  6. Fault Tolerance Middleware for a Multi-Core System

    NASA Technical Reports Server (NTRS)

    Some, Raphael R.; Springer, Paul L.; Zima, Hans P.; James, Mark; Wagner, David A.

    2012-01-01

    Fault Tolerance Middleware (FTM) provides a framework to run on a dedicated core of a multi-core system and handles detection of single-event upsets (SEUs), and the responses to those SEUs, occurring in an application running on multiple cores of the processor. This software was written expressly for a multi-core system and can support different kinds of fault strategies, such as introspection, algorithm-based fault tolerance (ABFT), and triple modular redundancy (TMR). It focuses on providing fault tolerance for the application code, and represents the first step in a plan to eventually include fault tolerance in message passing and the FTM itself. In the multi-core system, the FTM resides on a single, dedicated core, separate from the cores used by the application. This is done in order to isolate the FTM from application faults and to allow it to swap out any application core for a substitute. The structure of the FTM consists of an interface to a fault tolerant strategy module, a responder module, a fault manager module, an error factory, and an error mapper that determines the severity of the error. In the present reference implementation, the only fault tolerant strategy implemented is introspection. The introspection code waits for an application node to send an error notification to it. It then uses the error factory to create an error object, and at this time, a severity level is assigned to the error. The introspection code uses its built-in knowledge base to generate a recommended response to the error. Responses might include ignoring the error, logging it, rolling back the application to a previously saved checkpoint, swapping in a new node to replace a bad one, or restarting the application. The original error and recommended response are passed to the top-level fault manager module, which invokes the response. The responder module also notifies the introspection module of the generated response. This provides additional information to the introspection module that it can use in generating its next response. For example, if the responder triggers an application rollback and errors are still occurring, the introspection module may decide to recommend an application restart.

  7. Multi-agent systems design for aerospace applications

    NASA Astrophysics Data System (ADS)

    Waslander, Steven L.

    2007-12-01

    Engineering systems with independent decision makers are becoming increasingly prevalent and present many challenges in coordinating actions to achieve systems goals. In particular, this work investigates the applications of air traffic flow control and autonomous vehicles as motivation to define algorithms that allow agents to agree to safe, efficient and equitable solutions in a distributed manner. To ensure system requirements will be satisfied in practice, each method is evaluated for a specific model of agent behavior, be it cooperative or non-cooperative. The air traffic flow control problem is investigated from the point of view of the airlines, whose costs are directly affected by resource allocation decisions made by the Federal Aviation Administration in order to mitigate traffic disruptions caused by weather. Airlines are first modeled as cooperative, and a distributed algorithm is presented with various global cost metrics which balance efficient and equitable use of resources differently. Next, a competitive airline model is assumed and two market mechanisms are developed for allocating contested airspace resources. The resource market mechanism provides a solution for which convergence to an efficient solution can be guaranteed, and each airline will improve on the solution that would occur without its inclusion in the decision process. A lump-sum market is then introduced as an alternative mechanism, for which efficiency loss bounds exist if airlines attempt to manipulate prices. Initial convergence results for lump-sum markets are presented for simplified problems with a single resource. To validate these algorithms, two air traffic flow models are developed which extend previous techniques, the first a convenient convex model made possible by assuming constant velocity flow, and the second a more complex flow model with full inflow, velocity and rerouting control. Autonomous vehicle teams are envisaged for many applications including mobile sensing and search and rescue. To enable these high-level applications, multi-vehicle collision avoidance is solved using a cooperative, decentralized algorithm. For the development of coordination algorithms for autonomous vehicles, the Stanford Testbed of Autonomous Rotorcraft for Multi-Agent Control (STARMAC) is presented. This testbed provides significant advantages over other aerial testbeds due to its small size and low maintenance requirements.

  8. Service-Oriented Node Scheduling Scheme for Wireless Sensor Networks Using Markov Random Field Model

    PubMed Central

    Cheng, Hongju; Su, Zhihuang; Lloret, Jaime; Chen, Guolong

    2014-01-01

    Future wireless sensor networks are expected to provide various sensing services, and energy efficiency is one of the most important criteria. The node scheduling strategy aims to increase network lifetime by selecting a set of sensor nodes to provide the required sensing services in a periodic manner. In this paper, we are concerned with the service-oriented node scheduling problem to provide multiple sensing services while maximizing the network lifetime. We firstly introduce how to model the data correlation for different services by using the Markov Random Field (MRF) model. Secondly, we formulate the service-oriented node scheduling issue into three different problems, namely, the multi-service data denoising problem, which aims at minimizing the noise level of sensed data; the representative node selection problem, which concerns selecting a number of active nodes while determining the services they provide; and the multi-service node scheduling problem, which aims at maximizing the network lifetime. Thirdly, we propose a Multi-service Data Denoising (MDD) algorithm, a novel multi-service Representative node Selection and service Determination (RSD) algorithm, and a novel MRF-based Multi-service Node Scheduling (MMNS) scheme to solve these three problems, respectively. Finally, extensive experiments demonstrate that the proposed scheme efficiently extends the network lifetime. PMID:25384005

  9. Particle Swarm Optimization for Programming Deep Brain Stimulation Arrays

    PubMed Central

    Peña, Edgar; Zhang, Simeng; Deyo, Steve; Xiao, YiZi; Johnson, Matthew D.

    2017-01-01

    Objective Deep brain stimulation (DBS) therapy relies on both precise neurosurgical targeting and systematic optimization of stimulation settings to achieve beneficial clinical outcomes. One recent advance to improve targeting is the development of DBS arrays (DBSAs) with electrodes segmented both along and around the DBS lead. However, increasing the number of independent electrodes creates the logistical challenge of optimizing stimulation parameters efficiently. Approach Solving such complex problems with multiple solutions and objectives is well known to occur in biology, in which complex collective behaviors emerge out of swarms of individual organisms engaged in learning through social interactions. Here, we developed a particle swarm optimization (PSO) algorithm to program DBSAs using a swarm of individual particles representing electrode configurations and stimulation amplitudes. Using a finite element model of motor thalamic DBS, we demonstrate how the PSO algorithm can efficiently optimize a multi-objective function that maximizes predictions of axonal activation in regions of interest (ROI, cerebellar-receiving area of motor thalamus), minimizes predictions of axonal activation in regions of avoidance (ROA, somatosensory thalamus), and minimizes power consumption. Main Results The algorithm solved the multi-objective problem by producing a Pareto front. ROI and ROA activation predictions were consistent across swarms (<1% median discrepancy in axon activation). The algorithm was able to accommodate (1) lead displacement (1 mm) with relatively small ROI (≤9.2%) and ROA (≤1%) activation changes, irrespective of shift direction; (2) reduction in maximum per-electrode current (by 50% and 80%) with ROI activation decreasing by 5.6% and 16%, respectively; and (3) disabling electrodes (n=3 and 12) with ROI activation reduction by 1.8% and 14%, respectively. Additionally, comparison between PSO predictions and multi-compartment axon model simulations showed discrepancies of <1% between approaches. Significance The PSO algorithm provides a computationally efficient way to program DBS systems, especially those with higher electrode counts. PMID:28068291
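
    The following is a minimal Python sketch of the underlying PSO machinery on a toy objective. The dimensionality (four normalized electrode amplitudes), the coefficient values, and the scalarized stand-in objective are illustrative assumptions; the actual method evaluates a finite element activation model and reports a Pareto front rather than a single scalar optimum.

      import random

      DIM, SWARM, ITERS = 4, 30, 200           # e.g. four normalized amplitudes
      W, C1, C2 = 0.72, 1.49, 1.49             # standard PSO coefficients

      def objective(x):
          """Toy scalarization: ROI activation up, ROA activation and power down."""
          roi = sum(xi * (1.0 - xi) for xi in x)
          roa = sum(max(0.0, xi - 0.8) for xi in x)
          power = sum(xi * xi for xi in x)
          return -roi + 5.0 * roa + 0.1 * power    # lower is better

      pos = [[random.random() for _ in range(DIM)] for _ in range(SWARM)]
      vel = [[0.0] * DIM for _ in range(SWARM)]
      pbest = [p[:] for p in pos]
      gbest = min(pbest, key=objective)

      for _ in range(ITERS):
          for i in range(SWARM):
              for d in range(DIM):
                  vel[i][d] = (W * vel[i][d]
                               + C1 * random.random() * (pbest[i][d] - pos[i][d])
                               + C2 * random.random() * (gbest[d] - pos[i][d]))
                  pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
              if objective(pos[i]) < objective(pbest[i]):
                  pbest[i] = pos[i][:]
          gbest = min(pbest, key=objective)

      print("best amplitudes:", [round(x, 3) for x in gbest])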

  10. Particle swarm optimization for programming deep brain stimulation arrays

    NASA Astrophysics Data System (ADS)

    Peña, Edgar; Zhang, Simeng; Deyo, Steve; Xiao, YiZi; Johnson, Matthew D.

    2017-02-01

    Objective. Deep brain stimulation (DBS) therapy relies on both precise neurosurgical targeting and systematic optimization of stimulation settings to achieve beneficial clinical outcomes. One recent advance to improve targeting is the development of DBS arrays (DBSAs) with electrodes segmented both along and around the DBS lead. However, increasing the number of independent electrodes creates the logistical challenge of optimizing stimulation parameters efficiently. Approach. Solving such complex problems with multiple solutions and objectives is well known to occur in biology, in which complex collective behaviors emerge out of swarms of individual organisms engaged in learning through social interactions. Here, we developed a particle swarm optimization (PSO) algorithm to program DBSAs using a swarm of individual particles representing electrode configurations and stimulation amplitudes. Using a finite element model of motor thalamic DBS, we demonstrate how the PSO algorithm can efficiently optimize a multi-objective function that maximizes predictions of axonal activation in regions of interest (ROI, cerebellar-receiving area of motor thalamus), minimizes predictions of axonal activation in regions of avoidance (ROA, somatosensory thalamus), and minimizes power consumption. Main results. The algorithm solved the multi-objective problem by producing a Pareto front. ROI and ROA activation predictions were consistent across swarms (<1% median discrepancy in axon activation). The algorithm was able to accommodate (1) lead displacement (1 mm) with relatively small ROI (≤9.2%) and ROA (≤1%) activation changes, irrespective of shift direction; (2) reduction in maximum per-electrode current (by 50% and 80%) with ROI activation decreasing by 5.6% and 16%, respectively; and (3) disabling electrodes (n = 3 and 12) with ROI activation reduction by 1.8% and 14%, respectively. Additionally, comparison between PSO predictions and multi-compartment axon model simulations showed discrepancies of <1% between approaches. Significance. The PSO algorithm provides a computationally efficient way to program DBS systems, especially those with higher electrode counts.

  11. An observation planning algorithm applied to multi-objective astronomical observations and its simulation in COSMOS field

    NASA Astrophysics Data System (ADS)

    Jin, Yi; Gu, Yonggang; Zhai, Chao

    2012-09-01

    Multi-object fiber spectroscopic sky surveys are now booming, examples being LAMOST, already built by China; the BIGBOSS project put forward by the U.S. Lawrence Berkeley National Lab; and the GTC (Gran Telescopio Canarias) telescope developed by Spain, Mexico and the United States. They all use or will use this approach, and each fiber can be moved within a certain area to reach one astronomical target, so observation planning is particularly important for these sky surveys. We develop an observation planning algorithm for multi-objective astronomical observations. It avoids collision and interference between the fiber positioning units in the focal plane during observation of one field of view, so that the objects of interest can be observed in a limited number of rounds with maximum efficiency. The simulation can also cover a wide field of view through multi-FOV observation. After the observation plan is built, a simulation is run on the COSMOS field using the GTC telescope. Galaxies, stars and high-redshift LBG galaxies of interest are selected after removal of the mask areas, which may correspond to bright stars. A nine-FOV simulation is then completed, and the observation efficiency and fiber utilization ratio for every round are reported. In addition, allocating a certain number of fibers to background sky, assigning different weights to different objects, and how to move the FOV to improve the overall observation efficiency are discussed.

  12. Case-Based Multi-Sensor Intrusion Detection

    NASA Astrophysics Data System (ADS)

    Schwartz, Daniel G.; Long, Jidong

    2009-08-01

    Multi-sensor intrusion detection systems (IDSs) combine the alerts raised by individual IDSs and possibly other kinds of devices such as firewalls and antivirus software. A critical issue in building a multi-sensor IDS is alert correlation, i.e., determining which alerts are caused by the same attack. This paper explores a novel approach to alert correlation using case-based reasoning (CBR). Each case in the CBR system's library contains a pattern of alerts raised by some known attack type, together with the identity of the attack. During run time, the alert streams gleaned from the sensors are compared with the patterns in the cases, and a match indicates that the attack described by that case has occurred. For this purpose the design of a fast and accurate matching algorithm is imperative. Two such algorithms were explored: (i) the well-known Hungarian algorithm, and (ii) an order-preserving matching of our own devising. Tests were conducted using the DARPA Grand Challenge Problem attack simulator. These showed that both matching algorithms are effective in detecting attacks, but the Hungarian algorithm is inefficient, whereas the order-preserving one is very efficient and in fact runs in linear time.
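
    For the Hungarian matching step, a hedged sketch using SciPy's implementation is shown below; the similarity scores are invented for the example. The order-preserving matcher from the paper is not reproduced here; it would instead exploit alert ordering to match both sequences in a single linear pass.

      import numpy as np
      from scipy.optimize import linear_sum_assignment

      # similarity[i][j]: how well observed alert i matches pattern alert j
      similarity = np.array([[0.9, 0.1, 0.0],
                             [0.2, 0.8, 0.3],
                             [0.0, 0.4, 0.7]])

      # Hungarian method minimizes cost, so negate to maximize total similarity
      row, col = linear_sum_assignment(-similarity)
      print(list(zip(row.tolist(), col.tolist())), "total =", similarity[row, col].sum())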

  13. Multiple sequence alignment using multi-objective based bacterial foraging optimization algorithm.

    PubMed

    Rani, R Ranjani; Ramyachitra, D

    2016-12-01

    Multiple sequence alignment (MSA) is a widespread approach in computational biology and bioinformatics. MSA deals with how sequences of nucleotides and amino acids are aligned with the minimum number of gaps between them, which points to the functional, evolutionary and structural relationships among the sequences. Still, computing an MSA with efficient accuracy and statistically significant alignments remains a challenging task. In this work, the Bacterial Foraging Optimization algorithm was employed to align the biological sequences, resulting in a non-dominated optimal solution. It employs multiple objectives: maximization of similarity, non-gap percentage and conserved blocks, and minimization of gap penalty. The BAliBASE 3.0 benchmark database was used to examine the proposed algorithm against other methods. In this paper, two algorithms are proposed: a Hybrid Genetic Algorithm with Artificial Bee Colony (GA-ABC) and a Bacterial Foraging Optimization algorithm. The Hybrid Genetic Algorithm with Artificial Bee Colony was found to perform better than existing optimization algorithms, but conserved blocks could not be obtained using GA-ABC. BFO was then used for the alignment, and the conserved blocks were obtained. The proposed Multi-Objective Bacterial Foraging Optimization Algorithm (MO-BFO) was compared with the widely used MSA methods Clustal Omega, Kalign, MUSCLE, MAFFT, Genetic Algorithm (GA), Ant Colony Optimization (ACO), Artificial Bee Colony (ABC), Particle Swarm Optimization (PSO) and the Hybrid Genetic Algorithm with Artificial Bee Colony (GA-ABC). The final results show that the proposed MO-BFO algorithm yields better alignments than most widely used methods. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  14. A decision tree algorithm for investigation of model biases related to dynamical cores and physical parameterizations: CESM/CAM EVALUATION BY DECISION TREES

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Soner Yorgun, M.; Rood, Richard B.

    An object-based evaluation method using a pattern recognition algorithm (i.e., classification trees) is applied to the simulated orographic precipitation for idealized experimental setups using the National Center of Atmospheric Research (NCAR) Community Atmosphere Model (CAM) with the finite volume (FV) and the Eulerian spectral transform dynamical cores with varying resolutions. Daily simulations were analyzed and three different types of precipitation features were identified by the classification tree algorithm. The statistical characteristics of these features (i.e., maximum value, mean value, and variance) were calculated to quantify the difference between the dynamical cores and changing resolutions. Even with the simple and smooth topography in the idealized setups, complexity in the precipitation fields simulated by the models develops quickly. The classification tree algorithm using objective thresholding successfully detected different types of precipitation features even as the complexity of the precipitation field increased. The results show that the complexity and the bias introduced in small-scale phenomena due to the spectral transform method of the CAM Eulerian spectral dynamical core are prominent, and are an important reason for its dissimilarity from the FV dynamical core. The resolvable scales, in both horizontal and vertical dimensions, have a significant effect on the simulation of precipitation. The results of this study also suggest that an efficient and informative study of the biases produced by GCMs should involve daily (or even hourly) output (rather than monthly means) analyzed over local scales.

  15. A decision tree algorithm for investigation of model biases related to dynamical cores and physical parameterizations: CESM/CAM EVALUATION BY DECISION TREES

    DOE PAGES

    Soner Yorgun, M.; Rood, Richard B.

    2016-11-11

    An object-based evaluation method using a pattern recognition algorithm (i.e., classification trees) is applied to the simulated orographic precipitation for idealized experimental setups using the National Center of Atmospheric Research (NCAR) Community Atmosphere Model (CAM) with the finite volume (FV) and the Eulerian spectral transform dynamical cores with varying resolutions. Daily simulations were analyzed and three different types of precipitation features were identified by the classification tree algorithm. The statistical characteristics of these features (i.e., maximum value, mean value, and variance) were calculated to quantify the difference between the dynamical cores and changing resolutions. Even with the simple and smooth topography in the idealized setups, complexity in the precipitation fields simulated by the models develops quickly. The classification tree algorithm using objective thresholding successfully detected different types of precipitation features even as the complexity of the precipitation field increased. The results show that the complexity and the bias introduced in small-scale phenomena due to the spectral transform method of the CAM Eulerian spectral dynamical core are prominent, and are an important reason for its dissimilarity from the FV dynamical core. The resolvable scales, in both horizontal and vertical dimensions, have a significant effect on the simulation of precipitation. The results of this study also suggest that an efficient and informative study of the biases produced by GCMs should involve daily (or even hourly) output (rather than monthly means) analyzed over local scales.

  16. The systems biology simulation core algorithm

    PubMed Central

    2013-01-01

    Background With the increasing availability of high dimensional time course data for metabolites, genes, and fluxes, the mathematical description of dynamical systems has become an essential aspect of research in systems biology. Models are often encoded in formats such as SBML, whose structure is very complex and difficult to evaluate due to many special cases. Results This article describes an efficient algorithm to solve SBML models that are interpreted in terms of ordinary differential equations. We begin our consideration with a formal representation of the mathematical form of the models and explain all parts of the algorithm in detail, including several preprocessing steps. We provide a flexible reference implementation as part of the Systems Biology Simulation Core Library, a community-driven project providing a large collection of numerical solvers and a sophisticated interface hierarchy for the definition of custom differential equation systems. To demonstrate the capabilities of the new algorithm, it has been tested with the entire SBML Test Suite and all models of BioModels Database. Conclusions The formal description of the mathematics behind the SBML format facilitates the implementation of the algorithm within specifically tailored programs. The reference implementation can be used as a simulation backend for Java™-based programs. Source code, binaries, and documentation can be freely obtained under the terms of the LGPL version 3 from http://simulation-core.sourceforge.net. Feature requests, bug reports, contributions, or any further discussion can be directed to the mailing list simulation-core-development@lists.sourceforge.net. PMID:23826941
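
    The core task, interpreting a model as a system of ordinary differential equations and handing it to a numerical solver, reduces to something like the following Python sketch. The two-species reaction chain and rate constants are illustrative assumptions, not an SBML model, and the sketch uses SciPy rather than the library's Java API.

      import numpy as np
      from scipy.integrate import solve_ivp

      k1, k2 = 0.5, 0.3            # rate constants for the toy chain S1 -> S2 -> sink

      def rhs(t, y):
          """Right-hand side of the ODE system derived from the reaction network."""
          s1, s2 = y
          return [-k1 * s1, k1 * s1 - k2 * s2]

      sol = solve_ivp(rhs, (0.0, 20.0), [1.0, 0.0])
      print("S1, S2 at t = 20:", sol.y[:, -1])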

  17. Clustering of Multi-Temporal Fully Polarimetric L-Band SAR Data for Agricultural Land Cover Mapping

    NASA Astrophysics Data System (ADS)

    Tamiminia, H.; Homayouni, S.; Safari, A.

    2015-12-01

    Recently, the unique capabilities of Polarimetric Synthetic Aperture Radar (PolSAR) sensors have made them an important and efficient tool for natural resources and environmental applications, such as land cover and crop classification. The aim of this paper is to classify multi-temporal fully polarimetric SAR data using a kernel-based fuzzy C-means clustering method over an agricultural region. This method starts by transforming the input data into a higher dimensional space using kernel functions and then clustering them in the feature space. The feature space, due to its inherent properties, is able to take into account the nonlinear and complex nature of polarimetric data. Several polarimetric SAR features were extracted using target decomposition algorithms; features from the Cloude-Pottier, Freeman-Durden and Yamaguchi algorithms were used as inputs for the clustering. This method was applied to multi-temporal UAVSAR L-band images acquired over an agricultural area near Winnipeg, Canada, during June and July 2012. The results demonstrate the efficiency of this approach with respect to classical methods. In addition, using multi-temporal data in the clustering process helped to investigate the phenological cycle of plants and significantly improved the performance of agricultural land cover mapping.
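
    A minimal kernel fuzzy C-means sketch in Python is given below, run on synthetic 2-D points standing in for polarimetric feature vectors. The RBF kernel, the feature-space distance formula, and the center update follow one common kernelized-FCM formulation; the clustering parameters are illustrative assumptions rather than the paper's settings.

      import numpy as np

      def rbf(X, v, sigma=1.0):
          """RBF kernel values between each row of X and a single center v."""
          return np.exp(-np.sum((X - v) ** 2, axis=-1) / (2 * sigma ** 2))

      def kernel_fcm(X, c=3, m=2.0, sigma=1.0, iters=100, seed=0):
          rng = np.random.default_rng(seed)
          centers = X[rng.choice(len(X), size=c, replace=False)]
          for _ in range(iters):
              K = np.stack([rbf(X, v, sigma) for v in centers], axis=1)  # (n, c)
              d2 = np.maximum(2.0 * (1.0 - K), 1e-12)   # feature-space distances
              inv = d2 ** (-1.0 / (m - 1.0))
              U = inv / inv.sum(axis=1, keepdims=True)  # fuzzy memberships
              W = (U ** m) * K                          # kernel-weighted memberships
              centers = (W.T @ X) / W.sum(axis=0)[:, None]
          return U.argmax(axis=1)

      rng = np.random.default_rng(1)
      X = np.vstack([rng.normal(mu, 0.3, (50, 2)) for mu in (0.0, 2.0, 4.0)])
      print(np.bincount(kernel_fcm(X)))                 # cluster sizes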

  18. Fast algorithm of adaptive Fourier series

    NASA Astrophysics Data System (ADS)

    Gao, You; Ku, Min; Qian, Tao

    2018-05-01

    Adaptive Fourier decomposition (AFD, precisely 1-D AFD or Core-AFD) was originated for the goal of positive frequency representations of signals. It achieved the goal and at the same time offered fast decompositions of signals. There then arose several types of AFDs. AFD merged with the greedy algorithm idea and, in particular, motivated the so-called pre-orthogonal greedy algorithm (Pre-OGA), which was proven to be the most efficient greedy algorithm. The cost of the advantages of the AFD type decompositions is, however, the high computational complexity due to the involvement of maximal selections of the dictionary parameters. The present paper offers one formulation of the 1-D AFD algorithm by building the FFT algorithm into it. Accordingly, the algorithm complexity is reduced from the original $\mathcal{O}(MN^2)$ to $\mathcal{O}(MN\log_2 N)$, where $N$ denotes the number of the discretization points on the unit circle and $M$ denotes the number of points in $[0,1)$. This greatly enhances the applicability of AFD. Experiments are carried out to show the high efficiency of the proposed algorithm.

  19. Unified transform architecture for AVC, AVS, VC-1 and HEVC high-performance codecs

    NASA Astrophysics Data System (ADS)

    Dias, Tiago; Roma, Nuno; Sousa, Leonel

    2014-12-01

    A unified architecture for fast and efficient computation of the set of two-dimensional (2-D) transforms adopted by the most recent state-of-the-art digital video standards is presented in this paper. In contrast to other designs with similar functionality, the presented architecture is supported by a scalable, modular and completely configurable processing structure. This flexible structure not only allows the architecture to be easily reconfigured to support different transform kernels, but also permits resizing it to efficiently support transforms of different orders (e.g. order-4, order-8, order-16 and order-32). Consequently, not only is it highly suitable for realizing high-performance multi-standard transform cores, but it also offers highly efficient implementations of specialized processing structures addressing only the reduced subset of transforms used by a specific video standard. The experimental results obtained by prototyping several configurations of this processing structure in a Xilinx Virtex-7 FPGA show the superior performance and hardware efficiency levels provided by the proposed unified architecture for the implementation of transform cores for the Advanced Video Coding (AVC), Audio Video coding Standard (AVS), VC-1 and High Efficiency Video Coding (HEVC) standards. In addition, the results also demonstrate the ability of this processing structure to realize multi-standard transform cores supporting all the standards mentioned above while processing the 8k Ultra High Definition Television (UHDTV) video format (7,680 × 4,320 at 30 fps) in real time.

  20. Parallelization strategies for continuum-generalized method of moments on the multi-thread systems

    NASA Astrophysics Data System (ADS)

    Bustamam, A.; Handhika, T.; Ernastuti, Kerami, D.

    2017-07-01

    The Continuum-Generalized Method of Moments (C-GMM) addresses the shortfall of the Generalized Method of Moments (GMM), which is not as efficient as the Maximum Likelihood estimator, by using a continuum set of moment conditions in a GMM framework. However, this computation takes a very long time, since the regularization parameter must be optimized. Unfortunately, these calculations are usually processed sequentially, whereas all modern computers are now supported by hierarchical memory systems and hyperthreading technology, which allow for parallel computing. This paper aims to speed up the calculation of C-GMM by designing a parallel algorithm for C-GMM on multi-thread systems. First, parallel regions are detected in the original C-GMM algorithm. There are two parallel regions in the original C-GMM algorithm that contribute significantly to the reduction of computational time: the outer loop and the inner loop. This parallel algorithm is then implemented with a standard shared-memory application programming interface, i.e., Open Multi-Processing (OpenMP). The experiment shows that outer-loop parallelization is the best strategy for any number of observations.
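
    The winning strategy, parallelizing the outer loop over candidate regularization parameters, can be rendered in Python with multiprocessing instead of OpenMP, as in the hedged sketch below. The grid and the stand-in objective are invented for the example; the real inner computation is the C-GMM criterion.

      from multiprocessing import Pool

      def objective(alpha):
          # stand-in for the expensive inner C-GMM criterion at one parameter value
          return sum((alpha - k / 200_000.0) ** 2 for k in range(200_000))

      if __name__ == "__main__":
          grid = [i / 100.0 for i in range(1, 101)]    # candidate reg. parameters
          with Pool() as pool:                         # one worker per core
              scores = pool.map(objective, grid)       # outer loop in parallel
          best = grid[min(range(len(grid)), key=scores.__getitem__)]
          print("best regularization parameter:", best)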

  1. Dynamic Task Allocation in Multi-Hop Multimedia Wireless Sensor Networks with Low Mobility

    PubMed Central

    Jin, Yichao; Vural, Serdar; Gluhak, Alexander; Moessner, Klaus

    2013-01-01

    This paper presents a task allocation-oriented framework to enable efficient in-network processing and cost-effective multi-hop resource sharing for dynamic multi-hop multimedia wireless sensor networks with low node mobility, e.g., pedestrian speeds. The proposed system incorporates a fast task reallocation algorithm to quickly recover from possible network service disruptions, such as node or link failures. An evolutional self-learning mechanism based on a genetic algorithm continuously adapts the system parameters in order to meet the desired application delay requirements, while also achieving a sufficiently long network lifetime. Since the algorithm runtime incurs considerable time delay while updating task assignments, we introduce an adaptive window size to limit the delay periods and ensure an up-to-date solution based on node mobility patterns and device processing capabilities. To the best of our knowledge, this is the first study that yields multi-objective task allocation in a mobile multi-hop wireless environment under dynamic conditions. Simulations are performed in various settings, and the results show considerable performance improvement in extending network lifetime compared to heuristic mechanisms. Furthermore, the proposed framework provides noticeable reduction in the frequency of missing application deadlines. PMID:24135992

  2. Architecture of optical sensor for recognition of multiple toxic metal ions from water.

    PubMed

    Shenashen, M A; El-Safty, S A; Elshehy, E A

    2013-09-15

    Here, we designed a novel optical sensor based on wormhole hexagonal mesoporous core/multi-shell silica nanoparticles that enables the selective recognition and removal of extremely toxic metal ions from drinking water. The surface coating of mesoporous core/double-shell silica platforms through several successive decorations, using a cationic surfactant with double alkyl tails (CS-DAT) and then a synthesized dicarboxylate 1,5-diphenyl-3-thiocarbazone (III) signaling probe, enabled us to create a unique hierarchical multi-shell sensor. In this design, high loading capacity and wrapping of the CS-DAT and III organic moieties could be achieved, leading to the formation of a silica core with multiple shells formed from double-silica, CS-DAT, and III dressing layers. In this sensing system, notable changes in the color and reflectance intensity of the multi-shelled sensor for Cu(2+), Co(2+), Cd(2+), and Hg(2+) ions were observed at pH 2, 8, 9.5 and 11.5, respectively. The multi-shelled sensor further enables continuous monitoring of several different toxic metal ions and efficient multi-ion sensing and removal, with good reversibility, selectivity, and signal stability. Copyright © 2013 Elsevier B.V. All rights reserved.

  3. Polarization image segmentation of radiofrequency ablated porcine myocardial tissue

    PubMed Central

    Ahmad, Iftikhar; Gribble, Adam; Murtza, Iqbal; Ikram, Masroor; Pop, Mihaela; Vitkin, Alex

    2017-01-01

    Optical polarimetry has previously imaged the spatial extent of a typical radiofrequency ablated (RFA) lesion in myocardial tissue, exhibiting significantly lower total depolarization at the necrotic core compared to healthy tissue, and intermediate values at the RFA rim region. Here, total depolarization in ablated myocardium was used to segment the total depolarization image into three (core, rim and healthy) zones. A local fuzzy thresholding algorithm was used for this multi-region segmentation, and then compared with a ground truth segmentation obtained from manual demarcation of RFA core and rim regions on the histopathology image. Quantitative comparison of the algorithm segmentation results was performed with evaluation metrics such as dice similarity coefficient (DSC = 0.78 ± 0.02 and 0.80 ± 0.02), sensitivity (Sn = 0.83 ± 0.10 and 0.91 ± 0.08), specificity (Sp = 0.76 ± 0.17 and 0.72 ± 0.17) and accuracy (Acc = 0.81 ± 0.09 and 0.71 ± 0.10) for RFA core and rim regions, respectively. This automatic segmentation of parametric depolarization images suggests a novel application of optical polarimetry, namely its use in objective RFA image quantification. PMID:28380013

  4. Multi-exemplar affinity propagation.

    PubMed

    Wang, Chang-Dong; Lai, Jian-Huang; Suen, Ching Y; Zhu, Jun-Yong

    2013-09-01

    The affinity propagation (AP) clustering algorithm has received much attention in the past few years. AP is appealing because it is efficient, insensitive to initialization, and it produces clusters at a lower error rate than other exemplar-based methods. However, its single-exemplar model becomes inadequate when applied to model multiple subclasses in situations such as scene analysis and character recognition. To remedy this deficiency, we have extended the single-exemplar model to a multi-exemplar one to create a new multi-exemplar affinity propagation (MEAP) algorithm. This new model automatically determines the number of exemplars in each cluster associated with a super-exemplar to approximate the subclasses in the category. Solving the model is NP-hard, and we tackle it with max-sum belief propagation to produce neighborhood maximum clusters, with no need to specify beforehand the number of clusters, multi-exemplars, or super-exemplars. Also, by utilizing the sparsity in the data, we are able to reduce the computational time and storage substantially. Experimental studies have shown MEAP's significant improvements over other algorithms on unsupervised image categorization and the clustering of handwritten digits.
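
    For contrast with the MEAP extension, the single-exemplar AP baseline is available off the shelf; the following sketch uses scikit-learn's implementation on synthetic data. MEAP itself (super-exemplars with per-cluster multi-exemplars) is not part of scikit-learn.

      import numpy as np
      from sklearn.cluster import AffinityPropagation

      rng = np.random.default_rng(0)
      # three well-separated synthetic groups of 2-D points
      X = np.vstack([rng.normal(mu, 0.4, (40, 2)) for mu in (0.0, 3.0, 6.0)])

      ap = AffinityPropagation(random_state=0).fit(X)
      print("exemplars found:", len(ap.cluster_centers_indices_))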

  5. Implicit gas-kinetic unified algorithm based on multi-block docking grid for multi-body reentry flows covering all flow regimes

    NASA Astrophysics Data System (ADS)

    Peng, Ao-Ping; Li, Zhi-Hui; Wu, Jun-Lin; Jiang, Xin-Yu

    2016-12-01

    Based on previous research on the Gas-Kinetic Unified Algorithm (GKUA) for flows ranging from the highly rarefied free-molecule regime through the transition regime to the continuum, a new implicit scheme of the cell-centered finite volume method is presented for directly solving the unified Boltzmann model equation covering all flow regimes. In view of the difficulty of generating a high-quality single-block grid system for complex irregular bodies, a multi-block docking grid generation method is designed on the basis of data transmission between blocks, and the data structure is constructed for processing arbitrary connection relations between blocks with high efficiency and reliability. As a result, the gas-kinetic unified algorithm with the implicit scheme and multi-block docking grid has been established for the first time and used to solve reentry flow problems around multiple bodies covering all flow regimes, with Knudsen numbers spanning the whole range from 10 to 3.7E-6. The implicit and explicit schemes are applied to computing and analyzing the supersonic flows in the near-continuum and continuum regimes around a circular cylinder, with careful comparison to each other. It is shown that the present algorithm and modelling possess much higher computational efficiency and faster convergence. Flow problems including two and three side-by-side cylinders are simulated from highly rarefied to near-continuum flow regimes, and the computed results are found to be in good agreement with related DSMC simulations and theoretical analysis solutions, which verifies the accuracy and reliability of the present method. It is observed that as the spacing between the bodies becomes smaller, the obstruction at the throat between the cylinders becomes greater, the flow field around each single body becomes more obviously asymmetric, and the normal force coefficient becomes bigger. In the near-continuum transitional flow regime of near-space flight surroundings, once the spacing between the bodies increases to six times the diameter of a single body, the interference effects between the bodies tend to be negligible. The computing practice has confirmed that the present method is feasible for computing the aerodynamics and revealing the flow mechanisms around complex multi-body vehicles covering all flow regimes, from the gas-kinetic point of view of solving the unified Boltzmann model velocity distribution function equation.

  6. Density-based parallel skin lesion border detection with webCL

    PubMed Central

    2015-01-01

    Background Dermoscopy is a highly effective and noninvasive imaging technique used in the diagnosis of melanoma and other pigmented skin lesions. Many aspects of the lesion under consideration are defined in relation to the lesion border. This makes border detection one of the most important steps in dermoscopic image analysis. In current practice, dermatologists often delineate borders through a hand-drawn representation based upon visual inspection. Due to the subjective nature of this technique, intra- and inter-observer variations are common. Because of this, the automated assessment of lesion borders in dermoscopic images has become an important area of study. Methods A fast density-based skin lesion border detection method has been implemented in parallel with a new technology called WebCL. WebCL utilizes client-side computing capabilities to use available hardware resources such as multi-core processors and GPUs. The developed WebCL-parallel density-based skin lesion border detection method runs efficiently from Internet browsers. Results Previous research indicates that one of the highest accuracy rates can be achieved using density-based clustering techniques for skin lesion border detection. While these algorithms do have unfavorable time complexities, this effect can be mitigated when they are implemented in parallel. In this study, the density-based clustering technique for skin lesion border detection is parallelized and redesigned to run very efficiently on heterogeneous platforms (e.g. tablets, smartphones, multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units) by transforming the technique into a series of independent concurrent operations. Heterogeneous computing is adopted to support accessibility, portability and multi-device use in clinical settings. For this, we used WebCL, an emerging technology that enables an HTML5 Web browser to execute code in parallel on heterogeneous platforms. We describe WebCL and our parallel algorithm design. In addition, we tested the parallel code on 100 dermoscopy images and measured the execution speedups with respect to the serial version. Results indicate that the parallel (WebCL) version and the serial version of the density-based lesion border detection method generate the same accuracy rates for the 100 dermoscopy images, with a mean border error of 6.94%, mean recall of 76.66%, and mean precision of 99.29%. Moreover, the WebCL version's speedup factor for lesion border detection on the 100 dermoscopy images averages around 491.2. Conclusions When the large number of high-resolution dermoscopy images encountered in a usual clinical setting is considered, along with the critical importance of early detection and diagnosis of melanoma before metastasis, the importance of fast processing of dermoscopy images becomes obvious. In this paper, we introduce WebCL and its use for biomedical image processing applications. WebCL is a JavaScript binding of OpenCL, which takes advantage of GPU computing from a web browser. Therefore, the WebCL-parallel version of density-based skin lesion border detection introduced in this study can supplement expert dermatologists and aid them in the early diagnosis of skin lesions. While WebCL is currently an emerging technology, a full adoption of WebCL into the HTML5 standard would allow this implementation to run on a very large set of hardware and software systems.
WebCL takes full advantage of parallel computational resources including multi-cores and GPUs on a local machine, and allows for compiled code to run directly from the Web Browser. PMID:26423836

  7. Density-based parallel skin lesion border detection with webCL.

    PubMed

    Lemon, James; Kockara, Sinan; Halic, Tansel; Mete, Mutlu

    2015-01-01

    Dermoscopy is a highly effective and noninvasive imaging technique used in the diagnosis of melanoma and other pigmented skin lesions. Many aspects of the lesion under consideration are defined in relation to the lesion border. This makes border detection one of the most important steps in dermoscopic image analysis. In current practice, dermatologists often delineate borders through a hand-drawn representation based upon visual inspection. Due to the subjective nature of this technique, intra- and inter-observer variations are common. Because of this, the automated assessment of lesion borders in dermoscopic images has become an important area of study. A fast density-based skin lesion border detection method has been implemented in parallel with a new technology called WebCL. WebCL utilizes client-side computing capabilities to use available hardware resources such as multi-core processors and GPUs. The developed WebCL-parallel density-based skin lesion border detection method runs efficiently from Internet browsers. Previous research indicates that one of the highest accuracy rates can be achieved using density-based clustering techniques for skin lesion border detection. While these algorithms do have unfavorable time complexities, this effect can be mitigated when they are implemented in parallel. In this study, the density-based clustering technique for skin lesion border detection is parallelized and redesigned to run very efficiently on heterogeneous platforms (e.g. tablets, smartphones, multi-core CPUs, GPUs, and fully-integrated Accelerated Processing Units) by transforming the technique into a series of independent concurrent operations. Heterogeneous computing is adopted to support accessibility, portability and multi-device use in clinical settings. For this, we used WebCL, an emerging technology that enables an HTML5 Web browser to execute code in parallel on heterogeneous platforms. We describe WebCL and our parallel algorithm design. In addition, we tested the parallel code on 100 dermoscopy images and measured the execution speedups with respect to the serial version. Results indicate that the parallel (WebCL) version and the serial version of the density-based lesion border detection method generate the same accuracy rates for the 100 dermoscopy images, with a mean border error of 6.94%, mean recall of 76.66%, and mean precision of 99.29%. Moreover, the WebCL version's speedup factor for lesion border detection on the 100 dermoscopy images averages around 491.2. When the large number of high-resolution dermoscopy images encountered in a usual clinical setting is considered, along with the critical importance of early detection and diagnosis of melanoma before metastasis, the importance of fast processing of dermoscopy images becomes obvious. In this paper, we introduce WebCL and its use for biomedical image processing applications. WebCL is a JavaScript binding of OpenCL, which takes advantage of GPU computing from a web browser. Therefore, the WebCL-parallel version of density-based skin lesion border detection introduced in this study can supplement expert dermatologists and aid them in the early diagnosis of skin lesions. While WebCL is currently an emerging technology, a full adoption of WebCL into the HTML5 standard would allow this implementation to run on a very large set of hardware and software systems. WebCL takes full advantage of parallel computational resources including multi-core processors and GPUs on a local machine, and allows compiled code to run directly from the Web browser.

  8. A mass, momentum, and energy conserving, fully implicit, scalable algorithm for the multi-dimensional, multi-species Rosenbluth-Fokker-Planck equation

    NASA Astrophysics Data System (ADS)

    Taitano, W. T.; Chacón, L.; Simakov, A. N.; Molvig, K.

    2015-09-01

    In this study, we demonstrate a fully implicit algorithm for the multi-species, multidimensional Rosenbluth-Fokker-Planck equation which is exactly mass-, momentum-, and energy-conserving, and which preserves positivity. Unlike most earlier studies, we base our development on the Rosenbluth (rather than Landau) form of the Fokker-Planck collision operator, which reduces complexity while allowing for an optimal fully implicit treatment. Our discrete conservation strategy employs nonlinear constraints that force the continuum symmetries of the collision operator to be satisfied upon discretization. We converge the resulting nonlinear system iteratively using Jacobian-free Newton-Krylov methods, effectively preconditioned with multigrid methods for efficiency. Single- and multi-species numerical examples demonstrate the advertised accuracy properties of the scheme, and the superior algorithmic performance of our approach. In particular, the discretization approach is numerically shown to be second-order accurate in time and velocity space and to exhibit manifestly positive entropy production. That is, H-theorem behavior is indicated for all the examples we have tested. The solution approach is demonstrated to scale optimally with respect to grid refinement (with CPU time growing linearly with the number of mesh points), and timestep (showing very weak dependence of CPU time with time-step size). As a result, the proposed algorithm delivers several orders-of-magnitude speedup vs. explicit algorithms.
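
    The nonlinear solver layer can be illustrated with SciPy's Jacobian-free Newton-Krylov routine on a toy discretized problem, as in the sketch below. The 1-D nonlinear diffusion residual is an invented stand-in, not the Fokker-Planck system, and no preconditioner is attached here.

      import numpy as np
      from scipy.optimize import newton_krylov

      def residual(u):
          """Toy 1-D nonlinear diffusion residual with Dirichlet boundaries."""
          r = np.zeros_like(u)
          r[0], r[-1] = u[0], u[-1] - 1.0
          r[1:-1] = u[2:] - 2 * u[1:-1] + u[:-2] - 0.1 * u[1:-1] ** 3
          return r

      u0 = np.linspace(0.0, 1.0, 50)           # initial guess
      sol = newton_krylov(residual, u0, f_tol=1e-8)
      print("max residual:", np.abs(residual(sol)).max())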

  9. An accurate and computationally efficient algorithm for ground peak identification in large footprint waveform LiDAR data

    NASA Astrophysics Data System (ADS)

    Zhuang, Wei; Mountrakis, Giorgos

    2014-09-01

    Large footprint waveform LiDAR sensors have been widely used for numerous airborne studies. Ground peak identification in a large footprint waveform is a significant bottleneck in exploiting the full potential of waveform datasets. In the current study, an accurate and computationally efficient algorithm was developed for ground peak identification, called the Filtering and Clustering Algorithm (FICA). The method was evaluated on Land, Vegetation, and Ice Sensor (LVIS) waveform datasets acquired over central NY. FICA incorporates a set of multi-scale second derivative filters and a k-means clustering algorithm in order to avoid detecting false ground peaks. FICA was tested on five different land cover types (deciduous trees, coniferous trees, shrub, grass and developed area) and showed more accurate results when compared to existing algorithms. More specifically, compared with Gaussian decomposition (GD), the RMSE of ground peak identification by FICA was 2.82 m (5.29 m for GD) in deciduous plots, 3.25 m (4.57 m for GD) in coniferous plots, 2.63 m (2.83 m for GD) in shrub plots, 0.82 m (0.93 m for GD) in grass plots, and 0.70 m (0.51 m for GD) in plots of developed areas. FICA performance was also relatively consistent under various slope and canopy coverage (CC) conditions. In addition, FICA showed better computational efficiency compared to existing methods. FICA's major computational and accuracy advantage is a result of the adopted multi-scale signal processing procedures that concentrate on local portions of the signal, as opposed to the Gaussian decomposition that uses a curve-fitting strategy applied to the entire signal. The FICA algorithm is a good candidate for large-scale implementation on future space-borne waveform LiDAR sensors.
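
    A loose Python sketch of the two ingredients named above, multi-scale second-derivative filtering followed by k-means on the surviving candidates, is given below. The synthetic waveform, filter scales, and curvature threshold are illustrative assumptions, not FICA's actual settings.

      import numpy as np
      from scipy.ndimage import gaussian_filter1d
      from sklearn.cluster import KMeans

      t = np.arange(512)
      rng = np.random.default_rng(0)
      # synthetic waveform: two Gaussian returns plus noise
      wave = (np.exp(-(t - 150) ** 2 / 200) + 0.6 * np.exp(-(t - 400) ** 2 / 150)
              + 0.05 * rng.normal(size=t.size))

      # second-derivative responses at several scales; strongly negative values
      # indicate peak-like curvature at every scale simultaneously
      resp = np.stack([gaussian_filter1d(wave, s, order=2) for s in (2, 4, 8)])
      candidates = np.where((resp < -0.001).all(axis=0))[0]

      # cluster the candidate samples into the expected number of peaks
      km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(candidates[:, None])
      print("estimated peak positions:", sorted(int(c) for c in km.cluster_centers_.ravel()))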

  10. A multi-dimensional nonlinearly implicit, electromagnetic Vlasov-Darwin particle-in-cell (PIC) algorithm

    NASA Astrophysics Data System (ADS)

    Chen, Guangye; Chacón, Luis; CoCoMans Team

    2014-10-01

    For decades, the Vlasov-Darwin model has been recognized to be attractive for PIC simulations (to avoid radiative noise issues) in non-radiative electromagnetic regimes. However, the Darwin model results in elliptic field equations that render explicit time integration unconditionally unstable. Improving on linearly implicit schemes, fully implicit PIC algorithms for both electrostatic and electromagnetic regimes, with exact discrete energy and charge conservation properties, have recently been developed in 1D. This study builds on these recent algorithms to develop an implicit, orbit-averaged, time-space-centered finite difference scheme for the particle-field equations in multiple dimensions. The algorithm conserves energy, charge, and canonical momentum exactly, even with grid packing. A simple fluid preconditioner allows efficient use of large timesteps, $\mathcal{O}(\sqrt{m_i/m_e}\,c/v_{eT})$ larger than the explicit CFL. We demonstrate the accuracy and efficiency properties of the algorithm with various numerical experiments in 2D3V.

  11. A joint tracking method for NSCC based on WLS algorithm

    NASA Astrophysics Data System (ADS)

    Luo, Ruidan; Xu, Ying; Yuan, Hong

    2017-12-01

    The navigation signal based on a compound carrier (NSCC) has a flexible multi-carrier scheme and various configurable scheme parameters, which enable it to deliver significant navigation augmentation in terms of spectral efficiency, tracking accuracy, multipath mitigation capability and anti-jamming performance compared with legacy navigation signals. Meanwhile, its typical scheme characteristics can provide auxiliary information for signal synchronization algorithm design. Based on the characteristics of NSCC, this paper proposes a joint tracking method utilizing the Weighted Least Squares (WLS) algorithm. In this method, the LS algorithm is employed to jointly estimate the frequency shift of each sub-carrier through the linear frequency-Doppler relationship, utilizing the known sub-carrier frequencies. Besides, the weighting matrix is set adaptively according to the sub-carrier power to ensure the estimation accuracy. Both theoretical analysis and simulation results illustrate that the tracking accuracy and sensitivity of this method outperform those of the single-carrier algorithm at lower SNR.
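
    The weighted least squares step can be illustrated with the following Python sketch, which fits per-sub-carrier frequency-shift measurements to a linear frequency-Doppler relation, weighting each measurement by its sub-carrier power. All numbers are toy values, and the one-parameter model is a simplification of the method described above.

      import numpy as np

      f_sub = np.array([1.0, 2.0, 3.0, 4.0, 5.0])     # sub-carrier frequencies (MHz)
      power = np.array([1.0, 0.8, 0.6, 0.9, 0.5])     # per-sub-carrier power (weights)
      true_ratio = 2e-6                               # Doppler-to-carrier ratio
      rng = np.random.default_rng(0)
      # simulated frequency-shift measurements, noisier at low sub-carrier power
      shift = true_ratio * f_sub + rng.normal(0.0, 1e-7 / np.sqrt(power))

      # model: shift = ratio * f_sub;  WLS solution: ratio = (f' W y) / (f' W f)
      W = np.diag(power)
      ratio = (f_sub @ W @ shift) / (f_sub @ W @ f_sub)
      print("estimated Doppler ratio:", ratio)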

  12. Efficient geometric rectification techniques for spectral analysis algorithm

    NASA Technical Reports Server (NTRS)

    Chang, C. Y.; Pang, S. S.; Curlander, J. C.

    1992-01-01

    The spectral analysis algorithm is a viable technique for processing synthetic aperture radar (SAR) data at near-real-time throughput rates by trading off image resolution. One major challenge of the spectral analysis algorithm is that the output image, often referred to as the range-Doppler image, is represented on the iso-range and iso-Doppler lines, a curved grid format. This phenomenon is known as the fan-shape effect. Therefore, resampling is required to convert the range-Doppler image into a rectangular grid format before the individual images can be overlaid together to form seamless multi-look strip imagery. An efficient algorithm for geometric rectification of the range-Doppler image is presented. The proposed algorithm, realized in two one-dimensional resampling steps, takes into consideration the fan-shape phenomenon of the range-Doppler image as well as the high squint angle and updates of the cross-track and along-track Doppler parameters. No ground reference points are required.
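
    A toy Python version of the two-pass idea, 2-D resampling onto a rectangular grid performed as two independent 1-D interpolation sweeps, is sketched below. The coordinate mappings are invented placeholders and do not model the actual iso-range/iso-Doppler geometry.

      import numpy as np

      rng = np.random.default_rng(0)
      img = rng.random((64, 64))                # stand-in for a range-Doppler image
      u = np.linspace(0.0, 1.0, 64)             # uniform output grid

      # pass 1: per row, map the curved "range" coordinate onto the uniform grid
      rows = np.empty_like(img)
      for i in range(64):
          warped = u ** (1.0 + 0.2 * i / 64)    # toy fan-shaped distortion per row
          rows[i] = np.interp(u, warped, img[i])

      # pass 2: per column, resample the "azimuth" coordinate the same way
      out = np.empty_like(rows)
      for j in range(64):
          out[:, j] = np.interp(u, u ** 1.1, rows[:, j])

      print(out.shape)                          # rectified 64 x 64 image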

  13. A novel algorithm of super-resolution image reconstruction based on multi-class dictionaries for natural scene

    NASA Astrophysics Data System (ADS)

    Wu, Wei; Zhao, Dewei; Zhang, Huan

    2015-12-01

    Super-resolution image reconstruction is an effective method to improve image quality and has important research significance in the field of image processing. However, the choice of dictionary directly affects the efficiency of image reconstruction. Sparse representation theory is introduced into the nearest-neighbor selection problem, and, building on sparse-representation-based super-resolution image reconstruction, a super-resolution image reconstruction algorithm based on multi-class dictionaries is analyzed. This method avoids the redundancy of training a single overcomplete dictionary and makes each sub-dictionary more representative, and it replaces the traditional Euclidean distance computation to improve the quality of the reconstructed image. In addition, non-local self-similarity regularization is introduced to address the ill-posed nature of the problem. Experimental results show that the algorithm performs much better than state-of-the-art algorithms in terms of both PSNR and visual perception.

  14. HACC: Simulating sky surveys on state-of-the-art supercomputing architectures

    NASA Astrophysics Data System (ADS)

    Habib, Salman; Pope, Adrian; Finkel, Hal; Frontiere, Nicholas; Heitmann, Katrin; Daniel, David; Fasel, Patricia; Morozov, Vitali; Zagaris, George; Peterka, Tom; Vishwanath, Venkatram; Lukić, Zarija; Sehrish, Saba; Liao, Wei-keng

    2016-01-01

    Current and future surveys of large-scale cosmic structure are associated with a massive and complex datastream to study, characterize, and ultimately understand the physics behind the two major components of the 'Dark Universe', dark energy and dark matter. In addition, the surveys also probe primordial perturbations and carry out fundamental measurements, such as determining the sum of neutrino masses. Large-scale simulations of structure formation in the Universe play a critical role in the interpretation of the data and extraction of the physics of interest. Just as survey instruments continue to grow in size and complexity, so do the supercomputers that enable these simulations. Here we report on HACC (Hardware/Hybrid Accelerated Cosmology Code), a recently developed and evolving cosmology N-body code framework, designed to run efficiently on diverse computing architectures and to scale to millions of cores and beyond. HACC can run on all current supercomputer architectures and supports a variety of programming models and algorithms. It has been demonstrated at scale on Cell- and GPU-accelerated systems, standard multi-core node clusters, and Blue Gene systems. HACC's design allows for ease of portability, and at the same time, high levels of sustained performance on the fastest supercomputers available. We present a description of the design philosophy of HACC, the underlying algorithms and code structure, and outline implementation details for several specific architectures. We show selected accuracy and performance results from some of the largest high resolution cosmological simulations so far performed, including benchmarks evolving more than 3.6 trillion particles.

  15. HACC: Simulating sky surveys on state-of-the-art supercomputing architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Habib, Salman; Pope, Adrian; Finkel, Hal

    2016-01-01

    Current and future surveys of large-scale cosmic structure are associated with a massive and complex datastream to study, characterize, and ultimately understand the physics behind the two major components of the ‘Dark Universe’, dark energy and dark matter. In addition, the surveys also probe primordial perturbations and carry out fundamental measurements, such as determining the sum of neutrino masses. Large-scale simulations of structure formation in the Universe play a critical role in the interpretation of the data and extraction of the physics of interest. Just as survey instruments continue to grow in size and complexity, so do the supercomputers that enable these simulations. Here we report on HACC (Hardware/Hybrid Accelerated Cosmology Code), a recently developed and evolving cosmology N-body code framework, designed to run efficiently on diverse computing architectures and to scale to millions of cores and beyond. HACC can run on all current supercomputer architectures and supports a variety of programming models and algorithms. It has been demonstrated at scale on Cell- and GPU-accelerated systems, standard multi-core node clusters, and Blue Gene systems. HACC’s design allows for ease of portability, and at the same time, high levels of sustained performance on the fastest supercomputers available. We present a description of the design philosophy of HACC, the underlying algorithms and code structure, and outline implementation details for several specific architectures. We show selected accuracy and performance results from some of the largest high resolution cosmological simulations so far performed, including benchmarks evolving more than 3.6 trillion particles.

  16. Parallelization of combinatorial search when solving knapsack optimization problem on computing systems based on multicore processors

    NASA Astrophysics Data System (ADS)

    Rahman, P. A.

    2018-05-01

    This paper deals with a model of the knapsack optimization problem and a method for solving it based on directed combinatorial search in the boolean space. The author's specialized mathematical model for decomposing the search zone into separate search spheres, and the algorithm for distributing the search spheres to the different cores of a multi-core processor, are also discussed. The paper also provides an example of decomposing the search zone into several search spheres and distributing them to the different cores of a quad-core processor. Finally, a formula offered by the author for estimating the theoretical maximum computational acceleration, achievable by parallelizing the search zone into search spheres over an unlimited number of processor cores, is also given.
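
    As a rough illustration of the decomposition idea (a generic sketch, not Rahman's exact model), the boolean search space of an n-item knapsack can be split into 2^K independent "search spheres" by fixing the first K item bits; each sphere can then be enumerated on its own core, and with equal-cost spheres the idealized speedup ceiling equals the number of spheres. All instance numbers below are hypothetical.

```python
from itertools import product

# Toy knapsack instance (hypothetical numbers, for illustration only).
weights = [4, 7, 3, 9, 5, 6]
values = [7, 9, 4, 12, 6, 8]
capacity = 15
K = 2  # fix the first K item bits -> 2**K independent "search spheres"

def best_in_sphere(prefix):
    """Exhaustively search one sphere: items 0..K-1 fixed by `prefix`."""
    best = (0, ())
    for tail in product((0, 1), repeat=len(weights) - len(prefix)):
        x = prefix + tail
        w = sum(wi for wi, xi in zip(weights, x) if xi)
        if w <= capacity:
            v = sum(vi for vi, xi in zip(values, x) if xi)
            best = max(best, (v, x))
    return best

spheres = list(product((0, 1), repeat=K))
# Each sphere is independent, so the spheres could be farmed out to
# separate cores; here we map over them sequentially for clarity.
value, solution = max(best_in_sphere(s) for s in spheres)
print(value, solution)

# Idealized ceiling on acceleration: with equal-cost spheres and enough
# cores, only one sphere's work remains on the critical path.
print("theoretical max speedup ~", len(spheres))
```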

  17. Improving Design Efficiency for Large-Scale Heterogeneous Circuits

    NASA Astrophysics Data System (ADS)

    Gregerson, Anthony

    Despite increases in logic density, many Big Data applications must still be partitioned across multiple computing devices in order to meet their strict performance requirements. Among the most demanding of these applications is high-energy physics (HEP), which uses complex computing systems consisting of thousands of FPGAs and ASICs to process the sensor data created by experiments at particle accelerators such as the Large Hadron Collider (LHC). Designing such computing systems is challenging due to the scale of the systems, the exceptionally high-throughput and low-latency performance constraints that necessitate application-specific hardware implementations, the requirement that algorithms are efficiently partitioned across many devices, and the possible need to update the implemented algorithms during the lifetime of the system. In this work, we describe our research to develop flexible architectures for implementing such large-scale circuits on FPGAs. In particular, this work is motivated by (but not limited in scope to) high-energy physics algorithms for the Compact Muon Solenoid (CMS) experiment at the LHC. To make efficient use of logic resources in multi-FPGA systems, we introduce Multi-Personality Partitioning, a novel form of the graph partitioning problem, and present partitioning algorithms that can significantly improve resource utilization on heterogeneous devices while also reducing inter-chip connections. To reduce the high communication costs of Big Data applications, we also introduce Information-Aware Partitioning, a partitioning method that analyzes the data content of application-specific circuits, characterizes their entropy, and selects circuit partitions that enable efficient compression of data between chips. We employ our information-aware partitioning method to improve the performance of the hardware validation platform for evaluating new algorithms for the CMS experiment. Together, these research efforts help to improve the efficiency and decrease the cost of developing the large-scale, heterogeneous circuits needed to enable large-scale applications in high-energy physics and other important areas.

  18. Efficient Irregular Wavefront Propagation Algorithms on Hybrid CPU-GPU Machines

    PubMed Central

    Teodoro, George; Pan, Tony; Kurc, Tahsin; Kong, Jun; Cooper, Lee; Saltz, Joel

    2013-01-01

    We address the problem of efficient execution of a computation pattern, referred to here as the irregular wavefront propagation pattern (IWPP), on hybrid systems with multiple CPUs and GPUs. The IWPP is common in several image processing operations. In the IWPP, data elements in the wavefront propagate waves to their neighboring elements on a grid if a propagation condition is satisfied. Elements receiving the propagated waves become part of the wavefront. This pattern results in irregular data accesses and computations. We develop and evaluate strategies for efficient computation and propagation of wavefronts using a multi-level queue structure. This queue structure improves the utilization of fast memories in a GPU and reduces synchronization overheads. We also develop a tile-based parallelization strategy to support execution on multiple CPUs and GPUs. We evaluate our approaches on a state-of-the-art GPU accelerated machine (equipped with 3 GPUs and 2 multicore CPUs) using the IWPP implementations of two widely used image processing operations: morphological reconstruction and Euclidean distance transform. Our results show significant performance improvements on GPUs. The use of multiple CPUs and GPUs cooperatively attains speedups of 50× and 85× with respect to single core CPU executions for morphological reconstruction and Euclidean distance transform, respectively. PMID:23908562
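
    The propagation condition described above can be made concrete with a minimal, single-threaded instance of the pattern: queue-based grayscale morphological reconstruction, one of the two operations the authors benchmark. This sketch omits the paper's multi-level queue, GPU kernels, and tile-based CPU-GPU cooperation.

```python
import numpy as np
from collections import deque

def morph_reconstruct(marker, mask):
    """Queue-based grayscale morphological reconstruction by dilation.

    A naive IWPP instance: wavefront pixels push values to 4-neighbours
    whenever the propagation condition holds; receivers rejoin the front.
    """
    marker = np.minimum(marker, mask).astype(float)
    h, w = marker.shape
    q = deque((i, j) for i in range(h) for j in range(w))
    while q:
        i, j = q.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < h and 0 <= nj < w:
                v = min(marker[i, j], mask[ni, nj])
                if v > marker[ni, nj]:      # propagation condition
                    marker[ni, nj] = v      # neighbour joins the wavefront
                    q.append((ni, nj))
    return marker

mask = np.array([[0, 1, 1, 0],
                 [0, 2, 3, 0],
                 [0, 1, 1, 0]])
marker = np.zeros_like(mask)
marker[1, 2] = 3                            # a single seed
print(morph_reconstruct(marker, mask))
```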

  19. Array-based Hierarchical Mesh Generation in Parallel

    DOE PAGES

    Ray, Navamita; Grindeanu, Iulian; Zhao, Xinglin; ...

    2015-11-03

    In this paper, we describe an array-based hierarchical mesh generation capability through uniform refinement of unstructured meshes for efficient solution of PDEs using finite element methods and multigrid solvers. A multi-degree, multi-dimensional and multi-level framework is designed to generate the nested hierarchies from an initial mesh that can be used for a number of purposes, such as multi-level methods and the generation of large meshes. The capability is developed under the parallel mesh framework "Mesh Oriented dAtaBase", a.k.a. MOAB. We describe the underlying data structures and algorithms to generate such hierarchies and present numerical results for computational efficiency and mesh quality. Finally, we also present results to demonstrate the applicability of the developed capability to a multigrid finite-element solver.

  20. Efficient parallelization for AMR MHD multiphysics calculations; implementation in AstroBEAR

    NASA Astrophysics Data System (ADS)

    Carroll-Nellenback, Jonathan J.; Shroyer, Brandon; Frank, Adam; Ding, Chen

    2013-03-01

    Current adaptive mesh refinement (AMR) simulations require algorithms that are highly parallelized and manage memory efficiently. As compute engines grow larger, AMR simulations will require algorithms that achieve new levels of efficient parallelization and memory management. We have attempted to employ new techniques to achieve both of these goals. Patch or grid based AMR often employs ghost cells to decouple the hyperbolic advances of each grid on a given refinement level. This decoupling allows each grid to be advanced independently. In AstroBEAR we utilize this independence by threading the grid advances on each level with preference going to the finer level grids. This allows for global load balancing instead of level by level load balancing and allows for greater parallelization across both physical space and AMR level. Threading of level advances can also improve performance by interleaving communication with computation, especially in deep simulations with many levels of refinement. While we see improvements of up to 30% on deep simulations run on a few cores, the speedup is typically more modest (5-20%) for larger scale simulations. To improve memory management we have employed a distributed tree algorithm that requires processors to only store and communicate local sections of the AMR tree structure with neighboring processors. Using this distributed approach we are able to get reasonable scaling efficiency (>80%) out to 12288 cores and up to 8 levels of AMR - independent of the use of threading.

  1. High-order hydrodynamic algorithms for exascale computing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Morgan, Nathaniel Ray

    Hydrodynamic algorithms are at the core of many laboratory missions ranging from simulating ICF implosions to climate modeling. The hydrodynamic algorithms commonly employed at the laboratory and in industry (1) typically lack requisite accuracy for complex multi-material vortical flows and (2) are not well suited for exascale computing due to poor data locality and poor FLOP/memory ratios. Exascale computing requires advances in both computer science and numerical algorithms. We propose to research the second requirement and create a new high-order hydrodynamic algorithm that has superior accuracy, excellent data locality, and excellent FLOP/memory ratios. This proposal will impact a broad range of research areas including numerical theory, discrete mathematics, vorticity evolution, gas dynamics, interface instability evolution, turbulent flows, fluid dynamics and shock driven flows. If successful, the proposed research has the potential to radically transform simulation capabilities and help position the laboratory for computing at the exascale.

  2. Efficient data communication protocols for wireless networks

    NASA Astrophysics Data System (ADS)

    Zeydan, Engin

    In this dissertation, efficient decentralized algorithms are investigated for cost minimization problems in wireless networks. For wireless sensor networks, we separately investigate energy consumption reduction and throughput maximization using multi-hop aggregation of correlated data. The proposed algorithms exploit data redundancy using a game theoretic framework. For energy minimization, routes are chosen to minimize the total energy expended by the network using best response dynamics to local data. The cost function used in routing takes into account distance, interference and in-network data aggregation. The proposed energy-efficient correlation-aware routing algorithm significantly reduces the energy consumption in the network and converges iteratively in a finite number of steps. For throughput maximization, we consider both the interference distribution across the network and the correlation between forwarded data when establishing routes. Nodes along each route are chosen to minimize the interference impact in their neighborhood and to maximize the in-network data aggregation. The resulting network topology maximizes the global network throughput and the algorithm is guaranteed to converge in a finite number of steps using best response dynamics. For multiple antenna wireless ad-hoc networks, we present distributed cooperative and regret-matching based learning schemes for the joint transmit beamformer and power level selection problem for nodes operating in a multi-user interference environment. Total network transmit power is minimized while ensuring a constant received signal-to-interference-and-noise ratio at each receiver. In the cooperative and regret-matching based power minimization algorithms, transmit beamformers are selected from a predefined codebook to minimize the total power. By selecting transmit beamformers judiciously and performing power adaptation, the cooperative algorithm is shown to converge to a pure strategy Nash equilibrium with high probability throughout the iterations in the interference impaired network. On the other hand, the regret-matching learning algorithm is noncooperative and requires a minimal amount of overhead. The proposed cooperative and regret-matching based distributed algorithms are also compared with centralized solutions through simulation results.

  3. Detection of Nitrogen Content in Rubber Leaves Using Near-Infrared (NIR) Spectroscopy with Correlation-Based Successive Projections Algorithm (SPA).

    PubMed

    Tang, Rongnian; Chen, Xupeng; Li, Chuang

    2018-05-01

    Near-infrared spectroscopy is an efficient, low-cost technology that has potential as an accurate method for detecting the nitrogen content of natural rubber leaves. The successive projections algorithm (SPA) is a widely used variable selection method for multivariate calibration, which uses projection operations to select a variable subset with minimum multi-collinearity. However, due to the fluctuation of correlation between variables, high collinearity may still exist among non-adjacent variables of the subset obtained by basic SPA. Based on an analysis of the correlation matrix of the spectral data, this paper proposes a correlation-based SPA (CB-SPA) that applies the successive projections algorithm in regions with consistent correlation. The results show that CB-SPA can select variable subsets with more valuable variables and less multi-collinearity. Meanwhile, models established on the CB-SPA subset outperform basic SPA subsets in predicting nitrogen content in terms of both cross-validation and external prediction. Moreover, CB-SPA is more efficient, as the time cost of its selection procedure is one-twelfth that of basic SPA.
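
    For reference, the projection step of basic SPA (the baseline that CB-SPA builds on) can be sketched as follows; the correlation-segmentation of CB-SPA itself is not reproduced here. Starting from an initial column, SPA repeatedly projects the remaining columns onto the orthogonal complement of the selected set and picks the column of maximum residual norm, which is what keeps multi-collinearity in the subset low.

```python
import numpy as np

def spa(X, k, start=0):
    """Basic successive projections algorithm (SPA) sketch.

    Greedily selects k columns of X with low multi-collinearity by
    deflating candidates against each newly chosen column.
    """
    X = X.astype(float)
    selected = [start]
    P = X.copy()
    for _ in range(k - 1):
        v = P[:, selected[-1]]
        # Deflate every column by its component along the last chosen one.
        P = P - np.outer(v, (v @ P) / (v @ v))
        P[:, selected] = 0.0              # never re-pick a chosen column
        selected.append(int(np.argmax(np.linalg.norm(P, axis=0))))
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))             # 50 samples, 20 spectral variables
print(spa(X, k=5))
```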

  4. Identification of multivariable nonlinear systems in the presence of colored noises using iterative hierarchical least squares algorithm.

    PubMed

    Jafari, Masoumeh; Salimifard, Maryam; Dehghani, Maryam

    2014-07-01

    This paper presents an efficient method for identification of nonlinear Multi-Input Multi-Output (MIMO) systems in the presence of colored noises. The method studies the multivariable nonlinear Hammerstein and Wiener models, in which the nonlinear memory-less block is approximated based on arbitrary vector-based basis functions. The linear time-invariant (LTI) block is modeled by an autoregressive moving average with exogenous (ARMAX) model which can effectively describe the moving average noises as well as the autoregressive and the exogenous dynamics. According to the multivariable nature of the system, a pseudo-linear-in-the-parameter model is obtained which includes two different kinds of unknown parameters, a vector and a matrix. Therefore, the standard least squares algorithm cannot be applied directly. To overcome this problem, a Hierarchical Least Squares Iterative (HLSI) algorithm is used to simultaneously estimate the vector and the matrix of unknown parameters as well as the noises. The efficiency of the proposed identification approaches is investigated through three nonlinear MIMO case studies. Copyright © 2014 ISA. Published by Elsevier Ltd. All rights reserved.

  5. Novel density-based and hierarchical density-based clustering algorithms for uncertain data.

    PubMed

    Zhang, Xianchao; Liu, Han; Zhang, Xiaotong

    2017-09-01

    Uncertain data has posed a great challenge to traditional clustering algorithms. Recently, several algorithms have been proposed for clustering uncertain data, and among them density-based techniques seem promising for handling data uncertainty. However, some issues like losing uncertain information, high time complexity and nonadaptive threshold have not been addressed well in the previous density-based algorithm FDBSCAN and hierarchical density-based algorithm FOPTICS. In this paper, we firstly propose a novel density-based algorithm PDBSCAN, which improves the previous FDBSCAN from the following aspects: (1) it employs a more accurate method to compute the probability that the distance between two uncertain objects is less than or equal to a boundary value, instead of the sampling-based method in FDBSCAN; (2) it introduces new definitions of probability neighborhood, support degree, core object probability, direct reachability probability, thus reducing the complexity and solving the issue of nonadaptive threshold (for core object judgement) in FDBSCAN. Then, we modify the algorithm PDBSCAN to an improved version (PDBSCANi), by using a better cluster assignment strategy to ensure that every object will be assigned to the most appropriate cluster, thus solving the issue of nonadaptive threshold (for direct density reachability judgement) in FDBSCAN. Furthermore, as PDBSCAN and PDBSCANi have difficulties for clustering uncertain data with non-uniform cluster density, we propose a novel hierarchical density-based algorithm POPTICS by extending the definitions of PDBSCAN, adding new definitions of fuzzy core distance and fuzzy reachability distance, and employing a new clustering framework. POPTICS can reveal the cluster structures of the datasets with different local densities in different regions better than PDBSCAN and PDBSCANi, and it addresses the issues in FOPTICS. Experimental results demonstrate the superiority of our proposed algorithms over the existing algorithms in accuracy and efficiency. Copyright © 2017 Elsevier Ltd. All rights reserved.

  6. Scalable Motion Estimation Processor Core for Multimedia System-on-Chip Applications

    NASA Astrophysics Data System (ADS)

    Lai, Yeong-Kang; Hsieh, Tian-En; Chen, Lien-Fei

    2007-04-01

    In this paper, we describe a high-throughput and scalable motion estimation processor architecture for multimedia system-on-chip applications. The number of processing elements (PEs) is scalable according to the variable algorithm parameters and the performance required for different applications. Using the PE rings efficiently and an intelligent memory-interleaving organization, the efficiency of the architecture can be increased. Moreover, using efficient on-chip memories and a data management technique can effectively decrease the power consumption and memory bandwidth. Techniques for reducing the number of interconnections and external memory accesses are also presented. Our results demonstrate that the proposed scalable PE-ringed architecture is a flexible and high-performance processor core in multimedia system-on-chip applications.

  7. Techniques in processing multi-frequency multi-polarization spaceborne SAR data

    NASA Technical Reports Server (NTRS)

    Curlander, John C.; Chang, C. Y.

    1991-01-01

    This paper presents the algorithm design of the SIR-C ground data processor, with emphasis on the unique elements involved in the production of registered multifrequency polarimetric data products. A quick-look processing algorithm used for generation of low-resolution browse image products and estimation of echo signal parameters is also presented. Specifically the discussion covers: (1) azimuth reference function generation to produce registered polarimetric imagery; (2) geometric rectification to accommondate cross-track and along-track Doppler drifts; (3) multilook filtering designed to generate output imagery with a uniform resolution; and (4) efficient coding to compress the polarimetric image data for distribution.

  8. Improved particle swarm optimization algorithm for android medical care IOT using modified parameters.

    PubMed

    Sung, Wen-Tsai; Chiang, Yen-Chun

    2012-12-01

    This study examines a wireless sensor network with real-time remote identification using the Android study of things (HCIOT) platform in community healthcare. An improved particle swarm optimization (PSO) method is proposed to efficiently enhance the measurement precision of physiological multi-sensor data fusion in the Internet of Things (IOT) system. The improved PSO (IPSO) includes inertia weight factor design and shrinkage factor adjustment to improve the algorithm's data fusion performance. The Android platform is employed to build multi-physiological signal processing and timely medical care analysis. Wireless sensor network signal transmission and Internet links allow community or family members to have timely medical care network services.

  9. Using learning automata to determine proper subset size in high-dimensional spaces

    NASA Astrophysics Data System (ADS)

    Seyyedi, Seyyed Hossein; Minaei-Bidgoli, Behrouz

    2017-03-01

    In this paper, we offer a new method called FSLA (Finding the best candidate Subset using Learning Automata), which combines the filter and wrapper approaches for feature selection in high-dimensional spaces. Considering the difficulties of dimension reduction in high-dimensional spaces, FSLA's multi-objective functionality is to determine, in an efficient manner, a feature subset that leads to an appropriate tradeoff between the learning algorithm's accuracy and efficiency. First, using an existing weighting function, the feature list is sorted and selected subsets of the list of different sizes are considered. Then, a learning automaton verifies the performance of each subset when it is used as the input space of the learning algorithm and estimates its fitness upon the algorithm's accuracy and the subset size, which determines the algorithm's efficiency. Finally, FSLA introduces the fittest subset as the best choice. We tested FSLA in the framework of text classification. The results confirm its promising performance of attaining the identified goal.

  10. A novel algorithm for solving the true coincident counting issues in Monte Carlo simulations for radiation spectroscopy.

    PubMed

    Guan, Fada; Johns, Jesse M; Vasudevan, Latha; Zhang, Guoqing; Tang, Xiaobin; Poston, John W; Braby, Leslie A

    2015-06-01

    Coincident counts can be observed in experimental radiation spectroscopy. Accurate quantification of the radiation source requires the detection efficiency of the spectrometer, which is often experimentally determined. However, Monte Carlo analysis can be used to supplement experimental approaches to determine the detection efficiency a priori. The traditional Monte Carlo method overestimates the detection efficiency as a result of omitting coincident counts caused mainly by multiple cascade source particles. In this study, a novel "multi-primary coincident counting" algorithm was developed using the Geant4 Monte Carlo simulation toolkit. A high-purity Germanium detector for ⁶⁰Co gamma-ray spectroscopy problems was accurately modeled to validate the developed algorithm. The simulated pulse height spectrum agreed well qualitatively with the measured spectrum obtained using the high-purity Germanium detector. The developed algorithm can be extended to other applications, with a particular emphasis on challenging radiation fields, such as counting multiple types of coincident radiations released from nuclear fission or used nuclear fuel.

  11. Real-time dose computation: GPU-accelerated source modeling and superposition/convolution

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jacques, Robert; Wong, John; Taylor, Russell

    Purpose: To accelerate dose calculation to interactive rates using highly parallel graphics processing units (GPUs). Methods: The authors have extended their prior work in GPU-accelerated superposition/convolution with a modern dual-source model and have enhanced performance. The primary source algorithm supports both focused leaf ends and asymmetric rounded leaf ends. The extra-focal algorithm uses a discretized, isotropic area source and models multileaf collimator leaf height effects. The spectral and attenuation effects of static beam modifiers were integrated into each source's spectral function. The authors introduce the concepts of arc superposition and delta superposition. Arc superposition utilizes separate angular sampling for the total energy released per unit mass (TERMA) and superposition computations to increase accuracy and performance. Delta superposition allows single beamlet changes to be computed efficiently. The authors extended their concept of multi-resolution superposition to include kernel tilting. Multi-resolution superposition approximates solid angle ray-tracing, improving performance and scalability with a minor loss in accuracy. Superposition/convolution was implemented using the inverse cumulative-cumulative kernel and exact radiological path ray-tracing. The accuracy analyses were performed using multiple kernel ray samplings, both with and without kernel tilting and multi-resolution superposition. Results: Source model performance was <9 ms (data dependent) for a high resolution (400²) field using an NVIDIA (Santa Clara, CA) GeForce GTX 280. Computation of the physically correct multispectral TERMA attenuation was improved by a material centric approach, which increased performance by over 80%. Superposition performance was improved by ≈24% to 0.058 and 0.94 s for 64³ and 128³ water phantoms; a speed-up of 101-144x over the highly optimized Pinnacle³ (Philips, Madison, WI) implementation. Pinnacle³ times were 8.3 and 94 s, respectively, on an AMD (Sunnyvale, CA) Opteron 254 (two cores, 2.8 GHz). Conclusions: The authors have completed a comprehensive, GPU-accelerated dose engine in order to provide a substantial performance gain over CPU based implementations. Real-time dose computation is feasible with the accuracy levels of the superposition/convolution algorithm.

  12. A heuristic approach using multiple criteria for environmentally benign 3PLs selection

    NASA Astrophysics Data System (ADS)

    Kongar, Elif

    2005-11-01

    Maintaining competitiveness in an environment where price and quality differences between competing products are disappearing depends on the company's ability to reduce costs and supply time. Timely responses to rapidly changing market conditions require efficient Supply Chain Management (SCM). Outsourcing logistics to third-party logistics service providers (3PLs) is one commonly used way of increasing the efficiency of logistics operations, while creating a more "core competency focused" business environment. However, this alone may not be sufficient. Due to recent environmental regulations and growing public awareness regarding environmental issues, 3PLs need to be not only efficient but also environmentally benign to maintain companies' competitiveness. Even though an efficient and environmentally benign combination of 3PLs can theoretically be obtained using exhaustive search algorithms, heuristic approaches to the selection process may be superior in terms of computational complexity. In this paper, a hybrid approach that combines a multiple criteria Genetic Algorithm (GA) with the Linear Physical Weighting Algorithm (LPPW), to be used in selecting efficient and environmentally benign 3PLs, is proposed. A numerical example is also provided to illustrate the method and the analyses.

  13. Chance-constrained multi-objective optimization of groundwater remediation design at DNAPLs-contaminated sites using a multi-algorithm genetically adaptive method

    NASA Astrophysics Data System (ADS)

    Ouyang, Qi; Lu, Wenxi; Hou, Zeyu; Zhang, Yu; Li, Shuai; Luo, Jiannan

    2017-05-01

    In this paper, a multi-algorithm genetically adaptive multi-objective (AMALGAM) method is proposed as a multi-objective optimization solver. It was implemented in the multi-objective optimization of a groundwater remediation design at sites contaminated by dense non-aqueous phase liquids. In this study, there were two objectives: minimization of the total remediation cost, and minimization of the remediation time. A non-dominated sorting genetic algorithm II (NSGA-II) was adopted for comparison with the proposed method. For efficiency, the time-consuming surfactant-enhanced aquifer remediation simulation model was replaced by a surrogate model constructed by a multi-gene genetic programming (MGGP) technique. Similarly, two other surrogate modeling methods, support vector regression (SVR) and Kriging (KRG), were employed to make comparisons with MGGP. In addition, the surrogate-modeling uncertainty was incorporated in the optimization model by chance-constrained programming (CCP). The results showed that, for the problem considered in this study, (1) the solutions obtained by AMALGAM incurred less remediation cost and required less time than those of NSGA-II, indicating that AMALGAM outperformed NSGA-II. It was additionally shown that (2) the MGGP surrogate model was more accurate than SVR and KRG; and (3) the remediation cost and time increased with the confidence level, which can enable decision makers to make a suitable choice by considering the given budget, remediation time, and reliability.

  14. PARALLELISATION OF THE MODEL-BASED ITERATIVE RECONSTRUCTION ALGORITHM DIRA.

    PubMed

    Örtenberg, A; Magnusson, M; Sandborg, M; Alm Carlsson, G; Malusek, A

    2016-06-01

    New paradigms for parallel programming have been devised to simplify software development on multi-core processors and many-core graphical processing units (GPU). Despite their obvious benefits, the parallelisation of existing computer programs is not an easy task. In this work, the use of the Open Multiprocessing (OpenMP) and Open Computing Language (OpenCL) frameworks is considered for the parallelisation of the model-based iterative reconstruction algorithm DIRA with the aim to significantly shorten the code's execution time. Selected routines were parallelised using OpenMP and OpenCL libraries; some routines were converted from MATLAB to C and optimised. Parallelisation of the code with OpenMP was easy and resulted in an overall speedup of 15 on a 16-core computer. Parallelisation with OpenCL was more difficult owing to differences between the central processing unit and GPU architectures. The resulting speedup was substantially lower than the theoretical peak performance of the GPU; the cause was explained. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  15. Seed robustness of oriented relative fuzzy connectedness: core computation and its applications

    NASA Astrophysics Data System (ADS)

    Tavares, Anderson C. M.; Bejar, Hans H. C.; Miranda, Paulo A. V.

    2017-02-01

    In this work, we present a formal definition and an efficient algorithm to compute the cores of Oriented Relative Fuzzy Connectedness (ORFC), a recent seed-based segmentation technique. The core is a region where the seed can be moved without altering the segmentation, an important aspect for robust techniques and reduction of user effort. We show how ORFC cores can be used to build a powerful hybrid image segmentation approach. We also provide some new theoretical relations between ORFC and Oriented Image Foresting Transform (OIFT), as well as their cores. Experimental results among several methods show that the hybrid approach conserves high accuracy, avoids the shrinking problem and provides robustness to seed placement inside the desired object due to the cores properties.

  16. Analyses of multi-color plant-growth light sources in achieving maximum photosynthesis efficiencies with enhanced color qualities.

    PubMed

    Wu, Tingzhu; Lin, Yue; Zheng, Lili; Guo, Ziquan; Xu, Jianxing; Liang, Shijie; Liu, Zhuguagn; Lu, Yijun; Shih, Tien-Mo; Chen, Zhong

    2018-02-19

    An optimal design of light-emitting diode (LED) lighting that benefits both the photosynthesis performance for plants and the visional health for human eyes has drawn considerable attention. In the present study, we have developed a multi-color driving algorithm that serves as a liaison between desired spectral power distributions and pulse-width-modulation duty cycles. With the aid of this algorithm, our multi-color plant-growth light sources can optimize correlated-color temperature (CCT) and color rendering index (CRI) such that photosynthetic luminous efficacy of radiation (PLER) is maximized regardless of the number of LEDs and the type of photosynthetic action spectrum (PAS). In order to illustrate the accuracies of the proposed algorithm and the practicalities of our plant-growth light sources, we choose six color LEDs and German PAS for experiments. Finally, our study can help provide a useful guide to improve light qualities in plant factories, in which long-term co-inhabitance of plants and human beings is required.

  17. A multi-objective genetic algorithm for a mixed-model assembly U-line balancing type-I problem considering human-related issues, training, and learning

    NASA Astrophysics Data System (ADS)

    Rabbani, Masoud; Montazeri, Mona; Farrokhi-Asl, Hamed; Rafiei, Hamed

    2016-12-01

    Mixed-model assembly lines are increasingly accepted in many industrial environments to meet the growing trend of greater product variability, diversification of customer demands, and shorter life cycles. In this research, a new mathematical model is presented considering balancing a mixed-model U-line and human-related issues, simultaneously. The objective function consists of two separate components. The first part of the objective function is related to balance problem. In this part, objective functions are minimizing the cycle time, minimizing the number of workstations, and maximizing the line efficiencies. The second part is related to human issues and consists of hiring cost, firing cost, training cost, and salary. To solve the presented model, two well-known multi-objective evolutionary algorithms, namely non-dominated sorting genetic algorithm and multi-objective particle swarm optimization, have been used. A simple solution representation is provided in this paper to encode the solutions. Finally, the computational results are compared and analyzed.

  18. Computational multicore on two-layer 1D shallow water equations for erodible dambreak

    NASA Astrophysics Data System (ADS)

    Simanjuntak, C. A.; Bagustara, B. A. R. H.; Gunawan, P. H.

    2018-03-01

    The simulation of erodible dambreak using the two-layer shallow water equations and the SCHR scheme is elaborated in this paper. The results show that the two-layer SWE model is in good agreement with experimental data obtained at Université Catholique de Louvain (Louvain-la-Neuve). Moreover, results for the parallel algorithm on multicore architectures are given. Computer I, with an Intel(R) Core(TM) i5-2500 quad-core processor, shows the best performance in accelerating the computational time, while Computer III, with an AMD A6-5200 APU quad-core processor, is observed to have higher speedup and efficiency. The speedup and efficiency of Computer III with 3200 grid cells are 3.716050530 times and 92.9%, respectively.
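
    The speedup and efficiency figures quoted above follow the standard definitions S = T_serial / T_parallel and E = S / p for p cores; a quick check against the reported quad-core numbers:

```python
def speedup(t_serial, t_parallel):
    """Parallel speedup: serial runtime over parallel runtime."""
    return t_serial / t_parallel

def efficiency(s, cores):
    """Parallel efficiency: speedup per core."""
    return s / cores

# Reported figure for Computer III (quad-core): speedup ~3.716 on 4 cores.
s = 3.716050530
print(efficiency(s, 4))   # ~0.929, i.e. the 92.9% quoted above
```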

  19. A ℓ2, 1 norm regularized multi-kernel learning for false positive reduction in Lung nodule CAD.

    PubMed

    Cao, Peng; Liu, Xiaoli; Zhang, Jian; Li, Wei; Zhao, Dazhe; Huang, Min; Zaiane, Osmar

    2017-03-01

    The aim of this paper is to describe a novel algorithm for false positive reduction in lung nodule Computer Aided Detection (CAD). We describe a new CT lung CAD method which aims to detect solid nodules. Specifically, we propose a multi-kernel classifier with an ℓ2,1 norm regularizer for heterogeneous feature fusion and selection at the feature subset level, and design two efficient strategies to optimize the kernel weights in the non-smooth ℓ2,1 regularized multiple kernel learning algorithm. The first optimization algorithm adapts a proximal gradient method for solving the ℓ2,1 norm of the kernel weights, and uses an accelerated method based on FISTA; the second employs an iterative scheme based on an approximate gradient descent method. The results demonstrate that the FISTA-style accelerated proximal descent method is efficient for the ℓ2,1 norm formulation of multiple kernel learning, with a theoretical guarantee on the convergence rate. Moreover, the experimental results demonstrate the effectiveness of the proposed methods in terms of Geometric mean (G-mean) and Area under the ROC curve (AUC), significantly outperforming the competing methods. The proposed approach exhibits some remarkable advantages in both the heterogeneous feature subset fusion and classification phases. Compared with feature-level and decision-level fusion strategies, the proposed ℓ2,1 norm multi-kernel learning algorithm is able to accurately fuse the complementary and heterogeneous feature sets, and automatically prune the irrelevant and redundant feature subsets to form a more discriminative feature set, leading to promising classification performance. Moreover, the proposed algorithm consistently outperforms the comparable classification approaches in the literature. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  20. Accelerating global optimization of aerodynamic shapes using a new surrogate-assisted parallel genetic algorithm

    NASA Astrophysics Data System (ADS)

    Ebrahimi, Mehdi; Jahangirian, Alireza

    2017-12-01

    An efficient strategy is presented for global shape optimization of wing sections with a parallel genetic algorithm. Several computational techniques are applied to increase the convergence rate and the efficiency of the method. A variable fidelity computational evaluation method is applied in which the expensive Navier-Stokes flow solver is complemented by an inexpensive multi-layer perceptron neural network for the objective function evaluations. A population dispersion method that consists of two phases, of exploration and refinement, is developed to improve the convergence rate and the robustness of the genetic algorithm. Owing to the nature of the optimization problem, a parallel framework based on the master/slave approach is used. The outcomes indicate that the method is able to find the global optimum with significantly lower computational time in comparison to the conventional genetic algorithm.

  1. Multi-objective optimal design of sandwich panels using a genetic algorithm

    NASA Astrophysics Data System (ADS)

    Xu, Xiaomei; Jiang, Yiping; Pueh Lee, Heow

    2017-10-01

    In this study, an optimization problem concerning sandwich panels is investigated by simultaneously considering the two objectives of minimizing the panel mass and maximizing the sound insulation performance. First of all, the acoustic model of sandwich panels is discussed, which provides a foundation to model the acoustic objective function. Then the optimization problem is formulated as a bi-objective programming model, and a solution algorithm based on the non-dominated sorting genetic algorithm II (NSGA-II) is provided to solve the proposed model. Finally, taking an example of a sandwich panel that is expected to be used as an automotive roof panel, numerical experiments are carried out to verify the effectiveness of the proposed model and solution algorithm. Numerical results demonstrate in detail how the core material, geometric constraints and mechanical constraints impact the optimal designs of sandwich panels.

  2. A parameter estimation subroutine package

    NASA Technical Reports Server (NTRS)

    Bierman, G. J.; Nead, M. W.

    1978-01-01

    Linear least squares estimation and regression analyses continue to play a major role in orbit determination and related areas. A library of FORTRAN subroutines was developed to facilitate analyses of a variety of estimation problems. An easy-to-use, multi-purpose set of algorithms that is reasonably efficient and uses a minimal amount of computer storage is presented. Subroutine inputs, outputs, usage and listings are given, along with examples of how these routines can be used. The routines are compact and efficient and are far superior to the normal equation and Kalman filter data processing algorithms that are often used for least squares analyses.
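
    The closing remark about superiority over normal-equation processing reflects a classic numerical fact: forming AᵀA squares the condition number, while orthogonal (QR/SVD) factorizations work on A directly. A generic numpy illustration of the effect (not the FORTRAN library itself):

```python
import numpy as np

# An ill-conditioned least-squares problem: polynomial basis on [0, 1].
m, n = 100, 8
t = np.linspace(0, 1, m)
A = np.vander(t, n, increasing=True)     # Vandermonde, poorly conditioned
x_true = np.ones(n)
b = A @ x_true

# Normal equations: cond(A.T @ A) ~ cond(A)**2, so accuracy degrades.
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Orthogonal factorization (SVD-based lstsq) avoids squaring cond(A).
x_svd = np.linalg.lstsq(A, b, rcond=None)[0]

print(np.linalg.norm(x_normal - x_true))  # typically noticeably larger
print(np.linalg.norm(x_svd - x_true))
```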

  3. Modular multiplication in GF(p) for public-key cryptography

    NASA Astrophysics Data System (ADS)

    Olszyna, Jakub

    Modular multiplication forms the basis of modular exponentiation, which is the core operation of the RSA cryptosystem. It is also present in many other cryptographic algorithms including those based on ECC and HECC. Hence, an efficient implementation of PKC relies on efficient implementation of modular multiplication. The paper presents a survey of the most common algorithms for modular multiplication along with hardware architectures especially suitable for cryptographic applications in energy constrained environments. The motivation for studying low-power and area-efficient modular multiplication algorithms comes from enabling public-key security for ultra-low power devices that must perform under constrained environments like wireless sensor networks. Serial architectures for GF(p) are analyzed and presented. Finally, the proposed architectures are verified and compared according to the amount of power dissipated throughout the operation.
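
    One serial, low-area method such a survey typically covers is MSB-first interleaved modular multiplication, which keeps the partial product reduced after every bit so no wide intermediate registers are needed. A bit-level sketch (an illustrative baseline, not necessarily the paper's architecture):

```python
def interleaved_modmul(a, b, p):
    """MSB-first interleaved modular multiplication in GF(p).

    Scans the bits of `a` from most to least significant, doubling the
    partial result and conditionally adding `b`, reducing mod p at each
    step so intermediates stay small (the property serial hardware uses).
    """
    r = 0
    for i in range(a.bit_length() - 1, -1, -1):
        r <<= 1                      # shift: r = 2r (now r < 2p)
        if (a >> i) & 1:
            r += b                   # conditional add (now r < 3p)
        if r >= p:
            r -= p                   # first conditional reduction
        if r >= p:
            r -= p                   # second conditional reduction
    return r

p = 2**255 - 19                      # a well-known prime modulus
a, b = 0x1234567890ABCDEF, 0xFEDCBA0987654321
assert interleaved_modmul(a, b, p) == (a * b) % p
print("ok")
```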

  4. On the usefulness of gradient information in multi-objective deformable image registration using a B-spline-based dual-dynamic transformation model: comparison of three optimization algorithms

    NASA Astrophysics Data System (ADS)

    Pirpinia, Kleopatra; Bosman, Peter A. N.; Sonke, Jan-Jakob; van Herk, Marcel; Alderliesten, Tanja

    2015-03-01

    The use of gradient information is well known to be highly useful in single-objective optimization-based image registration methods. However, its usefulness has not yet been investigated for deformable image registration from a multi-objective optimization perspective. To this end, within a previously introduced multi-objective optimization framework, we use a smooth B-spline-based dual-dynamic transformation model that allows us to derive gradient information analytically, while still being able to account for large deformations. Within the multi-objective framework, we previously employed a powerful evolutionary algorithm (EA) that computes and advances multiple outcomes at once, resulting in a set of solutions (a so-called Pareto front) that represents efficient trade-offs between the objectives. With the addition of the B-spline-based transformation model, we studied the usefulness of gradient information in multi-objective deformable image registration using three different optimization algorithms: the (gradient-less) EA, a gradient-only algorithm, and a hybridization of these two. We evaluated the algorithms on the registration of highly deformed images: 2D MRI slices of the breast in prone and supine positions. Results demonstrate that gradient-based multi-objective optimization significantly speeds up the initial stages of optimization. However, given sufficient computational resources, better results could still be obtained with the EA. Ultimately, the hybrid EA found the best overall approximation of the optimal Pareto front, further indicating that adding gradient-based optimization to multi-objective optimization-based deformable image registration can indeed be beneficial.

  5. Production scheduling with ant colony optimization

    NASA Astrophysics Data System (ADS)

    Chernigovskiy, A. S.; Kapulin, D. V.; Noskova, E. E.; Yamskikh, T. N.; Tsarev, R. Yu

    2017-10-01

    The optimum solution of the production scheduling problem for manufacturing processes at an enterprise is crucial, as it allows one to obtain the required amount of production within a specified time frame. An optimum production schedule can be found using a variety of optimization or scheduling algorithms. Ant colony optimization is one well-known technique for solving global multi-objective optimization problems. In the article, the authors present a solution of the production scheduling problem by means of an ant colony optimization algorithm. A case study estimating the algorithm's efficiency against other production scheduling algorithms is presented. Advantages of the ant colony optimization algorithm and its beneficial effect on the manufacturing process are provided.
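
    The flavor of the approach can be shown with a toy pheromone-based sequencer for a single machine (position-based pheromone, evaporation, best-so-far reinforcement). This is a deliberately minimal sketch with made-up job durations, not the authors' scheduler:

```python
import random

# Toy single-machine problem: order jobs to minimize total completion time.
durations = [3, 7, 2, 5, 4]
n = len(durations)
tau = [[1.0] * n for _ in range(n)]      # pheromone: job j at position pos

def total_completion_time(order):
    t = total = 0
    for j in order:
        t += durations[j]
        total += t
    return total

def build_schedule():
    """One ant builds a schedule, sampling jobs by pheromone weight."""
    order, remaining = [], list(range(n))
    for pos in range(n):
        weights = [tau[pos][j] for j in remaining]
        j = random.choices(remaining, weights)[0]
        order.append(j)
        remaining.remove(j)
    return order

best, best_cost = None, float("inf")
for _ in range(200):                     # colony main loop
    for order in (build_schedule() for _ in range(10)):
        cost = total_completion_time(order)
        if cost < best_cost:
            best, best_cost = order, cost
    for row in tau:                      # evaporation
        for j in range(n):
            row[j] *= 0.9
    for pos, j in enumerate(best):       # reinforce best-so-far schedule
        tau[pos][j] += 1.0

print(best, best_cost)                   # SPT order is optimal here
```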

  6. A high performance load balance strategy for real-time multicore systems.

    PubMed

    Cho, Keng-Mao; Tsai, Chun-Wei; Chiu, Yi-Shiuan; Yang, Chu-Sing

    2014-01-01

    Finding ways to distribute workloads to each processor core and efficiently reduce power consumption is of vital importance, especially for real-time systems. In this paper, a novel scheduling algorithm is proposed for real-time multicore systems to balance the computation loads and save power. The developed algorithm simultaneously considers multiple criteria, a novel factor, and task deadline, and is called power and deadline-aware multicore scheduling (PDAMS). Experimental results show that the proposed algorithm can greatly reduce energy consumption (by up to 54.2%) and the number of missed deadlines, as compared to the other scheduling algorithms outlined in this paper.

  7. featsel: A framework for benchmarking of feature selection algorithms and cost functions

    NASA Astrophysics Data System (ADS)

    Reis, Marcelo S.; Estrela, Gustavo; Ferreira, Carlos Eduardo; Barrera, Junior

    In this paper, we introduce featsel, a framework for benchmarking of feature selection algorithms and cost functions. This framework allows the user to deal with the search space as a Boolean lattice and has its core coded in C++ for computational efficiency purposes. Moreover, featsel includes Perl scripts to add new algorithms and/or cost functions, generate random instances, plot graphs and organize results into tables. Besides, this framework already comes with dozens of algorithms and cost functions for benchmarking experiments. We also provide illustrative examples, in which featsel outperforms the popular Weka workbench in feature selection procedures on data sets from the UCI Machine Learning Repository.

  8. A High Performance Load Balance Strategy for Real-Time Multicore Systems

    PubMed Central

    Cho, Keng-Mao; Tsai, Chun-Wei; Chiu, Yi-Shiuan; Yang, Chu-Sing

    2014-01-01

    Finding ways to distribute workloads to each processor core and efficiently reduce power consumption is of vital importance, especially for real-time systems. In this paper, a novel scheduling algorithm is proposed for real-time multicore systems to balance the computation loads and save power. The developed algorithm simultaneously considers multiple criteria, a novel factor, and task deadline, and is called power and deadline-aware multicore scheduling (PDAMS). Experimental results show that the proposed algorithm can greatly reduce energy consumption (by up to 54.2%) and the number of missed deadlines, as compared to the other scheduling algorithms outlined in this paper. PMID:24955382

  9. GenoCore: A simple and fast algorithm for core subset selection from large genotype datasets.

    PubMed

    Jeong, Seongmun; Kim, Jae-Yoon; Jeong, Soon-Chun; Kang, Sung-Taeg; Moon, Jung-Kyung; Kim, Namshin

    2017-01-01

    Selecting core subsets from plant genotype datasets is important for enhancing cost-effectiveness and shortening the time required for analyses in genome-wide association studies (GWAS), genomics-assisted breeding of crop species, etc. Recently, a large number of genetic markers (>100,000 single nucleotide polymorphisms) have been identified from high-density single nucleotide polymorphism (SNP) arrays and next-generation sequencing (NGS) data. However, no software has been available for picking an efficient and consistent core subset out of such a huge dataset. It is necessary to develop software that can coherently extract the genetically important samples in a population. We here present a new program, GenoCore, which can quickly and efficiently find a core subset representing the entire population. We introduce simple measures of coverage and diversity scores, which reflect genotype errors and genetic variations, and can help to select a sample rapidly and accurately for a crop genotype dataset. Comparisons of our method to other core collection software using example datasets are performed to validate the performance according to genetic distance, diversity, coverage, required system resources, and the number of selected samples. GenoCore selects the smallest, most consistent, and most representative core collection from all samples, using less memory with more efficient scores, and shows greater genetic coverage compared to the other software tested. GenoCore was written in the R language, and can be accessed online with an example dataset and test results at https://github.com/lovemun/Genocore.
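
    GenoCore's exact coverage and diversity scores are defined in the paper; the general shape of such a selector, though, is a greedy loop that repeatedly adds the sample improving the score most. A simplified, coverage-only sketch over a 0/1/2 genotype matrix (hypothetical scoring, not GenoCore's):

```python
import numpy as np

def greedy_core_subset(G, target=0.99):
    """Greedy core selection on a (samples x markers) genotype matrix.

    Coverage here = fraction of (marker, genotype-state) pairs seen in
    the full population that are also represented in the chosen subset.
    """
    n, m = G.shape
    states = np.unique(G)
    covered = np.zeros((len(states), m), dtype=bool)
    total = sum((G == s).any(axis=0).sum() for s in states)
    chosen = []
    while covered.sum() / total < target:
        gains = []
        for i in range(n):
            hit = np.stack([(G[i] == s) for s in states])
            gains.append(np.logical_and(hit, ~covered).sum())
        best = int(np.argmax(gains))
        if gains[best] == 0:             # nothing left to gain
            break
        chosen.append(best)
        covered |= np.stack([(G[best] == s) for s in states])
    return chosen

rng = np.random.default_rng(1)
G = rng.integers(0, 3, size=(30, 200))   # 30 samples, 200 SNPs, states 0/1/2
print(greedy_core_subset(G))
```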

  10. An assessment of coupling algorithms for nuclear reactor core physics simulations

    DOE PAGES

    Hamilton, Steven; Berrill, Mark; Clarno, Kevin; ...

    2016-04-01

    This paper evaluates the performance of multiphysics coupling algorithms applied to a light water nuclear reactor core simulation. The simulation couples the k-eigenvalue form of the neutron transport equation with heat conduction and subchannel flow equations. We compare Picard iteration (block Gauss–Seidel) to Anderson acceleration and multiple variants of preconditioned Jacobian-free Newton–Krylov (JFNK). The performance of the methods is evaluated over a range of energy group structures and core power levels. A novel physics-based approximation to a Jacobian-vector product has been developed to mitigate the impact of expensive on-line cross section processing steps. Furthermore, numerical simulations demonstrating the efficiency of JFNK and Anderson acceleration relative to standard Picard iteration are performed on a 3D model of a nuclear fuel assembly. Both criticality (k-eigenvalue) and critical boron search problems are considered.
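
    The contrast between the coupling strategies can be seen on any fixed-point problem x = G(x): Picard iterates G directly, while Anderson acceleration recombines the last few iterates via a least-squares fit on residual differences. A toy sketch on a small linear map (not the paper's neutronics/thermal-hydraulics coupling):

```python
import numpy as np

def G(x):
    # Toy contraction whose fixed point solves A x = b.
    return x - 0.3 * (A @ x - b)

def picard(x, iters):
    for _ in range(iters):
        x = G(x)
    return x

def anderson(x, iters, m=3):
    """Type-II Anderson acceleration with window size m."""
    X, F = [], []                        # histories of iterates, residuals
    for _ in range(iters):
        g = G(x)
        f = g - x
        X.append(x); F.append(f)
        if len(F) > m + 1:
            X.pop(0); F.pop(0)
        if len(F) == 1:
            x = g                        # plain Picard step to start
            continue
        dF = np.column_stack([F[i + 1] - F[i] for i in range(len(F) - 1)])
        dX = np.column_stack([X[i + 1] - X[i] for i in range(len(X) - 1)])
        gamma = np.linalg.lstsq(dF, f, rcond=None)[0]
        x = g - (dX + dF) @ gamma        # standard type-II update
    return x

rng = np.random.default_rng(2)
A = np.eye(5) + 0.1 * rng.normal(size=(5, 5))
b = rng.normal(size=5)
x0 = np.zeros(5)
for name, sol in (("picard", picard(x0, 30)), ("anderson", anderson(x0, 30))):
    print(name, np.linalg.norm(A @ sol - b))
```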

  11. An assessment of coupling algorithms for nuclear reactor core physics simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hamilton, Steven; Berrill, Mark; Clarno, Kevin

    This paper evaluates the performance of multiphysics coupling algorithms applied to a light water nuclear reactor core simulation. The simulation couples the k-eigenvalue form of the neutron transport equation with heat conduction and subchannel flow equations. We compare Picard iteration (block Gauss–Seidel) to Anderson acceleration and multiple variants of preconditioned Jacobian-free Newton–Krylov (JFNK). The performance of the methods is evaluated over a range of energy group structures and core power levels. A novel physics-based approximation to a Jacobian-vector product has been developed to mitigate the impact of expensive on-line cross section processing steps. Furthermore, numerical simulations demonstrating the efficiency of JFNK and Anderson acceleration relative to standard Picard iteration are performed on a 3D model of a nuclear fuel assembly. Both criticality (k-eigenvalue) and critical boron search problems are considered.

  12. An assessment of coupling algorithms for nuclear reactor core physics simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hamilton, Steven, E-mail: hamiltonsp@ornl.gov; Berrill, Mark, E-mail: berrillma@ornl.gov; Clarno, Kevin, E-mail: clarnokt@ornl.gov

    This paper evaluates the performance of multiphysics coupling algorithms applied to a light water nuclear reactor core simulation. The simulation couples the k-eigenvalue form of the neutron transport equation with heat conduction and subchannel flow equations. We compare Picard iteration (block Gauss–Seidel) to Anderson acceleration and multiple variants of preconditioned Jacobian-free Newton–Krylov (JFNK). The performance of the methods is evaluated over a range of energy group structures and core power levels. A novel physics-based approximation to a Jacobian-vector product has been developed to mitigate the impact of expensive on-line cross section processing steps. Numerical simulations demonstrating the efficiency of JFNK and Anderson acceleration relative to standard Picard iteration are performed on a 3D model of a nuclear fuel assembly. Both criticality (k-eigenvalue) and critical boron search problems are considered.

  13. Open-source software platform for medical image segmentation applications

    NASA Astrophysics Data System (ADS)

    Namías, R.; D'Amato, J. P.; del Fresno, M.

    2017-11-01

    Segmenting 2D and 3D images is a crucial and challenging problem in medical image analysis. Although several image segmentation algorithms have been proposed for different applications, no universal method currently exists. Moreover, their use is usually limited when detection of complex and multiple adjacent objects of interest is needed. In addition, the continually increasing volumes of medical imaging scans require more efficient segmentation software design and highly usable applications. In this context, we present an extension of our previous segmentation framework which allows the combination of existing explicit deformable models in an efficient and transparent way, handling simultaneously different segmentation strategies and interacting with a graphic user interface (GUI). We present the object-oriented design and the general architecture which consist of two layers: the GUI at the top layer, and the processing core filters at the bottom layer. We apply the framework for segmenting different real-case medical image scenarios on public available datasets including bladder and prostate segmentation from 2D MRI, and heart segmentation in 3D CT. Our experiments on these concrete problems show that this framework facilitates complex and multi-object segmentation goals while providing a fast prototyping open-source segmentation tool.

  14. Communications protocol

    NASA Technical Reports Server (NTRS)

    Zhou, Xiaoming (Inventor); Baras, John S. (Inventor)

    2010-01-01

    The present invention relates to an improved communications protocol which increases the efficiency of transmission in return channels on a multi-channel slotted Aloha system by incorporating advanced error correction algorithms, selective retransmission protocols and the use of reserved channels to satisfy the retransmission requests.
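
    For context on the base protocol being improved: classical slotted Aloha carries S = G·e^{-G} successful transmissions per slot per channel at offered load G, peaking at 1/e; a quick check:

```python
import math

def slotted_aloha_throughput(G):
    # Expected successful transmissions per slot (per channel) at load G.
    return G * math.exp(-G)

for G in (0.5, 1.0, 2.0):
    print(G, round(slotted_aloha_throughput(G), 4))
# Peak is at G = 1, giving 1/e ~ 0.3679 per slot per channel.
```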

  15. Camouflage target reconnaissance based on hyperspectral imaging technology

    NASA Astrophysics Data System (ADS)

    Hua, Wenshen; Guo, Tong; Liu, Xun

    2015-08-01

    Efficient camouflaged target reconnaissance technology has a great influence on modern warfare. Hyperspectral images provide a large spectral range and high spectral resolution, which are invaluable in discriminating between camouflaged targets and backgrounds. Hyperspectral target detection and classification technologies are utilized to achieve single-class and multi-class camouflaged target reconnaissance, respectively. Constrained energy minimization (CEM), a widely used algorithm in hyperspectral target detection, is employed to achieve single-class camouflaged target reconnaissance. Then, a support vector machine (SVM), a classification method, is proposed to achieve multi-class camouflaged target reconnaissance. Experiments have been conducted to demonstrate the efficiency of the proposed method.
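
    CEM has a simple closed form: with band correlation matrix R and target signature d, the filter is w = R^{-1}d / (d^T R^{-1} d), which minimizes output energy subject to w^T d = 1. A compact sketch on synthetic data (made-up signatures, for illustration only):

```python
import numpy as np

def cem(X, d):
    """Constrained energy minimization detector.

    X : (pixels x bands) data, d : (bands,) target signature.
    Returns per-pixel detection scores; the filter w minimizes the
    average output energy subject to w @ d == 1.
    """
    R = (X.T @ X) / X.shape[0]          # sample correlation matrix
    Rinv_d = np.linalg.solve(R, d)
    w = Rinv_d / (d @ Rinv_d)
    return X @ w

rng = np.random.default_rng(3)
bands, pixels = 20, 500
X = rng.normal(size=(pixels, bands))    # background clutter
d = rng.normal(size=bands)              # target spectral signature
X[:10] += 2.0 * d                       # plant the target in 10 pixels
scores = cem(X, d)
print(scores[:10].mean(), scores[10:].mean())  # target pixels score higher
```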

  16. A frequency dependent preconditioned wavelet method for atmospheric tomography

    NASA Astrophysics Data System (ADS)

    Yudytskiy, Mykhaylo; Helin, Tapio; Ramlau, Ronny

    2013-12-01

    Atmospheric tomography, i.e. the reconstruction of the turbulence in the atmosphere, is a main task for the adaptive optics systems of the next generation telescopes. For extremely large telescopes, such as the European Extremely Large Telescope, this problem becomes overly complex and an efficient algorithm is needed to reduce numerical costs. Recently, a conjugate gradient method based on wavelet parametrization of turbulence layers was introduced [5]. An iterative algorithm can only be numerically efficient when the number of iterations required for a sufficient reconstruction is low. A way to achieve this is to design an efficient preconditioner. In this paper we propose a new frequency-dependent preconditioner for the wavelet method. In the context of a multi conjugate adaptive optics (MCAO) system simulated on the official end-to-end simulation tool OCTOPUS of the European Southern Observatory we demonstrate robustness and speed of the preconditioned algorithm. We show that three iterations are sufficient for a good reconstruction.
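
    The role of a preconditioner in driving down the iteration count can be sketched with standard preconditioned conjugate gradients, here with a simple Jacobi (diagonal) preconditioner on a toy SPD system rather than the frequency-dependent wavelet preconditioner proposed in the paper:

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-8, max_iter=200):
    """Preconditioned CG for SPD A; M_inv applies the preconditioner."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv(r)
    p = z.copy()
    it = 0
    while np.linalg.norm(r) > tol and it < max_iter:
        Ap = A @ p
        alpha = (r @ z) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        z_new = M_inv(r_new)
        beta = (r_new @ z_new) / (r @ z)
        p = z_new + beta * p
        r, z = r_new, z_new
        it += 1
    return x, it

rng = np.random.default_rng(4)
Q = rng.normal(size=(100, 100))
A = Q @ Q.T + np.diag(np.linspace(1, 100, 100))  # SPD, uneven diagonal
b = rng.normal(size=100)

_, iters_plain = pcg(A, b, lambda r: r)                 # identity (no prec.)
_, iters_jacobi = pcg(A, b, lambda r: r / np.diag(A))   # Jacobi prec.
print(iters_plain, iters_jacobi)        # compare iteration counts
```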

  17. Hybrid Pareto artificial bee colony algorithm for multi-objective single machine group scheduling problem with sequence-dependent setup times and learning effects.

    PubMed

    Yue, Lei; Guan, Zailin; Saif, Ullah; Zhang, Fei; Wang, Hao

    2016-01-01

    Group scheduling is significant for efficient and cost-effective production systems. However, there exist setup times between the groups, which should be reduced by sequencing the groups efficiently. The current research focuses on a sequence-dependent group scheduling problem with the aim of simultaneously minimizing the makespan and the total weighted tardiness. In most production scheduling problems, the processing time of jobs is assumed to be fixed. However, the actual processing time of jobs may be reduced due to the "learning effect". The integration of sequence-dependent group scheduling with learning effects has rarely been considered in the literature. Therefore, the current research considers a single machine group scheduling problem with sequence-dependent setup times and learning effects simultaneously. A novel hybrid Pareto artificial bee colony algorithm (HPABC), incorporating some steps of the genetic algorithm, is proposed for the current problem to obtain Pareto solutions. Furthermore, five different sizes of test problems (small, small-medium, medium, large-medium, large) are tested using the proposed HPABC. The Taguchi method is used to tune the effective parameters of the proposed HPABC for each problem category. The performance of HPABC is compared with three well-known multi-objective optimization algorithms: the improved strength Pareto evolutionary algorithm (SPEA2), the non-dominated sorting genetic algorithm II (NSGAII) and the particle swarm optimization algorithm (PSO). Results indicate that HPABC outperforms SPEA2, NSGAII and PSO and gives better Pareto optimal solutions in terms of diversity and quality for almost all instances of the different problem sizes.
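
    Common to HPABC and the algorithms it is compared against is the Pareto-dominance test used to maintain the non-dominated set (both objectives minimized); a minimal sketch with made-up objective values:

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(points):
    """Filter a list of objective vectors down to the non-dominated set."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# (makespan, total weighted tardiness) for a few candidate schedules
candidates = [(90, 12), (85, 20), (100, 5), (95, 5), (85, 25)]
print(pareto_front(candidates))   # -> [(90, 12), (85, 20), (95, 5)]
```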

  18. JOMAR: Joint Operations with Mobile Autonomous Robots

    DTIC Science & Technology

    2015-12-21

    AFRL-AFOSR-JP-TR-2015-0009. JOMAR: Joint Operations with Mobile Autonomous Robots. Edwin Olson, University of Michigan. Final Report, 12/21/2015. Contract number: FA23861114024. Abstract: Under this grant, we formulated and implemented a variety of novel algorithms that address core problems in multi-robot systems.

  19. Acoustic simulation in architecture with parallel algorithm

    NASA Astrophysics Data System (ADS)

    Li, Xiaohong; Zhang, Xinrong; Li, Dan

    2004-03-01

    To address the complexity of architectural environments and the need for real-time simulation of architectural acoustics, a parallel radiosity algorithm was developed. The distribution of sound energy in the scene is solved with this method. The impulse responses between sources and receivers are then calculated per frequency segment with multiple processes and combined into the whole frequency response. The numerical experiment shows that parallel computation can improve the efficiency of acoustic simulation for complex scenes.
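
    The per-frequency-segment parallelisation described above maps naturally onto a process pool. A minimal Python sketch follows; the impulse-response computation is a hypothetical stand-in for the radiosity solver.

        import numpy as np
        from multiprocessing import Pool

        def band_impulse_response(f_center):
            """Hypothetical stand-in for the radiosity-based impulse response
            computed for one frequency band (decaying sinusoid here)."""
            t = np.linspace(0.0, 1.0, 1000)
            return np.exp(-3 * t) * np.sin(2 * np.pi * f_center * t)

        if __name__ == "__main__":
            bands = [125, 250, 500, 1000, 2000, 4000]   # octave-band centre frequencies (Hz)
            with Pool() as pool:
                responses = pool.map(band_impulse_response, bands)
            # combine per-band results into the whole (broadband) response
            broadband = np.sum(responses, axis=0)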

  20. Parallel Algorithms for Monte Carlo Particle Transport Simulation on Exascale Computing Architectures

    NASA Astrophysics Data System (ADS)

    Romano, Paul Kollath

    Monte Carlo particle transport methods are being considered as a viable option for high-fidelity simulation of nuclear reactors. While Monte Carlo methods offer several potential advantages over deterministic methods, there are a number of algorithmic shortcomings that would prevent their immediate adoption for full-core analyses. In this thesis, algorithms are proposed both to ameliorate the degradation in parallel efficiency typically observed for large numbers of processors and to offer a means of decomposing the large tally data that will be needed for reactor analysis. A nearest-neighbor fission bank algorithm was proposed and subsequently implemented in the OpenMC Monte Carlo code. A theoretical analysis of the communication pattern shows that the expected cost is O(√N), whereas traditional fission bank algorithms are O(N) at best. The algorithm was tested on two supercomputers, the Intrepid Blue Gene/P and the Titan Cray XK7, and demonstrated nearly linear parallel scaling up to 163,840 processor cores on a full-core benchmark problem. An algorithm for reducing the network communication arising from tally reduction was analyzed and implemented in OpenMC. The proposed algorithm groups into batches only those particle histories handled by a single processor; in doing so it avoids all network communication for tallies until the very end of the simulation. The algorithm was tested, again on a full-core benchmark, and shown to reduce network communication substantially. A model was developed to predict the impact of load imbalances on the performance of domain-decomposed simulations. The analysis demonstrated that load imbalances in domain-decomposed simulations arise from two distinct phenomena: non-uniform particle densities and non-uniform spatial leakage. The dominant performance penalty for domain decomposition was shown to come from these physical effects rather than from insufficient network bandwidth or high latency. The model predictions were verified with measured data from simulations in OpenMC on a full-core benchmark problem. Finally, a novel algorithm for decomposing large tally data was proposed, analyzed, and implemented and tested in OpenMC. The algorithm relies on disjoint sets of compute processes and tally servers. The analysis showed that for a range of parameters relevant to LWR analysis, the tally server algorithm should perform with minimal overhead. Tests were performed on Intrepid and Titan and demonstrated that the algorithm did indeed perform well over a wide range of parameters. (Copies available exclusively from MIT Libraries, libraries.mit.edu/docs)

  1. Evaluating Multi-core Architectures through Accelerating the Three-Dimensional Lax–Wendroff Correction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    You, Yang; Fu, Haohuan; Song, Shuaiwen

    2014-07-18

    Wave propagation forward modeling is a widely used computational method in oil and gas exploration. The iterative stencil loops in such problems have broad applications in scientific computing. However, executing such loops can be highly time-consuming, which greatly limits the application's performance and power efficiency. In this paper, we accelerate the forward modeling technique on the latest multi-core and many-core architectures such as Intel Sandy Bridge CPUs, the NVIDIA Fermi C2070 GPU, the NVIDIA Kepler K20x GPU, and the Intel Xeon Phi co-processor. For the GPU platforms, we propose two parallel strategies to explore the performance optimization opportunities for our stencil kernels. For Sandy Bridge CPUs and MIC, we also employ various optimization techniques in order to achieve the best performance.
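
    For readers unfamiliar with iterative stencil loops, the computational pattern being accelerated resembles the following generic 7-point 3D stencil sweep in numpy; this is an illustrative kernel, not the authors' Lax-Wendroff correction.

        import numpy as np

        def seven_point_stencil(u, c=0.1):
            """One sweep of a generic 7-point stencil over the interior of a 3D grid."""
            v = u.copy()
            v[1:-1, 1:-1, 1:-1] = u[1:-1, 1:-1, 1:-1] + c * (
                u[2:, 1:-1, 1:-1] + u[:-2, 1:-1, 1:-1] +
                u[1:-1, 2:, 1:-1] + u[1:-1, :-2, 1:-1] +
                u[1:-1, 1:-1, 2:] + u[1:-1, 1:-1, :-2] -
                6.0 * u[1:-1, 1:-1, 1:-1])
            return v

        u = np.random.rand(64, 64, 64)
        for _ in range(10):          # the iterative loop that dominates the run time
            u = seven_point_stencil(u)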

  2. Phasor based single-molecule localization microscopy in 3D (pSMLM-3D): An algorithm for MHz localization rates using standard CPUs

    NASA Astrophysics Data System (ADS)

    Martens, Koen J. A.; Bader, Arjen N.; Baas, Sander; Rieger, Bernd; Hohlbein, Johannes

    2018-03-01

    We present a fast and model-free 2D and 3D single-molecule localization algorithm that allows more than 3 × 10^6 localizations per second to be calculated on a standard multi-core central processing unit, with localization accuracies in line with the most accurate algorithms currently available. Our algorithm converts the region of interest around a point spread function to two phase vectors (phasors) by calculating the first Fourier coefficients in both the x- and y-direction. The angles of these phasors are used to localize the center of the single fluorescent emitter, and the ratio of the magnitudes of the two phasors is a measure for astigmatism, which can be used to obtain depth information (z-direction). Our approach can be used both as a stand-alone algorithm for maximizing localization speed and as a first estimator for more time-consuming iterative algorithms.
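
    The phasor computation itself is only a few lines. A numpy sketch of the 2D case under the conventions described in the abstract is shown below; the mapping from phase angle to position assumes the ROI spans one period of the first Fourier component, and the test spot is hypothetical.

        import numpy as np

        def phasor_localize(roi):
            """Localize a single emitter in a rectangular ROI via the phase of the
            first Fourier coefficients along x and y (pSMLM-style sketch)."""
            h, w = roi.shape
            x = np.arange(w)
            y = np.arange(h)
            fx = np.sum(roi * np.exp(-2j * np.pi * x[None, :] / w))  # first coefficient along x
            fy = np.sum(roi * np.exp(-2j * np.pi * y[:, None] / h))  # first coefficient along y
            # phase angle -> position (modulo one ROI width)
            x0 = (-np.angle(fx) / (2 * np.pi) * w) % w
            y0 = (-np.angle(fy) / (2 * np.pi) * h) % h
            return x0, y0

        # Hypothetical Gaussian spot centred at (6.3, 4.7) in an 11x11 ROI:
        yy, xx = np.mgrid[0:11, 0:11]
        roi = np.exp(-((xx - 6.3) ** 2 + (yy - 4.7) ** 2) / (2 * 1.5 ** 2))
        print(phasor_localize(roi))   # close to (6.3, 4.7)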

  3. A multi-stage heuristic algorithm for matching problem in the modified miniload automated storage and retrieval system of e-commerce

    NASA Astrophysics Data System (ADS)

    Wang, Wenrui; Wu, Yaohua; Wu, Yingying

    2016-05-01

    E-commerce, as an emerging marketing mode, has attracted more and more attention and gradually changed the way of our life. However, the existing layout of distribution centers cannot sufficiently fulfill the storage and picking demands of e-commerce. In this paper, a modified miniload automated storage/retrieval system is designed to fit these new characteristics of e-commerce logistics. Meanwhile, a matching problem concerning the improvement of picking efficiency in the new system is studied: how to reduce the travelling distance of totes between aisles and picking stations. A multi-stage heuristic algorithm is proposed based on a statement and model of this problem. The main idea of this algorithm is to minimize, with heuristic strategies based on similarity coefficients, the transport of items which cannot reach their destination picking stations through direct conveyors alone. The experimental results based on computer-generated cases show that the average reduction in indirect transports reaches 14.36% with the application of the multi-stage heuristic algorithm. For cases from a real e-commerce distribution center, the order processing time can be reduced from 11.20 h to 10.06 h with the help of the modified system and the proposed algorithm. In summary, this research proposes a modified system and a multi-stage heuristic algorithm that reduce the travelling distance of totes effectively and improve the overall performance of an e-commerce distribution center.

  4. A novel communication mechanism based on node potential multi-path routing

    NASA Astrophysics Data System (ADS)

    Bu, Youjun; Zhang, Chuanhao; Jiang, YiMing; Zhang, Zhen

    2016-10-01

    As networks scale rapidly and new network applications emerge frequently, the bandwidth supply of today's Internet cannot catch up with the rapidly increasing requirements. Unfortunately, inefficient use of network resources makes things worse. Current networks deploy single-next-hop optimal paths for data transmission, but this "best effort" model leads to imbalanced use of network resources and often to local congestion. Multi-path routing, on the other hand, can efficiently use the aggregate bandwidth of multiple paths and improve network robustness, security, load balancing and quality of service. As a result, multi-path routing has attracted much attention in the routing and switching research fields, and many important ideas and solutions have been proposed. This paper focuses on implementing parallel transmission over multiple next hops, balancing network traffic and reducing congestion. It explores the key technologies of multi-path communication networks, which could provide feasible academic support for subsequent applications of multi-path communication networking. It proposes a novel multi-path algorithm based on node potential in the network. The algorithm can make full use of network link resources and effectively balance their utilization.

  5. A bio-inspired swarm robot coordination algorithm for multiple target searching

    NASA Astrophysics Data System (ADS)

    Meng, Yan; Gan, Jing; Desai, Sachi

    2008-04-01

    The coordination of a multi-robot system searching for multiple targets is challenging in a dynamic environment, since the multi-robot system demands group coherence (agents need the incentive to work together faithfully) and group competence (agents need to know how to work together well). In our previously proposed bio-inspired coordination method, Local Interaction through Virtual Stigmergy (LIVS), one problem is the considerable randomness of robot movement during coordination, which may lead to higher power consumption and longer searching time. To address these issues, an adaptive LIVS (ALIVS) method is proposed in this paper, which not only considers the travel cost and target weight, but also predicts the target/robot ratio and potential robot redundancy with respect to the detected targets. Furthermore, a dynamic weight adjustment is applied to improve the searching performance. The new method is truly distributed: each robot makes its own decision based on its local sensing information and the information from its neighbors. Basically, each robot communicates only with its neighbors through a virtual stigmergy mechanism and makes its local movement decision based on a Particle Swarm Optimization (PSO) algorithm. The proposed ALIVS algorithm has been implemented on the embodied robot simulator Player/Stage in a target-searching scenario. The simulation results demonstrate its efficiency and robustness, in a power-efficient manner, under real-world constraints.

  6. Learning Incoherent Sparse and Low-Rank Patterns from Multiple Tasks

    PubMed Central

    Chen, Jianhui; Liu, Ji; Ye, Jieping

    2013-01-01

    We consider the problem of learning incoherent sparse and low-rank patterns from multiple tasks. Our approach is based on a linear multi-task learning formulation, in which the sparse and low-rank patterns are induced by a cardinality regularization term and a low-rank constraint, respectively. This formulation is non-convex; we convert it into its convex surrogate, which can be routinely solved via semidefinite programming for small-size problems. We propose to employ the general projected gradient scheme to efficiently solve such a convex surrogate; however, in the optimization formulation, the objective function is non-differentiable and the feasible domain is non-trivial. We present the procedures for computing the projected gradient and ensuring the global convergence of the projected gradient scheme. The computation of the projected gradient involves a constrained optimization problem; we show that the optimal solution to such a problem can be obtained via solving an unconstrained optimization subproblem and a Euclidean projection subproblem. We also present two projected gradient algorithms and analyze their rates of convergence in detail. In addition, we illustrate the use of the presented projected gradient algorithms for the proposed multi-task learning formulation using the least squares loss. Experimental results on a collection of real-world data sets demonstrate the effectiveness of the proposed multi-task learning formulation and the efficiency of the proposed projected gradient algorithms. PMID:24077658
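
    The projected gradient scheme mentioned above has a simple generic form: take an unconstrained gradient step, then project back onto the feasible set. A sketch for minimization over a Euclidean ball follows; the ball projection is an illustrative stand-in for the paper's more involved projection.

        import numpy as np

        def project_ball(x, radius=1.0):
            """Euclidean projection onto the ball ||x|| <= radius."""
            n = np.linalg.norm(x)
            return x if n <= radius else x * (radius / n)

        def projected_gradient(grad, x0, step=0.1, iters=200, radius=1.0):
            """Generic projected gradient descent: gradient step, then projection."""
            x = x0.copy()
            for _ in range(iters):
                x = project_ball(x - step * grad(x), radius)
            return x

        # Example: minimize ||x - c||^2 over the unit ball, with c outside the ball.
        c = np.array([2.0, 1.0])
        x_star = projected_gradient(lambda x: 2 * (x - c), np.zeros(2))
        print(x_star, c / np.linalg.norm(c))   # both ~ the boundary point closest to c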

  7. Learning Incoherent Sparse and Low-Rank Patterns from Multiple Tasks.

    PubMed

    Chen, Jianhui; Liu, Ji; Ye, Jieping

    2012-02-01

    We consider the problem of learning incoherent sparse and low-rank patterns from multiple tasks. Our approach is based on a linear multi-task learning formulation, in which the sparse and low-rank patterns are induced by a cardinality regularization term and a low-rank constraint, respectively. This formulation is non-convex; we convert it into its convex surrogate, which can be routinely solved via semidefinite programming for small-size problems. We propose to employ the general projected gradient scheme to efficiently solve such a convex surrogate; however, in the optimization formulation, the objective function is non-differentiable and the feasible domain is non-trivial. We present the procedures for computing the projected gradient and ensuring the global convergence of the projected gradient scheme. The computation of the projected gradient involves a constrained optimization problem; we show that the optimal solution to such a problem can be obtained via solving an unconstrained optimization subproblem and a Euclidean projection subproblem. We also present two projected gradient algorithms and analyze their rates of convergence in detail. In addition, we illustrate the use of the presented projected gradient algorithms for the proposed multi-task learning formulation using the least squares loss. Experimental results on a collection of real-world data sets demonstrate the effectiveness of the proposed multi-task learning formulation and the efficiency of the proposed projected gradient algorithms.

  8. Self-consistent predictor/corrector algorithms for stable and efficient integration of the time-dependent Kohn-Sham equation

    NASA Astrophysics Data System (ADS)

    Zhu, Ying; Herbert, John M.

    2018-01-01

    The "real time" formulation of time-dependent density functional theory (TDDFT) involves integration of the time-dependent Kohn-Sham (TDKS) equation in order to describe the time evolution of the electron density following a perturbation. This approach, which is complementary to the more traditional linear-response formulation of TDDFT, is more efficient for computation of broad-band spectra (including core-excited states) and for systems where the density of states is large. Integration of the TDKS equation is complicated by the time-dependent nature of the effective Hamiltonian, and we introduce several predictor/corrector algorithms to propagate the density matrix, one of which can be viewed as a self-consistent extension of the widely used modified-midpoint algorithm. The predictor/corrector algorithms facilitate larger time steps and are shown to be more efficient despite requiring more than one Fock build per time step, and furthermore can be used to detect a divergent simulation on-the-fly, which can then be halted or else the time step modified.
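
    A minimal sketch of the self-consistent predictor/corrector idea for a state-dependent Hamiltonian is given below; the 2x2 "Hamiltonian" with a mean-field-like density dependence is a hypothetical stand-in for an actual Fock build, and the corrector loop is a simplified reading of the midpoint construction, not the authors' exact algorithm.

        import numpy as np
        from scipy.linalg import expm

        def H(psi):
            """Hypothetical state-dependent Hermitian matrix (stand-in for a Fock
            build, which depends on the instantaneous density)."""
            H0 = np.array([[0.0, 0.2], [0.2, 1.0]])
            return H0 + np.diag(np.abs(psi) ** 2)      # mean-field-like density dependence

        def step_scmm(psi, dt, n_corr=3):
            """Self-consistent midpoint step: predict with H(psi(t)), then iterate
            the midpoint propagator until the midpoint state is self-consistent."""
            U = expm(-1j * H(psi) * dt)                # predictor: one 'Fock build'
            for _ in range(n_corr):                    # corrector 'Fock builds'
                psi_mid = 0.5 * (psi + U @ psi)        # estimate of the midpoint state
                U = expm(-1j * H(psi_mid) * dt)        # propagator from midpoint Hamiltonian
            return U @ psi

        psi = np.array([1.0 + 0j, 0.0])
        for _ in range(200):
            psi = step_scmm(psi, dt=0.05)
        print(np.linalg.norm(psi))                     # unitary propagation keeps the norm ~1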

  9. TACD: a transportable ant colony discrimination model for corporate bankruptcy prediction

    NASA Astrophysics Data System (ADS)

    Lalbakhsh, Pooia; Chen, Yi-Ping Phoebe

    2017-05-01

    This paper presents a transportable ant colony discrimination strategy (TACD) to predict corporate bankruptcy, a topic of vital importance that is attracting increasing interest in the field of economics. The proposed algorithm uses financial ratios to build a binary prediction model for companies with the two statuses of bankrupt and non-bankrupt. The algorithm takes advantage of an improved version of continuous ant colony optimisation (CACO) at its core, which is used to create an accurate, simple and understandable linear model for discrimination. This also enables the algorithm to work with continuous values, leading to more efficient learning and adaptation by avoiding data discretisation. We conduct a comprehensive performance evaluation on three real-world data sets under a stratified cross-validation strategy. In three different scenarios, TACD is compared with 11 other bankruptcy prediction strategies. We also discuss the efficiency of the attribute selection methods used in the experiments. In addition to its simplicity and understandability, statistical significance tests prove the efficiency of TACD against the other prediction algorithms in both measures of AUC and accuracy.

  10. Computing Bounds on Resource Levels for Flexible Plans

    NASA Technical Reports Server (NTRS)

    Muscvettola, Nicola; Rijsman, David

    2009-01-01

    A new algorithm efficiently computes the tightest exact bound on the levels of resources induced by a flexible activity plan. Tightness of bounds is extremely important for computations involved in planning because tight bounds can save potentially exponential amounts of search (through early backtracking and detection of solutions), relative to looser bounds. The bound computed by the new algorithm, denoted the resource-level envelope, constitutes the measure of maximum and minimum consumption of resources at any time for all fixed-time schedules in the flexible plan. At each time, the envelope guarantees that there are two fixed-time instantiations: one that produces the minimum level and one that produces the maximum level. Therefore, the resource-level envelope is the tightest possible resource-level bound for a flexible plan, because any tighter bound would exclude the contribution of at least one fixed-time schedule. If the resource-level envelope can be computed efficiently, it could replace the looser bounds currently used in the inner cores of constraint-posting scheduling algorithms, with the potential for great improvements in performance. What is needed to reduce the cost of computation is an algorithm whose measure of complexity is no greater than a low-degree polynomial in N (where N is the number of activities). The new algorithm satisfies this need. In this algorithm, the computation of resource-level envelopes is based on a novel combination of (1) the theory of shortest paths in the temporal-constraint network for the flexible plan and (2) the theory of maximum flows for a flow network derived from the temporal and resource constraints. The measure of asymptotic complexity of the algorithm is O(N · maxflow(N)), where O(x) denotes an amount of computing time or a number of arithmetic operations proportional to a number of the order of x, and maxflow(N) is the measure of complexity (and thus of cost) of a maximum-flow algorithm applied to an auxiliary flow network of 2N nodes. The algorithm is believed to be efficient in practice; experimental analysis shows the practical cost of maxflow to be as low as O(N^1.5). The algorithm could be enhanced following at least two approaches. In the first approach, incremental subalgorithms for the computation of the envelope could be developed. By use of temporal scanning of the events in the temporal network, it may be possible to significantly reduce the size of the networks on which it is necessary to run the maximum-flow subalgorithm, thereby significantly reducing the time required for envelope calculation. In the second approach, the practical effectiveness of resource envelopes in the inner loops of search algorithms could be tested for multi-capacity resource scheduling. This testing would include inner-loop backtracking and termination tests and variable- and value-ordering heuristics that exploit the properties of resource envelopes more directly.

  11. Assessing the weighted multi-objective adaptive surrogate model optimization to derive large-scale reservoir operating rules with sensitivity analysis

    NASA Astrophysics Data System (ADS)

    Zhang, Jingwen; Wang, Xu; Liu, Pan; Lei, Xiaohui; Li, Zejun; Gong, Wei; Duan, Qingyun; Wang, Hao

    2017-01-01

    The optimization of a large-scale reservoir system is time-consuming due to its intrinsic characteristics of non-commensurable objectives and high dimensionality. One way to solve the problem is to employ an efficient multi-objective optimization algorithm in the derivation of large-scale reservoir operating rules. In this study, the Weighted Multi-Objective Adaptive Surrogate Model Optimization (WMO-ASMO) algorithm is used. It consists of three steps: (1) simplifying the large-scale reservoir operating rules by the aggregation-decomposition model, (2) identifying the most sensitive parameters through multivariate adaptive regression splines (MARS) for dimensional reduction, and (3) reducing computational cost and speeding up the search by WMO-ASMO, embedded with the weighted non-dominated sorting genetic algorithm II (WNSGAII). An intercomparison of the non-dominated sorting genetic algorithm II (NSGAII), WNSGAII and WMO-ASMO is conducted for the large-scale reservoir system of the Xijiang river basin in China. Results indicate that: (1) WNSGAII surpasses NSGAII in the median of annual power generation, increased by 1.03% (from 523.29 to 528.67 billion kW h), and in the median of the ecological index, improved by 3.87% (from 1.879 to 1.809) with 500 simulations, because of the weighted crowding distance; and (2) WMO-ASMO outperforms NSGAII and WNSGAII in terms of better solutions (annual power generation of 530.032 billion kW h and ecological index of 1.675) with 1000 simulations, and reduces computational time by 20% (from 10 h to 8 h) with 500 simulations. Therefore, the proposed method is proved to be more efficient and provides a better Pareto frontier.
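
    For reference, the crowding distance that underlies NSGAII's diversity maintenance can be computed as below. The per-objective weights are a hypothetical illustration of the "weighted" variant; the paper's exact weighting scheme is not reproduced here.

        import numpy as np

        def crowding_distance(F, w=None):
            """Crowding distance of each point in an (n_points, n_objectives) front F,
            with optional per-objective weights w (uniform weights give plain NSGAII)."""
            n, m = F.shape
            w = np.ones(m) if w is None else np.asarray(w)
            d = np.zeros(n)
            for j in range(m):
                order = np.argsort(F[:, j])
                span = F[order[-1], j] - F[order[0], j] or 1.0   # guard a degenerate objective
                d[order[0]] = d[order[-1]] = np.inf              # boundary points always kept
                d[order[1:-1]] += w[j] * (F[order[2:], j] - F[order[:-2], j]) / span
            return d

        front = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 2.0], [5.0, 1.0]])
        print(crowding_distance(front))   # interior points get finite distances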

  12. Micromagnetic measurement for characterization of ferromagnetic materials' microstructural properties

    NASA Astrophysics Data System (ADS)

    Zhang, Shuo; Shi, Xiaodong; Udpa, Lalita; Deng, Yiming

    2018-05-01

    Magnetic Barkhausen noise (MBN) is measured in low-carbon steels, and the relationship between carbon content and a parameter extracted from the MBN signal is investigated. The parameter is extracted experimentally by fitting the original profiles with two Gaussian curves. The gap between the two peaks (ΔG) of the fitted Gaussian curves shows a good linear relationship with the carbon content of the samples in the experiment. The result has been validated by Monte Carlo simulation. To ensure the sensitivity of the measurement, an advanced multi-objective optimization algorithm, the non-dominated sorting genetic algorithm III (NSGA-III), is used to optimize the magnetic core of the sensor.
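
    The two-Gaussian fit and the peak-gap feature ΔG can be reproduced with scipy. A sketch on synthetic MBN-like profile data follows; the parameter values are hypothetical.

        import numpy as np
        from scipy.optimize import curve_fit

        def two_gaussians(x, a1, mu1, s1, a2, mu2, s2):
            """Sum of two Gaussian curves used to fit the MBN envelope profile."""
            return (a1 * np.exp(-((x - mu1) ** 2) / (2 * s1 ** 2)) +
                    a2 * np.exp(-((x - mu2) ** 2) / (2 * s2 ** 2)))

        # Hypothetical noisy MBN profile with peaks near x = -1 and x = +1.5:
        x = np.linspace(-5, 5, 400)
        rng = np.random.default_rng(1)
        y = two_gaussians(x, 1.0, -1.0, 0.8, 0.7, 1.5, 1.0) + 0.02 * rng.standard_normal(x.size)

        p0 = [1, -1.5, 1, 1, 2, 1]                       # rough initial guess
        popt, _ = curve_fit(two_gaussians, x, y, p0=p0)
        delta_G = abs(popt[4] - popt[1])                 # gap between the two fitted peaks
        print(f"dG = {delta_G:.3f}")                     # the feature correlated with carbon content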

  13. A distributed-memory approximation algorithm for maximum weight perfect bipartite matching

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Azad, Ariful; Buluc, Aydin; Li, Xiaoye S.

    We design and implement an efficient parallel approximation algorithm for the problem of maximum weight perfect matching in bipartite graphs, i.e. the problem of finding a set of non-adjacent edges that covers all vertices and has maximum weight. This problem differs from the maximum weight matching problem, for which scalable approximation algorithms are known. It is primarily motivated by finding good pivots in scalable sparse direct solvers before factorization, where sequential implementations of maximum weight perfect matching algorithms, such as those available in MC64, are widely used due to the lack of scalable alternatives. To overcome this limitation, we propose a fully parallel distributed-memory algorithm that first generates a perfect matching and then searches in parallel for weight-augmenting cycles of length four, iteratively augmenting the matching with a vertex-disjoint set of such cycles. For most practical problems the weights of the perfect matchings generated by our algorithm are very close to the optimum. An efficient implementation of the algorithm scales up to 256 nodes (17,408 cores) on a Cray XC40 supercomputer and can solve instances that are too large to be handled by a single node using the sequential algorithm.
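
    The core augmentation step can be sketched serially: given a perfect matching (a permutation p assigning row i to column p[i] of a weight matrix), a weight-augmenting cycle of length four is a pair (i, j) whose matched edges can be swapped for a net gain; the distributed algorithm searches for such pairs in parallel. A minimal serial Python sketch on dense, hypothetical weights:

        import numpy as np

        def augment_4cycles(W, p, sweeps=10):
            """Improve a perfect matching p by greedily applying weight-augmenting
            cycles of length four: swap the partners of rows i and j whenever
            doing so increases the total matching weight."""
            n = len(p)
            for _ in range(sweeps):
                improved = False
                for i in range(n):
                    for j in range(i + 1, n):
                        gain = (W[i, p[j]] + W[j, p[i]]) - (W[i, p[i]] + W[j, p[j]])
                        if gain > 1e-12:
                            p[i], p[j] = p[j], p[i]
                            improved = True
                if not improved:
                    break
            return p

        rng = np.random.default_rng(2)
        W = rng.random((6, 6))
        p = np.arange(6)                        # any initial perfect matching
        p = augment_4cycles(W, p)
        print(W[np.arange(6), p].sum())         # near-optimal total weight in practice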

  14. Multi-cored vortices support function of slotted wing tips of birds in gliding and flapping flight

    PubMed Central

    2017-01-01

    Slotted wing tips of birds are commonly considered an adaptation to improve soaring performance, despite their presence in species that neither soar nor glide. We used particle image velocimetry to measure the airflow around the slotted wing tip of a jackdaw (Corvus monedula) as well as in its wake during unrestrained flight in a wind tunnel. The separated primary feathers produce individual wakes, confirming a multi-slotted function, in both gliding and flapping flight. The resulting multi-cored wingtip vortex represents a spreading of vorticity, which has previously been suggested as indicative of increased aerodynamic efficiency. Considering benefits of the slotted wing tips that are specific to flapping flight combined with the wide phylogenetic occurrence of this configuration, we propose the hypothesis that slotted wings evolved initially to improve performance in powered flight. PMID:28539482

  15. Exploring Manycore Multinode Systems for Irregular Applications with FPGA Prototyping

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ceriani, Marco; Palermo, Gianluca; Secchi, Simone

    We present a prototype of a multi-core architecture implemented on FPGA, designed to enable efficient execution of irregular applications on distributed shared memory machines, while maintaining high performance on regular workloads. The architecture is composed of off-the-shelf soft cores, local interconnection and a memory interface, integrated with custom components that optimize it for irregular applications. It relies on three key elements: a global address space, multithreading, and fine-grained synchronization. Global addresses are scrambled to reduce the formation of network hot-spots, while the latency of transactions is covered by integrating a hardware scheduler within the custom load/store buffers to take advantage of the availability of multiple execution threads, increasing efficiency in a way that is transparent to the application. We evaluated a dual-node system on irregular kernels, showing scalability in the number of cores and threads.

  16. [Research on K-means clustering segmentation method for MRI brain image based on selecting multi-peaks in gray histogram].

    PubMed

    Chen, Zhaoxue; Yu, Haizhong; Chen, Hao

    2013-12-01

    To solve the problem of traditional K-means clustering, in which the initial clustering centers are selected randomly, we propose a new K-means segmentation algorithm based on robustly selecting the 'peaks' standing for white matter, gray matter and cerebrospinal fluid in the multi-peak gray histogram of an MRI brain image. The new algorithm takes the gray values of the selected histogram 'peaks' as the initial K-means clustering centers and can segment the MRI brain image into the three tissue classes more effectively, accurately, steadily and successfully. Extensive experiments have shown that the proposed algorithm overcomes many shortcomings of the traditional K-means clustering method, such as low efficiency, poor accuracy, weak robustness and long running time. The histogram peak selection idea of the proposed segmentation method is broadly applicable.
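
    The initialization idea can be sketched with scipy and scikit-learn: take the strongest peaks of the gray histogram as the initial cluster centers. The intensity data below is a synthetic stand-in for brain voxels, and the prominence threshold is data-dependent.

        import numpy as np
        from scipy.signal import find_peaks
        from sklearn.cluster import KMeans

        # Synthetic stand-in for brain-voxel gray values drawn from three tissue modes:
        rng = np.random.default_rng(3)
        gray = np.concatenate([rng.normal(60, 8, 4000),     # e.g. CSF
                               rng.normal(120, 10, 6000),   # e.g. gray matter
                               rng.normal(180, 9, 5000)])   # e.g. white matter

        hist, edges = np.histogram(gray, bins=256)
        peaks, _ = find_peaks(hist, prominence=50)          # robust 'peak' selection
        centers = edges[peaks][np.argsort(hist[peaks])[-3:]]  # three strongest peaks

        km = KMeans(n_clusters=3, init=centers.reshape(-1, 1), n_init=1)
        labels = km.fit_predict(gray.reshape(-1, 1))        # three-tissue segmentation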

  17. Multi-sources data fusion framework for remote triage prioritization in telehealth.

    PubMed

    Salman, O H; Rasid, M F A; Saripan, M I; Subramaniam, S K

    2014-09-01

    The healthcare industry is streamlining processes to offer more timely and effective services to all patients. Computerized software algorithms and smart devices can streamline the relation between users and doctors by providing more services inside healthcare telemonitoring systems. This paper proposes a multi-source framework to support advanced healthcare applications. The proposed framework, named Multi Sources Healthcare Architecture (MSHA), considers multiple sources: sensors (ECG, SpO2 and blood pressure) and text-based inputs from wireless and pervasive devices of a Wireless Body Area Network. The framework is used to improve healthcare scalability and efficiency by enhancing the remote triaging and remote prioritization processes for patients, and to provide intelligent services over telemonitoring healthcare systems by using a data fusion method and a prioritization technique. As a telemonitoring system consists of three tiers (sensors/sources, base station and server), the simulation of the MSHA algorithm in the base station is demonstrated in this paper. Achieving a high level of accuracy in remotely triaging and prioritizing patients is our main goal, and the role of multi-source data fusion in telemonitoring healthcare services is demonstrated. In addition, we discuss how the proposed framework can be applied in a healthcare telemonitoring scenario. Simulation results, for different symptoms related to different emergency levels of chronic heart diseases, demonstrate the superiority of our algorithm compared with conventional algorithms in terms of remotely classifying and prioritizing patients.

  18. Path lumping: An efficient algorithm to identify metastable path channels for conformational dynamics of multi-body systems

    NASA Astrophysics Data System (ADS)

    Meng, Luming; Sheong, Fu Kit; Zeng, Xiangze; Zhu, Lizhe; Huang, Xuhui

    2017-07-01

    Constructing Markov state models from large-scale molecular dynamics simulation trajectories is a promising approach to dissect the kinetic mechanisms of complex chemical and biological processes. Combined with transition path theory, Markov state models can be applied to identify all pathways connecting any conformational states of interest. However, the identified pathways can be too complex to comprehend, especially for multi-body processes where numerous parallel pathways with comparable flux probability often coexist. Here, we have developed a path lumping method to group these parallel pathways into metastable path channels for analysis. We define the similarity between two pathways as the intercrossing flux between them and then apply the spectral clustering algorithm to lump these pathways into groups. We demonstrate the power of our method by applying it to two systems: a 2D-potential consisting of four metastable energy channels and the hydrophobic collapse process of two hydrophobic molecules. In both cases, our algorithm successfully reveals the metastable path channels. We expect this path lumping algorithm to be a promising tool for revealing unprecedented insights into the kinetic mechanisms of complex multi-body processes.
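
    With the pairwise intercrossing fluxes collected in a similarity matrix, the lumping step reduces to off-the-shelf spectral clustering. A sketch using scikit-learn on a hypothetical flux matrix:

        import numpy as np
        from sklearn.cluster import SpectralClustering

        # Hypothetical symmetric matrix of intercrossing fluxes between 6 pathways;
        # pathways 0-2 exchange much flux with each other, as do pathways 3-5.
        S = np.array([[0.0, 0.9, 0.8, 0.1, 0.0, 0.1],
                      [0.9, 0.0, 0.7, 0.0, 0.1, 0.0],
                      [0.8, 0.7, 0.0, 0.1, 0.0, 0.1],
                      [0.1, 0.0, 0.1, 0.0, 0.8, 0.9],
                      [0.0, 0.1, 0.0, 0.8, 0.0, 0.7],
                      [0.1, 0.0, 0.1, 0.9, 0.7, 0.0]])

        sc = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
        channels = sc.fit_predict(S)      # metastable path channels
        print(channels)                   # e.g. [0 0 0 1 1 1]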

  19. Deterministic implementations of single-photon multi-qubit Deutsch-Jozsa algorithms with linear optics

    NASA Astrophysics Data System (ADS)

    Wei, Hai-Rui; Liu, Ji-Zhen

    2017-02-01

    It is very important to seek an efficient and robust quantum algorithm demanding fewer quantum resources. We propose one-photon three-qubit original and refined Deutsch-Jozsa algorithms with polarization and two linear-momentum degrees of freedom (DOFs). Our schemes are constructed solely using linear optics. Compared to the traditional ones with one DOF, our schemes are more economical and robust because the number of necessary photons is reduced from three to one. Our linear-optic schemes work in a deterministic way, and they are feasible with current experimental technology.
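
    For reference, the logic that any n-qubit Deutsch-Jozsa implementation must reproduce can be simulated classically in a few lines (phase-oracle formulation). This sketch says nothing about the linear-optics encoding itself.

        import numpy as np

        def deutsch_jozsa(f_values):
            """Simulate the n-qubit Deutsch-Jozsa algorithm with a phase oracle.
            f_values: length-2^n array of 0/1 values of a promised constant-or-balanced f.
            Returns 'constant' or 'balanced'."""
            N = len(f_values)
            state = np.full(N, 1 / np.sqrt(N))          # Hadamard on every qubit of |0...0>
            state *= (-1.0) ** np.asarray(f_values)     # phase oracle (-1)^{f(x)}
            # final Hadamards: the amplitude of |0...0> is the mean of (-1)^{f(x)}
            amp0 = state.sum() / np.sqrt(N)
            return "constant" if np.isclose(abs(amp0), 1.0) else "balanced"

        print(deutsch_jozsa([1, 1, 1, 1, 1, 1, 1, 1]))   # -> 'constant'
        print(deutsch_jozsa([0, 1, 0, 1, 1, 0, 1, 0]))   # -> 'balanced'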

  20. Deterministic implementations of single-photon multi-qubit Deutsch–Jozsa algorithms with linear optics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wei, Hai-Rui, E-mail: hrwei@ustb.edu.cn; Liu, Ji-Zhen

    2017-02-15

    It is very important to seek an efficient and robust quantum algorithm demanding fewer quantum resources. We propose one-photon three-qubit original and refined Deutsch–Jozsa algorithms with polarization and two linear-momentum degrees of freedom (DOFs). Our schemes are constructed solely using linear optics. Compared to the traditional ones with one DOF, our schemes are more economical and robust because the number of necessary photons is reduced from three to one. Our linear-optic schemes work in a deterministic way, and they are feasible with current experimental technology.

  1. TomoPhantom, a software package to generate 2D-4D analytical phantoms for CT image reconstruction algorithm benchmarks

    NASA Astrophysics Data System (ADS)

    Kazantsev, Daniil; Pickalov, Valery; Nagella, Srikanth; Pasca, Edoardo; Withers, Philip J.

    2018-01-01

    In the field of computerized tomographic imaging, many novel reconstruction techniques are routinely tested using simplistic numerical phantoms, e.g. the well-known Shepp-Logan phantom. These phantoms cannot sufficiently cover the broad spectrum of applications in CT imaging where, for instance, smooth or piecewise-smooth 3D objects are common. TomoPhantom provides quick access to an external library of modular analytical 2D/3D phantoms with temporal extensions. In TomoPhantom, quite complex phantoms can be built using additive combinations of geometrical objects, such as Gaussians, parabolas, cones, ellipses, rectangles and volumetric extensions of them. The newly designed phantoms are better suited for benchmarking and testing of different image processing techniques. Specifically, tomographic reconstruction algorithms which employ 2D and 3D scanning geometries can be rigorously analyzed using the software. TomoPhantom also provides the capability of obtaining analytical tomographic projections, which further extends the applicability of the software towards more realistic testing, free from the "inverse crime". All core modules of the package are written in the C-OpenMP language, and wrappers for Python and MATLAB are provided to enable easy access. Due to the C-based multi-threaded implementation, volumetric phantoms of high spatial resolution can be obtained with computational efficiency.
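
    The additive-combination idea is easy to illustrate outside the package itself. A numpy sketch building a small 2D phantom from a Gaussian plus an ellipse follows; it mimics the modular approach but is not TomoPhantom's actual API.

        import numpy as np

        N = 256
        y, x = np.mgrid[-1:1:N * 1j, -1:1:N * 1j]     # unit-square phantom domain

        def gaussian(x0, y0, sx, sy, amp):
            """Smooth Gaussian object."""
            return amp * np.exp(-((x - x0) ** 2 / (2 * sx ** 2) +
                                  (y - y0) ** 2 / (2 * sy ** 2)))

        def ellipse(x0, y0, a, b, amp):
            """Piecewise-constant elliptical object."""
            return amp * (((x - x0) / a) ** 2 + ((y - y0) / b) ** 2 <= 1.0)

        # Additive combination of modular geometric objects:
        phantom = gaussian(-0.3, 0.2, 0.15, 0.25, 1.0) + ellipse(0.4, -0.3, 0.3, 0.2, 0.5)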

  2. Modified Multi Prime RSA Cryptosystem

    NASA Astrophysics Data System (ADS)

    Ghazali Kamardan, M.; Aminudin, N.; Che-Him, Norziha; Sufahani, Suliadi; Khalid, Kamil; Roslan, Rozaini

    2018-04-01

    RSA [1] is one of the most widely used cryptosystems for securing data and information. However, it has recently been shown that RSA has some weaknesses, and with advancing technology RSA is believed to be inefficient, especially when it comes to decryption. Thus, a new algorithm called multi-prime RSA, an extended version of standard RSA, is studied. Then, a modification is made to multi-prime RSA in which another key is shared secretly between the receiver and the sender to increase security. As in RSA, the methodology of modified multi-prime RSA consists of three phases: 1. key generation, in which the secret and public keys are generated and the public key is published; here the secrecy is improved by adding more prime numbers and an additional secret key; 2. encryption of the message using the public and secret keys; 3. decryption of the secret message using the generated secret key. For the decryption phase, the Chinese Remainder Theorem is used, which speeds up the computation. Since multi-prime RSA uses more than two prime numbers, the algorithm is more efficient and secure compared to standard RSA. Furthermore, in modified multi-prime RSA another secret key is introduced to raise the obstacle for an attacker. Therefore, it is strongly believed that this new algorithm is better and can be an alternative to RSA.
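
    A toy numeric walk-through of multi-prime RSA with CRT decryption follows (tiny hard-coded primes for readability; real use needs large random primes, and the additional shared secret key of the modified scheme is not modeled here). Python 3.8+ is assumed for modular inverses via pow(..., -1, m).

        from math import gcd

        p, q, r = 61, 53, 67                    # three (toy) primes instead of two
        n = p * q * r
        phi = (p - 1) * (q - 1) * (r - 1)
        e = 17
        assert gcd(e, phi) == 1
        d = pow(e, -1, phi)                     # private exponent

        m = 42
        c = pow(m, e, n)                        # encryption: c = m^e mod n

        # CRT decryption: reduce the exponent modulo p-1, q-1, r-1 and recombine.
        mp = pow(c, d % (p - 1), p)
        mq = pow(c, d % (q - 1), q)
        mr = pow(c, d % (r - 1), r)
        m_dec = 0
        for (mi, ni) in [(mp, p), (mq, q), (mr, r)]:
            Ni = n // ni
            m_dec = (m_dec + mi * Ni * pow(Ni, -1, ni)) % n
        assert m_dec == m                       # CRT recombination recovers the message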

  3. A Multi-Scale Algorithm for Graffito Advertisement Detection from Images of Real Estate

    NASA Astrophysics Data System (ADS)

    Yang, Jun; Zhu, Shi-Jiao

    There is a significant need to detect and extract graffito advertisements embedded in housing images automatically. However, it is hard to separate the advertisement region well, since housing images generally have complex backgrounds. In this paper, a detection algorithm which uses multi-scale Gabor filters to identify graffito regions is proposed. First, multi-scale Gabor filters with different orientations are applied to the housing images; the approach then uses the frequency data to find likely graffito regions from the relationships between the different channels, exploiting the capabilities of the different filters to solve the detection problem with low computational effort. Finally, the method is tested on several real-estate images with embedded graffito advertisements to verify its robustness and efficiency. The experiments demonstrate that graffito regions can be detected quite well.
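
    A multi-scale, multi-orientation Gabor response map of this kind can be obtained with scikit-image. In the minimal sketch below, the test image and the final threshold are hypothetical, and the thresholding step is a simplified stand-in for the paper's channel-relationship analysis.

        import numpy as np
        from skimage.filters import gabor

        def gabor_energy(image, frequencies=(0.1, 0.2, 0.4), n_orientations=4):
            """Sum of Gabor magnitude responses over scales and orientations."""
            energy = np.zeros_like(image, dtype=float)
            for f in frequencies:
                for k in range(n_orientations):
                    real, imag = gabor(image, frequency=f, theta=k * np.pi / n_orientations)
                    energy += np.hypot(real, imag)
            return energy

        # Hypothetical grayscale housing image with a high-frequency 'graffito' patch:
        img = np.zeros((128, 128))
        img[40:80, 50:100] = np.sin(np.linspace(0, 40 * np.pi, 50))[None, :]
        mask = gabor_energy(img) > 2.0          # crude candidate-region threshold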

  4. Scalable geocomputation: evolving an environmental model building platform from single-core to supercomputers

    NASA Astrophysics Data System (ADS)

    Schmitz, Oliver; de Jong, Kor; Karssenberg, Derek

    2017-04-01

    There is an increasing demand to run environmental models at big scales: simulations over large areas at high resolution. The heterogeneity of available computing hardware, such as multi-core CPUs, GPUs or supercomputers, potentially provides significant computing power to fulfil this demand. However, this requires detailed knowledge of the underlying hardware, parallel algorithm design and its implementation in an efficient system programming language. Domain scientists such as hydrologists or ecologists often lack this specific software engineering knowledge; their emphasis is (and should be) on the exploratory building and analysis of simulation models. As a result, models constructed by domain specialists mostly do not take full advantage of the available hardware. A promising solution is to separate the model building activity from software engineering by offering domain specialists a model building framework with pre-programmed building blocks that they combine to construct a model. The model building framework, consequently, needs built-in capabilities to make full use of the available hardware. Developing such a framework that provides understandable code for domain scientists while being runtime efficient poses several challenges for its developers. For example, optimisations can be performed on individual operations or on the whole model, and tasks need to be generated for a well-balanced execution without explicitly knowing the complexity of the domain problem provided by the modeller. Ideally, a modelling framework supports the optimal use of the available hardware for whichever combination of model building blocks scientists use. We demonstrate our ongoing work on developing parallel algorithms for spatio-temporal modelling with 1) PCRaster, an environmental software framework (http://www.pcraster.eu) providing spatio-temporal model building blocks, and 2) the parallelisation of about 50 of these building blocks using the new Fern library (https://github.com/geoneric/fern/), an independent generic raster processing library. Fern is a highly generic software library, and its algorithms can be configured according to the configuration of a modelling framework. With manageable programming effort (e.g. matching data types between the programming and domain languages) we created a binding between Fern and PCRaster. The resulting PCRaster Python multicore module can be used to execute existing PCRaster models without any changes to the model code. We show initial results on synthetic and geoscientific models indicating significant runtime improvements provided by parallel local and focal operations. We further outline the challenges in parallelising the remaining algorithms, such as flow operations over digital elevation maps, and further potential improvements such as enhancing disk I/O.

  5. Robust and efficient overset grid assembly for partitioned unstructured meshes

    NASA Astrophysics Data System (ADS)

    Roget, Beatrice; Sitaraman, Jayanarayanan

    2014-03-01

    This paper presents a method to perform efficient and automated Overset Grid Assembly (OGA) on a system of overlapping unstructured meshes in a parallel computing environment where all meshes are partitioned into multiple mesh-blocks and processed on multiple cores. The main task of the overset grid assembler is to identify, in parallel, among all points in the overlapping mesh system, at which points the flow solution should be computed (field points), interpolated (receptor points), or ignored (hole points). Point containment search or donor search, an algorithm to efficiently determine the cell that contains a given point, is the core procedure necessary for accomplishing this task. Donor search is particularly challenging for partitioned unstructured meshes because of the complex irregular boundaries that are often created during partitioning.

  6. Multi-AUV Target Search Based on Bioinspired Neurodynamics Model in 3-D Underwater Environments.

    PubMed

    Cao, Xiang; Zhu, Daqi; Yang, Simon X

    2016-11-01

    Target search in 3-D underwater environments is a challenge in multi-AUV (multiple autonomous underwater vehicle) exploration. This paper focuses on an effective strategy for multi-AUV target search in 3-D underwater environments with obstacles. First, the Dempster-Shafer theory of evidence is applied to extract information about the environment from sonar data to build a grid map of the underwater environment. Second, a topologically organized bioinspired neurodynamics model based on the grid map is constructed to represent the dynamic environment. The target globally attracts the AUVs through the dynamic neural activity landscape of the model, while obstacles locally push the AUVs away to avoid collisions. Finally, the AUVs plan their search paths to the targets autonomously by a steepest gradient descent rule. The proposed algorithm deals with various situations, such as static target search, dynamic target search, and the breakdown of one or several AUVs in 3-D underwater environments with obstacles. The simulation results show that the proposed algorithm is capable of guiding multiple AUVs to achieve multi-target search with higher efficiency and adaptability compared with other algorithms.

  7. Nonlinear information fusion algorithms for data-efficient multi-fidelity modelling.

    PubMed

    Perdikaris, P; Raissi, M; Damianou, A; Lawrence, N D; Karniadakis, G E

    2017-02-01

    Multi-fidelity modelling enables accurate inference of quantities of interest by synergistically combining realizations of low-cost/low-fidelity models with a small set of high-fidelity observations. This is particularly effective when the low- and high-fidelity models exhibit strong correlations, and can lead to significant computational gains over approaches that solely rely on high-fidelity models. However, in many cases of practical interest, low-fidelity models can only be well correlated to their high-fidelity counterparts for a specific range of input parameters, and potentially return wrong trends and erroneous predictions if probed outside of their validity regime. Here we put forth a probabilistic framework based on Gaussian process regression and nonlinear autoregressive schemes that is capable of learning complex nonlinear and space-dependent cross-correlations between models of variable fidelity, and can effectively safeguard against low-fidelity models that provide wrong trends. This introduces a new class of multi-fidelity information fusion algorithms that provide a fundamental extension to the existing linear autoregressive methodologies, while still maintaining the same algorithmic complexity and overall computational cost. The performance of the proposed methods is tested in several benchmark problems involving both synthetic and real multi-fidelity datasets from computational fluid dynamics simulations.
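
    The nonlinear autoregressive construction can be sketched with standard GP tooling: train a GP on abundant low-fidelity data, then train a second GP whose inputs are augmented with the low-fidelity prediction. This is a simplified stand-in for the paper's scheme, using scikit-learn and a hypothetical pair of test functions.

        import numpy as np
        from sklearn.gaussian_process import GaussianProcessRegressor
        from sklearn.gaussian_process.kernels import RBF

        f_lo = lambda x: np.sin(8 * np.pi * x)                   # cheap, low-fidelity model
        f_hi = lambda x: (x - np.sqrt(2)) * f_lo(x) ** 2         # expensive, high-fidelity truth

        X_lo = np.linspace(0, 1, 50)[:, None]                    # many cheap samples
        X_hi = np.linspace(0, 1, 8)[:, None]                     # few expensive samples

        gp_lo = GaussianProcessRegressor(RBF(0.1)).fit(X_lo, f_lo(X_lo).ravel())

        # Nonlinear autoregression: the high-fidelity GP sees [x, predicted f_lo(x)].
        Z_hi = np.hstack([X_hi, gp_lo.predict(X_hi)[:, None]])
        gp_hi = GaussianProcessRegressor(RBF([0.1, 1.0])).fit(Z_hi, f_hi(X_hi).ravel())

        X_test = np.linspace(0, 1, 200)[:, None]
        Z_test = np.hstack([X_test, gp_lo.predict(X_test)[:, None]])
        y_pred = gp_hi.predict(Z_test)                           # multi-fidelity prediction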

  8. Efficient Parallelization of a Dynamic Unstructured Application on the Tera MTA

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak

    1999-01-01

    The success of parallel computing in solving real-life computationally intensive problems relies on their efficient mapping and execution on large-scale multiprocessor architectures. Many important applications are both unstructured and dynamic in nature, making their efficient parallel implementation a daunting task. This paper presents the parallelization of a dynamic unstructured mesh adaptation algorithm using three popular programming paradigms on three leading supercomputers. We examine an MPI message-passing implementation on the Cray T3E and the SGI Origin2000, a shared-memory implementation using the cache-coherent nonuniform memory access (CC-NUMA) of the Origin2000, and a multi-threaded version on the newly released Tera Multi-threaded Architecture (MTA). We compare several critical factors of this parallel code development, including runtime, scalability, programmability, and memory overhead. Our overall results demonstrate that multi-threaded systems offer tremendous potential for quickly and efficiently solving some of the most challenging real-life problems on parallel computers.

  9. Efficient Machine Learning Approach for Optimizing Scientific Computing Applications on Emerging HPC Architectures

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arumugam, Kamesh

    Efficient parallel implementation of scientific applications on multi-core CPUs with accelerators such as GPUs and Xeon Phis is challenging. It requires exploiting the data-parallel architecture of the accelerator along with the vector pipelines of modern x86 CPU architectures, load balancing, and efficient memory transfer between different devices. It is relatively easy to meet these requirements for highly structured scientific applications. In contrast, a number of scientific and engineering applications are unstructured. Getting performance on accelerators for these applications is extremely challenging because many of them employ irregular algorithms which exhibit data-dependent control-flow and irregular memory accesses. Furthermore, these applications are often iterative, with dependencies between steps, which makes them hard to parallelize across steps. As a result, parallelism in these applications is often limited to a single step. Numerical simulation of charged particle beam dynamics is one such application, where the distribution of work and the memory access pattern at each time step are irregular. Applications with these properties tend to present significant branch and memory divergence, load imbalance between processor cores, and poor compute and memory utilization. Prior research on parallelizing such irregular applications has focused on optimizing the irregular, data-dependent memory accesses and control-flow during a single step of the application, independent of the other steps, under the assumption that these patterns are completely unpredictable. We observed that the structure of the computation leading to control-flow divergence and irregular memory accesses in one step is similar to that in the next step. It is possible to predict this structure in the current step by observing the computation structure of previous steps. In this dissertation, we present novel machine learning based optimization techniques to address the parallel implementation challenges of such irregular applications on different HPC architectures. In particular, we use supervised learning to predict the computation structure and use it to address the control-flow and memory access irregularities in the parallel implementation of such applications on GPUs, Xeon Phis, and heterogeneous architectures composed of multi-core CPUs with GPUs or Xeon Phis. We use the numerical simulation of charged particle beam dynamics as a motivating example throughout the dissertation to present our new approach, though it should be equally applicable to a wide range of irregular applications. The machine learning approach presented here uses predictive analytics and forecasting techniques to adaptively model and track the irregular memory access pattern at each time step of the simulation and to anticipate the future memory access pattern. Access pattern forecasts can then be used to formulate optimization decisions during application execution, improving the performance of the application at a future time step based on observations from earlier time steps. In heterogeneous architectures, forecasts can also be used to improve the memory performance and resource utilization of all the processing units, delivering good aggregate performance. We used these optimization techniques and the anticipation strategy to design a cache-aware, memory-efficient parallel algorithm addressing the irregularities in the parallel implementation of the charged particle beam dynamics simulation on different HPC architectures. Experimental results using a diverse mix of HPC architectures show that our approach of using an anticipation strategy is effective in maximizing data reuse, ensuring workload balance, minimizing branch and memory divergence, and improving resource utilization.

  10. Hyper thin 3D edge measurement of honeycomb core structures based on the triangular camera-projector layout & phase-based stereo matching.

    PubMed

    Jiang, Hongzhi; Zhao, Huijie; Li, Xudong; Quan, Chenggen

    2016-03-07

    We propose a novel hyper thin 3D edge measurement technique to measure the profile of 3D outer envelope of honeycomb core structures. The width of the edges of the honeycomb core is less than 0.1 mm. We introduce a triangular layout design consisting of two cameras and one projector to measure hyper thin 3D edges and eliminate data interference from the walls. A phase-shifting algorithm and the multi-frequency heterodyne phase-unwrapping principle are applied for phase retrievals on edges. A new stereo matching method based on phase mapping and epipolar constraint is presented to solve correspondence searching on the edges and remove false matches resulting in 3D outliers. Experimental results demonstrate the effectiveness of the proposed method for measuring the 3D profile of honeycomb core structures.
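
    The phase-retrieval step can be summarized in a few lines. Below is a sketch of standard four-step phase shifting on a synthetic fringe signal; the surface and fringe parameters are hypothetical, and the multi-frequency heterodyne unwrapping stage is replaced by numpy's simple 1D unwrap for brevity.

        import numpy as np

        def four_step_phase(I1, I2, I3, I4):
            """Wrapped phase from four fringe images with phase shifts 0, pi/2, pi, 3pi/2:
            I_k = A + B*cos(phi + k*pi/2)  =>  phi = atan2(I4 - I2, I1 - I3)."""
            return np.arctan2(I4 - I2, I1 - I3)

        # Synthetic fringes over a hypothetical surface profile:
        phi_true = np.linspace(0, 4 * np.pi, 512)
        A, B = 0.5, 0.4
        frames = [A + B * np.cos(phi_true + k * np.pi / 2) for k in range(4)]
        phi_wrapped = four_step_phase(*frames)
        phi_unwrapped = np.unwrap(phi_wrapped)    # 1D stand-in for heterodyne unwrapping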

  11. Blended-Wing-Body (BWB) Fuselage Structural Design for Weight Reduction

    NASA Technical Reports Server (NTRS)

    Mukhopadhyay, V.

    2005-01-01

    Structural analysis and design of efficient pressurized fuselage configurations for the advanced Blended-Wing-Body (BWB) flight vehicle is a challenging problem. Unlike in a conventional cylindrical pressurized fuselage, the stress level in a box-type BWB fuselage is an order of magnitude higher, because internal pressure primarily results in bending stress instead of skin-membrane stress. In addition, the resulting deformation of the aerodynamic surface could significantly affect the performance advantages provided by the lifting body. The pressurized composite conformal multi-lobe tanks of the X-33 type space vehicle also suffered from a similar problem. In earlier BWB design studies, Vaulted Ribbed Shell (VLRS), Flat Ribbed Shell (FRS), Vaulted shell Honeycomb Core (VLHC) and Flat sandwich shell Honeycomb Core (FLHC) concepts were studied. The flat and vaulted ribbed shell concepts were found most efficient. In a recent study, a set of composite sandwich panels and cross-ribbed panels were analyzed. Optimal values of rib and skin thickness, rib spacing, and panel depth were obtained for minimal weight under stress and buckling constraints. In addition, an efficient multi-bubble fuselage (MBF) configuration concept was developed. The special geometric configuration of this concept balances the internal cabin pressure load efficiently, through membrane stress in the inner stiffened shell and inter-cabin walls, while the outer ribbed shell prevents buckling due to external resultant compressive loads. The initial results from these approximate finite element analyses indicate progressively lower maximum stresses and deflections compared to the earlier study. However, a relative comparison of the FEM weight per unit floor area of the segment unit indicates that the unit weights are still relatively higher than those of the conventional B777-type cylindrical or A380-type elliptic fuselage designs. Due to the manufacturing concerns associated with the multi-bubble fuselage, a Y-braced box-type fuselage alternative with specially resin-film-injected (RFI) stitched carbon composite with a foam core was designed by Boeing under a NASA research contract for the 480-passenger version. It is shown that this configuration can be improved to a modified multi-bubble fuselage which has a better stress distribution for the same material and dimensions.

  12. Axion string dynamics I: 2+1D

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fleury, Leesa M.; Moore, Guy D.

    2016-05-03

    If the axion exists and if the initial axion field value is uncorrelated at causally disconnected points, then it should be possible to predict the efficiency of cosmological axion production, relating the axionic dark matter density to the axion mass. The main obstacle to making this prediction is correctly treating the axion string cores. We develop a new algorithm for treating the axionic string cores correctly in 2+1 dimensions. When the axionic string cores are given their full physical string tension, axion production is about twice as efficient as in previous simulations. We argue that the string network in 2+1 dimensions should behave very differently than in 3+1 dimensions, so this result cannot be simply carried over to the physical case. We outline how to extend our method to 3+1D axion string dynamics.

  13. Scaled Runge-Kutta algorithms for handling dense output

    NASA Technical Reports Server (NTRS)

    Horn, M. K.

    1981-01-01

    Low order Runge-Kutta algorithms are developed which determine the solution of a system of ordinary differential equations at any point within a given integration step, as well as at the end of each step. The scaled Runge-Kutta methods are designed to be used with existing Runge-Kutta formulas, using the derivative evaluations of these defining algorithms as the core of the system. For a slight increase in computing time, the solution may be generated within the integration step, improving the efficiency of the Runge-Kutta algorithms, since the step length need no longer be severely reduced to coincide with the desired output point. Scaled Runge-Kutta algorithms are presented for orders 3 through 5, along with accuracy comparisons between the defining algorithms and their scaled versions for a test problem.

  14. Convolution of large 3D images on GPU and its decomposition

    NASA Astrophysics Data System (ADS)

    Karas, Pavel; Svoboda, David

    2011-12-01

    In this article, we propose a method for computing the convolution of large 3D images. The convolution is performed in the frequency domain using the convolution theorem. The algorithm is accelerated on a graphics card by means of the CUDA parallel computing model. The convolution is decomposed in the frequency domain using the decimation-in-frequency algorithm. We pay attention to keeping our approach efficient in terms of both time and memory consumption, and also in terms of memory transfers between CPU and GPU, which have a significant influence on the overall computational time. We also study the implementation on multiple GPUs and compare the results between the multi-GPU and multi-CPU implementations.
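
    The frequency-domain convolution at the heart of the method follows directly from the convolution theorem. A CPU-side numpy sketch for 3D volumes is given below; the decimation-in-frequency decomposition and the GPU offload are not shown, and the volume and kernel are hypothetical.

        import numpy as np

        def fft_convolve3d(volume, kernel):
            """Circular 3D convolution via the convolution theorem:
            conv(f, g) = IFFT( FFT(f) * FFT(g) ), with the kernel zero-padded."""
            k = np.zeros_like(volume)
            s = kernel.shape
            k[:s[0], :s[1], :s[2]] = kernel                        # pad kernel to volume size
            k = np.roll(k, [-d // 2 for d in s], axis=(0, 1, 2))   # center it to avoid a shift
            return np.real(np.fft.ifftn(np.fft.fftn(volume) * np.fft.fftn(k)))

        vol = np.random.rand(64, 64, 64)
        ker = np.ones((5, 5, 5)) / 125.0            # 5x5x5 mean filter
        smoothed = fft_convolve3d(vol, ker)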

  15. A multi-parametric particle-pairing algorithm for particle tracking in single and multiphase flows

    NASA Astrophysics Data System (ADS)

    Cardwell, Nicholas D.; Vlachos, Pavlos P.; Thole, Karen A.

    2011-10-01

    Multiphase flows (MPFs) offer a rich area of fundamental study with many practical applications. Examples of such flows range from the ingestion of foreign particulates in gas turbines to the transport of particles within the human body. Experimental investigation of MPFs, however, is challenging and requires techniques that simultaneously resolve both the carrier and discrete phases present in the flowfield. This paper presents a new multi-parametric particle-pairing algorithm for particle tracking velocimetry (MP3-PTV) in MPFs. MP3-PTV improves upon previous particle tracking algorithms by employing a novel variable pair-matching algorithm which uses displacement preconditioning in combination with estimated particle size and intensity to match particle pairs between successive images more effectively and accurately. To improve the method's efficiency, a new particle identification and segmentation routine was also developed. Validation of the new method was initially performed on two artificial data sets: a traditional single-phase flow published by the Visualization Society of Japan (VSJ) and an in-house MPF data set with a bi-modal distribution of particle diameters. Metrics of measurement yield, reliability, and overall tracking efficiency were used for method comparison. On the VSJ data set, the newly presented segmentation routine delivered a twofold improvement in identifying particles compared to other published methods. For the simulated MPF data set, measurement efficiency of the carrier phase improved from 9% to 41% for MP3-PTV compared to a traditional hybrid PTV. When applied to experimental data of a gas-solid flow, MP3-PTV effectively identified the two particle populations and reported a vector efficiency and velocity measurement error comparable to measurements on the single-phase flow images. Simultaneous measurement of the dispersed-particle and carrier flowfield velocities allowed the calculation of instantaneous particle slip velocities, illustrating the algorithm's ability to robustly and accurately resolve polydispersed MPFs.
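
    As a loose sketch of the multi-parametric idea (a toy matcher, not the authors' MP3-PTV), the snippet below combines a preconditioned displacement estimate with particle-size and intensity similarity into a single matching cost; weights and thresholds are invented.

    ```python
    import numpy as np

    def pair_particles(frame1, frame2, d_est, r_max=5.0, w=(1.0, 0.5, 0.5)):
        """Greedy multi-parametric pairing between two frames (toy version).

        frame1, frame2: (n, 4) arrays with columns x, y, diameter, intensity.
        d_est: preconditioned displacement estimate (dx, dy) for this region.
        Returns a list of (i, j) index pairs into frame1 and frame2.
        """
        pairs, taken = [], np.zeros(len(frame2), dtype=bool)
        for i, (x, y, diam, inten) in enumerate(frame1):
            # distance from each candidate to the predicted position
            dist = np.hypot(frame2[:, 0] - (x + d_est[0]),
                            frame2[:, 1] - (y + d_est[1]))
            # one cost: displacement residual + size + intensity mismatch
            cost = (w[0] * dist / r_max
                    + w[1] * np.abs(frame2[:, 2] - diam) / (diam + 1e-9)
                    + w[2] * np.abs(frame2[:, 3] - inten) / (inten + 1e-9))
            cost[(dist > r_max) | taken] = np.inf  # out of range or claimed
            j = int(np.argmin(cost))
            if np.isfinite(cost[j]):
                pairs.append((i, j))
                taken[j] = True
        return pairs

    f1 = np.array([[10.0, 10.0, 3.0, 200.0], [30.0, 12.0, 6.0, 90.0]])
    f2 = np.array([[31.5, 13.0, 6.2, 95.0], [11.4, 10.8, 3.1, 210.0]])
    print(pair_particles(f1, f2, d_est=(1.5, 1.0), r_max=3.0))  # [(0, 1), (1, 0)]
    ```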

  16. Parallel deterministic transport sweeps of structured and unstructured meshes with overloaded mesh decompositions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pautz, Shawn D.; Bailey, Teresa S.

    Here, the efficiency of discrete ordinates transport sweeps depends on the scheduling algorithm, the domain decomposition, the problem to be solved, and the computational platform. Sweep scheduling algorithms may be categorized by their approach to several issues. In this paper we examine the strategy of domain overloading for mesh partitioning as one of the components of such algorithms. In particular, we extend the domain overloading strategy, previously defined and analyzed for structured meshes, to the general case of unstructured meshes. We also present computational results for both the structured and unstructured domain overloading cases. We find that an appropriate amount of domain overloading can greatly improve the efficiency of parallel sweeps for both structured and unstructured partitionings of the test problems examined, on up to 10^5 processor cores.

  17. Parallel deterministic transport sweeps of structured and unstructured meshes with overloaded mesh decompositions

    DOE PAGES

    Pautz, Shawn D.; Bailey, Teresa S.

    2016-11-29

    Here, the efficiency of discrete ordinates transport sweeps depends on the scheduling algorithm, the domain decomposition, the problem to be solved, and the computational platform. Sweep scheduling algorithms may be categorized by their approach to several issues. In this paper we examine the strategy of domain overloading for mesh partitioning as one of the components of such algorithms. In particular, we extend the domain overloading strategy, previously defined and analyzed for structured meshes, to the general case of unstructured meshes. We also present computational results for both the structured and unstructured domain overloading cases. We find that an appropriate amount of domain overloading can greatly improve the efficiency of parallel sweeps for both structured and unstructured partitionings of the test problems examined, on up to 10^5 processor cores.
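
    Schematically, domain overloading gives each rank several mesh blocks scattered across the sweep front, so a rank usually owns some block whose upstream dependencies are already satisfied. A minimal non-MPI sketch of such an assignment (all names invented; the actual scheduling is far more involved):

    ```python
    def overload_partition(n_subdomains, n_ranks, overload):
        """Assign n_subdomains mesh blocks to n_ranks processes,
        'overload' blocks per rank.

        Round-robin interleaving spreads each rank's blocks across the
        sweep front, so at most stages of the sweep some block owned by
        a given rank is ready to be processed.
        """
        assert n_subdomains == n_ranks * overload
        return {block: block % n_ranks for block in range(n_subdomains)}

    # e.g. 8 blocks on 4 ranks with an overload factor of 2:
    # {0: 0, 1: 1, 2: 2, 3: 3, 4: 0, 5: 1, 6: 2, 7: 3}
    print(overload_partition(8, 4, 2))
    ```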

  18. Chance-constrained multi-objective optimization of groundwater remediation design at DNAPLs-contaminated sites using a multi-algorithm genetically adaptive method.

    PubMed

    Ouyang, Qi; Lu, Wenxi; Hou, Zeyu; Zhang, Yu; Li, Shuai; Luo, Jiannan

    2017-05-01

    In this paper, a multi-algorithm genetically adaptive multi-objective (AMALGAM) method is proposed as a multi-objective optimization solver. It was applied to the multi-objective optimization of a groundwater remediation design at sites contaminated by dense non-aqueous phase liquids. In this study, there were two objectives: minimization of the total remediation cost and minimization of the remediation time. A non-dominated sorting genetic algorithm II (NSGA-II) was adopted for comparison with the proposed method. For efficiency, the time-consuming surfactant-enhanced aquifer remediation simulation model was replaced by a surrogate model constructed with a multi-gene genetic programming (MGGP) technique. Two other surrogate modeling methods, support vector regression (SVR) and Kriging (KRG), were employed for comparison with MGGP. In addition, surrogate-modeling uncertainty was incorporated into the optimization model by chance-constrained programming (CCP). The results showed that, for the problem considered in this study, (1) the solutions obtained by AMALGAM incurred less remediation cost and required less time than those of NSGA-II, indicating that AMALGAM outperformed NSGA-II; (2) the MGGP surrogate model was more accurate than SVR and KRG; and (3) the remediation cost and time increased with the confidence level, which enables decision makers to make a suitable choice considering the given budget, remediation time, and reliability.
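
    Both AMALGAM and NSGA-II return a set of non-dominated trade-offs between the two objectives. As a minimal illustration of the dominance test behind such a Pareto front (the numbers are invented, not from the study):

    ```python
    def pareto_front(solutions):
        """Return the non-dominated (cost, time) pairs, minimizing both.

        A solution dominates another if it is no worse in both objectives
        and strictly better in at least one.
        """
        front = []
        for i, (c1, t1) in enumerate(solutions):
            dominated = any(
                (c2 <= c1 and t2 <= t1) and (c2 < c1 or t2 < t1)
                for j, (c2, t2) in enumerate(solutions) if j != i
            )
            if not dominated:
                front.append((c1, t1))
        return front

    # e.g. cost (million $) vs remediation time (days), invented values;
    # (1.3, 450) is dominated by (1.2, 400) and is dropped from the front.
    print(pareto_front([(1.2, 400), (1.0, 500), (1.5, 350), (1.3, 450)]))
    ```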

  19. Large Scale Frequent Pattern Mining using MPI One-Sided Model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vishnu, Abhinav; Agarwal, Khushbu

    In this paper, we propose a work-stealing runtime, the Library for Work Stealing (LibWS), using the MPI one-sided model for designing a scalable FP-Growth (the de facto standard frequent pattern mining algorithm) on large scale systems. LibWS provides locality-efficient and highly scalable work-stealing techniques for load balancing on a variety of data distributions. We also propose a novel communication algorithm for the FP-Growth data exchange phase, which reduces the communication complexity from the state-of-the-art O(p) to O(f + p/f) for p processes and f frequent attribute-ids. FP-Growth is implemented using LibWS and evaluated on several work distributions and support counts. An experimental evaluation of FP-Growth on LibWS using 4096 processes on an InfiniBand cluster demonstrates excellent efficiency for several work distributions (87% efficiency for Power-law and 91% for Poisson). The proposed distributed FP-Tree merging algorithm provides a 38x communication speedup on 4096 cores.
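
    Taking the stated complexities at face value, a back-of-the-envelope comparison shows why the exchange-phase redesign matters at the evaluated scale (the f values are invented; only p = 4096 comes from the record):

    ```python
    # O(p) vs O(f + p/f) messages per process in the data exchange phase.
    p = 4096                  # processes, as in the paper's evaluation
    for f in (16, 64, 256):   # illustrative frequent-id counts
        old, new = p, f + p // f
        print(f"f={f:4d}: O(p)={old:5d}  O(f+p/f)={new:4d}  ~{old // new}x fewer")
    ```

    The new cost is minimized near f of roughly sqrt(p), where the two terms balance; at f = 64 the reduction is about 32x, the same order of magnitude as the reported 38x communication speedup.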

  20. Libpsht - algorithms for efficient spherical harmonic transforms

    NASA Astrophysics Data System (ADS)

    Reinecke, M.

    2011-02-01

    Libpsht (or "library for performant spherical harmonic transforms") is a collection of algorithms for efficient conversion between spatial-domain and spectral-domain representations of data defined on the sphere. The package supports transforms of both scalar and spin-1 and spin-2 quantities, and can be used for a wide range of pixelisations (including HEALPix, GLESP, and ECP). It will take advantage of hardware features such as multiple processor cores and floating-point vector operations, if available. Even without this additional acceleration, the employed algorithms are among the most efficient (in terms of CPU time as well as memory consumption) currently used in the astronomical community. The library is written in strictly standard-conforming C90, ensuring portability to many different hardware and software platforms, and allowing straightforward integration with codes written in various programming languages such as C, C++, Fortran, and Python. Libpsht is distributed under the terms of the GNU General Public License (GPL) version 2 and can be downloaded from .
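
    Libpsht itself is a C library, but the style of spatial-to-spectral round trip it implements can be sketched from Python with healpy, whose HEALPix transforms are backed by the same family of algorithms (libpsht's successors); this sketch assumes healpy is installed:

    ```python
    import numpy as np
    import healpy as hp

    nside, lmax = 64, 128
    npix = hp.nside2npix(nside)           # 12 * nside**2 pixels
    m = np.random.standard_normal(npix)   # a scalar map on the sphere

    alm = hp.map2alm(m, lmax=lmax)        # spatial -> spectral (a_lm)
    m_back = hp.alm2map(alm, nside, lmax=lmax)  # spectral -> spatial

    # The round trip is a band-limited approximation of the original map.
    print(np.corrcoef(m, m_back)[0, 1])
    ```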
