efficient computational algorithms: Topics by Science.gov

Sample records for efficient computational algorithms

Computationally efficient multibody simulations

NASA Technical Reports Server (NTRS)

Ramakrishnan, Jayant; Kumar, Manoj

1994-01-01

Computationally efficient approaches to the solution of the dynamics of multibody systems are presented in this work. The computational efficiency is derived from both the algorithmic and implementational standpoint. Order(n) approaches provide a new formulation of the equations of motion eliminating the assembly and numerical inversion of a system mass matrix as required by conventional algorithms. Computational efficiency is also gained in the implementation phase by the symbolic processing and parallel implementation of these equations. Comparison of this algorithm with existing multibody simulation programs illustrates the increased computational efficiency.
CCOMP: An efficient algorithm for complex roots computation of determinantal equations

NASA Astrophysics Data System (ADS)

Zouros, Grigorios P.

2018-01-01

In this paper a free Python algorithm, entitled CCOMP (Complex roots COMPutation), is developed for the efficient computation of complex roots of determinantal equations inside a prescribed complex domain. The key to the method presented is the efficient determination of the candidate points inside the domain which, in their close neighborhood, a complex root may lie. Once these points are detected, the algorithm proceeds to a two-dimensional minimization problem with respect to the minimum modulus eigenvalue of the system matrix. In the core of CCOMP exist three sub-algorithms whose tasks are the efficient estimation of the minimum modulus eigenvalues of the system matrix inside the prescribed domain, the efficient computation of candidate points which guarantee the existence of minima, and finally, the computation of minima via bound constrained minimization algorithms. Theoretical results and heuristics support the development and the performance of the algorithm, which is discussed in detail. CCOMP supports general complex matrices, and its efficiency, applicability and validity is demonstrated to a variety of microwave applications.
Efficient conjugate gradient algorithms for computation of the manipulator forward dynamics

NASA Technical Reports Server (NTRS)

Fijany, Amir; Scheid, Robert E.

1989-01-01

The applicability of conjugate gradient algorithms for computation of the manipulator forward dynamics is investigated. The redundancies in the previously proposed conjugate gradient algorithm are analyzed. A new version is developed which, by avoiding these redundancies, achieves a significantly greater efficiency. A preconditioned conjugate gradient algorithm is also presented. A diagonal matrix whose elements are the diagonal elements of the inertia matrix is proposed as the preconditioner. In order to increase the computational efficiency, an algorithm is developed which exploits the synergism between the computation of the diagonal elements of the inertia matrix and that required by the conjugate gradient algorithm.
An efficient and accurate 3D displacements tracking strategy for digital volume correlation

NASA Astrophysics Data System (ADS)

Pan, Bing; Wang, Bo; Wu, Dafang; Lubineau, Gilles

2014-07-01

Owing to its inherent computational complexity, practical implementation of digital volume correlation (DVC) for internal displacement and strain mapping faces important challenges in improving its computational efficiency. In this work, an efficient and accurate 3D displacement tracking strategy is proposed for fast DVC calculation. The efficiency advantage is achieved by using three improvements. First, to eliminate the need of updating Hessian matrix in each iteration, an efficient 3D inverse compositional Gauss-Newton (3D IC-GN) algorithm is introduced to replace existing forward additive algorithms for accurate sub-voxel displacement registration. Second, to ensure the 3D IC-GN algorithm that converges accurately and rapidly and avoid time-consuming integer-voxel displacement searching, a generalized reliability-guided displacement tracking strategy is designed to transfer accurate and complete initial guess of deformation for each calculation point from its computed neighbors. Third, to avoid the repeated computation of sub-voxel intensity interpolation coefficients, an interpolation coefficient lookup table is established for tricubic interpolation. The computational complexity of the proposed fast DVC and the existing typical DVC algorithms are first analyzed quantitatively according to necessary arithmetic operations. Then, numerical tests are performed to verify the performance of the fast DVC algorithm in terms of measurement accuracy and computational efficiency. The experimental results indicate that, compared with the existing DVC algorithm, the presented fast DVC algorithm produces similar precision and slightly higher accuracy at a substantially reduced computational cost.
A class of parallel algorithms for computation of the manipulator inertia matrix

NASA Technical Reports Server (NTRS)

Fijany, Amir; Bejczy, Antal K.

1989-01-01

Parallel and parallel/pipeline algorithms for computation of the manipulator inertia matrix are presented. An algorithm based on composite rigid-body spatial inertia method, which provides better features for parallelization, is used for the computation of the inertia matrix. Two parallel algorithms are developed which achieve the time lower bound in computation. Also described is the mapping of these algorithms with topological variation on a two-dimensional processor array, with nearest-neighbor connection, and with cardinality variation on a linear processor array. An efficient parallel/pipeline algorithm for the linear array was also developed, but at significantly higher efficiency.
An efficient parallel termination detection algorithm

DOE Office of Scientific and Technical Information (OSTI.GOV)

Baker, A. H.; Crivelli, S.; Jessup, E. R.

2004-05-27

Information local to any one processor is insufficient to monitor the overall progress of most distributed computations. Typically, a second distributed computation for detecting termination of the main computation is necessary. In order to be a useful computational tool, the termination detection routine must operate concurrently with the main computation, adding minimal overhead, and it must promptly and correctly detect termination when it occurs. In this paper, we present a new algorithm for detecting the termination of a parallel computation on distributed-memory MIMD computers that satisfies all of those criteria. A variety of termination detection algorithms have been devised. Ofmore » these, the algorithm presented by Sinha, Kale, and Ramkumar (henceforth, the SKR algorithm) is unique in its ability to adapt to the load conditions of the system on which it runs, thereby minimizing the impact of termination detection on performance. Because their algorithm also detects termination quickly, we consider it to be the most efficient practical algorithm presently available. The termination detection algorithm presented here was developed for use in the PMESC programming library for distributed-memory MIMD computers. Like the SKR algorithm, our algorithm adapts to system loads and imposes little overhead. Also like the SKR algorithm, ours is tree-based, and it does not depend on any assumptions about the physical interconnection topology of the processors or the specifics of the distributed computation. In addition, our algorithm is easier to implement and requires only half as many tree traverses as does the SKR algorithm. This paper is organized as follows. In section 2, we define our computational model. In section 3, we review the SKR algorithm. We introduce our new algorithm in section 4, and prove its correctness in section 5. We discuss its efficiency and present experimental results in section 6.« less
Gradient gravitational search: An efficient metaheuristic algorithm for global optimization.

PubMed

Dash, Tirtharaj; Sahu, Prabhat K

2015-05-30

The adaptation of novel techniques developed in the field of computational chemistry to solve the concerned problems for large and flexible molecules is taking the center stage with regard to efficient algorithm, computational cost and accuracy. In this article, the gradient-based gravitational search (GGS) algorithm, using analytical gradients for a fast minimization to the next local minimum has been reported. Its efficiency as metaheuristic approach has also been compared with Gradient Tabu Search and others like: Gravitational Search, Cuckoo Search, and Back Tracking Search algorithms for global optimization. Moreover, the GGS approach has also been applied to computational chemistry problems for finding the minimal value potential energy of two-dimensional and three-dimensional off-lattice protein models. The simulation results reveal the relative stability and physical accuracy of protein models with efficient computational cost. © 2015 Wiley Periodicals, Inc.
Efficient convolutional sparse coding

DOEpatents

Wohlberg, Brendt

2017-06-20

Computationally efficient algorithms may be applied for fast dictionary learning solving the convolutional sparse coding problem in the Fourier domain. More specifically, efficient convolutional sparse coding may be derived within an alternating direction method of multipliers (ADMM) framework that utilizes fast Fourier transforms (FFT) to solve the main linear system in the frequency domain. Such algorithms may enable a significant reduction in computational cost over conventional approaches by implementing a linear solver for the most critical and computationally expensive component of the conventional iterative algorithm. The theoretical computational cost of the algorithm may be reduced from O(M.sup.3N) to O(MN log N), where N is the dimensionality of the data and M is the number of elements in the dictionary. This significant improvement in efficiency may greatly increase the range of problems that can practically be addressed via convolutional sparse representations.
Efficient FFT Algorithm for Psychoacoustic Model of the MPEG-4 AAC

NASA Astrophysics Data System (ADS)

Lee, Jae-Seong; Lee, Chang-Joon; Park, Young-Cheol; Youn, Dae-Hee

This paper proposes an efficient FFT algorithm for the Psycho-Acoustic Model (PAM) of MPEG-4 AAC. The proposed algorithm synthesizes FFT coefficients using MDCT and MDST coefficients through circular convolution. The complexity of the MDCT and MDST coefficients is approximately half of the original FFT. We also design a new PAM based on the proposed FFT algorithm, which has 15% lower computational complexity than the original PAM without degradation of sound quality. Subjective as well as objective test results are presented to confirm the efficiency of the proposed FFT computation algorithm and the PAM.
Computing rank-revealing QR factorizations of dense matrices.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bischof, C. H.; Quintana-Orti, G.; Mathematics and Computer Science

1998-06-01

We develop algorithms and implementations for computing rank-revealing QR (RRQR) factorizations of dense matrices. First, we develop an efficient block algorithm for approximating an RRQR factorization, employing a windowed version of the commonly used Golub pivoting strategy, aided by incremental condition estimation. Second, we develop efficiently implementable variants of guaranteed reliable RRQR algorithms for triangular matrices originally suggested by Chandrasekaran and Ipsen and by Pan and Tang. We suggest algorithmic improvements with respect to condition estimation, termination criteria, and Givens updating. By combining the block algorithm with one of the triangular postprocessing steps, we arrive at an efficient and reliablemore » algorithm for computing an RRQR factorization of a dense matrix. Experimental results on IBM RS/6000 SGI R8000 platforms show that this approach performs up to three times faster that the less reliable QR factorization with column pivoting as it is currently implemented in LAPACK, and comes within 15% of the performance of the LAPACK block algorithm for computing a QR factorization without any column exchanges. Thus, we expect this routine to be useful in may circumstances where numerical rank deficiency cannot be ruled out, but currently has been ignored because of the computational cost of dealing with it.« less
Efficient shortest-path-tree computation in network routing based on pulse-coupled neural networks.

PubMed

Qu, Hong; Yi, Zhang; Yang, Simon X

2013-06-01

Shortest path tree (SPT) computation is a critical issue for routers using link-state routing protocols, such as the most commonly used open shortest path first and intermediate system to intermediate system. Each router needs to recompute a new SPT rooted from itself whenever a change happens in the link state. Most commercial routers do this computation by deleting the current SPT and building a new one using static algorithms such as the Dijkstra algorithm at the beginning. Such recomputation of an entire SPT is inefficient, which may consume a considerable amount of CPU time and result in a time delay in the network. Some dynamic updating methods using the information in the updated SPT have been proposed in recent years. However, there are still many limitations in those dynamic algorithms. In this paper, a new modified model of pulse-coupled neural networks (M-PCNNs) is proposed for the SPT computation. It is rigorously proved that the proposed model is capable of solving some optimization problems, such as the SPT. A static algorithm is proposed based on the M-PCNNs to compute the SPT efficiently for large-scale problems. In addition, a dynamic algorithm that makes use of the structure of the previously computed SPT is proposed, which significantly improves the efficiency of the algorithm. Simulation results demonstrate the effective and efficient performance of the proposed approach.
An Adaptive Evolutionary Algorithm for Traveling Salesman Problem with Precedence Constraints

PubMed Central

Sung, Jinmo; Jeong, Bongju

2014-01-01

Traveling sales man problem with precedence constraints is one of the most notorious problems in terms of the efficiency of its solution approach, even though it has very wide range of industrial applications. We propose a new evolutionary algorithm to efficiently obtain good solutions by improving the search process. Our genetic operators guarantee the feasibility of solutions over the generations of population, which significantly improves the computational efficiency even when it is combined with our flexible adaptive searching strategy. The efficiency of the algorithm is investigated by computational experiments. PMID:24701158
An adaptive evolutionary algorithm for traveling salesman problem with precedence constraints.

PubMed

Sung, Jinmo; Jeong, Bongju

2014-01-01

Traveling sales man problem with precedence constraints is one of the most notorious problems in terms of the efficiency of its solution approach, even though it has very wide range of industrial applications. We propose a new evolutionary algorithm to efficiently obtain good solutions by improving the search process. Our genetic operators guarantee the feasibility of solutions over the generations of population, which significantly improves the computational efficiency even when it is combined with our flexible adaptive searching strategy. The efficiency of the algorithm is investigated by computational experiments.
DANoC: An Efficient Algorithm and Hardware Codesign of Deep Neural Networks on Chip.

PubMed

Zhou, Xichuan; Li, Shengli; Tang, Fang; Hu, Shengdong; Lin, Zhi; Zhang, Lei

2017-07-18

Deep neural networks (NNs) are the state-of-the-art models for understanding the content of images and videos. However, implementing deep NNs in embedded systems is a challenging task, e.g., a typical deep belief network could exhaust gigabytes of memory and result in bandwidth and computational bottlenecks. To address this challenge, this paper presents an algorithm and hardware codesign for efficient deep neural computation. A hardware-oriented deep learning algorithm, named the deep adaptive network, is proposed to explore the sparsity of neural connections. By adaptively removing the majority of neural connections and robustly representing the reserved connections using binary integers, the proposed algorithm could save up to 99.9% memory utility and computational resources without undermining classification accuracy. An efficient sparse-mapping-memory-based hardware architecture is proposed to fully take advantage of the algorithmic optimization. Different from traditional Von Neumann architecture, the deep-adaptive network on chip (DANoC) brings communication and computation in close proximity to avoid power-hungry parameter transfers between on-board memory and on-chip computational units. Experiments over different image classification benchmarks show that the DANoC system achieves competitively high accuracy and efficiency comparing with the state-of-the-art approaches.
Dynamic Load-Balancing for Distributed Heterogeneous Computing of Parallel CFD Problems

NASA Technical Reports Server (NTRS)

Ecer, A.; Chien, Y. P.; Boenisch, T.; Akay, H. U.

2000-01-01

The developed methodology is aimed at improving the efficiency of executing block-structured algorithms on parallel, distributed, heterogeneous computers. The basic approach of these algorithms is to divide the flow domain into many sub- domains called blocks, and solve the governing equations over these blocks. Dynamic load balancing problem is defined as the efficient distribution of the blocks among the available processors over a period of several hours of computations. In environments with computers of different architecture, operating systems, CPU speed, memory size, load, and network speed, balancing the loads and managing the communication between processors becomes crucial. Load balancing software tools for mutually dependent parallel processes have been created to efficiently utilize an advanced computation environment and algorithms. These tools are dynamic in nature because of the chances in the computer environment during execution time. More recently, these tools were extended to a second operating system: NT. In this paper, the problems associated with this application will be discussed. Also, the developed algorithms were combined with the load sharing capability of LSF to efficiently utilize workstation clusters for parallel computing. Finally, results will be presented on running a NASA based code ADPAC to demonstrate the developed tools for dynamic load balancing.
Computationally efficient algorithm for Gaussian Process regression in case of structured samples

NASA Astrophysics Data System (ADS)

Belyaev, M.; Burnaev, E.; Kapushev, Y.

2016-04-01

Surrogate modeling is widely used in many engineering problems. Data sets often have Cartesian product structure (for instance factorial design of experiments with missing points). In such case the size of the data set can be very large. Therefore, one of the most popular algorithms for approximation-Gaussian Process regression-can be hardly applied due to its computational complexity. In this paper a computationally efficient approach for constructing Gaussian Process regression in case of data sets with Cartesian product structure is presented. Efficiency is achieved by using a special structure of the data set and operations with tensors. Proposed algorithm has low computational as well as memory complexity compared to existing algorithms. In this work we also introduce a regularization procedure allowing to take into account anisotropy of the data set and avoid degeneracy of regression model.
Parallel Directionally Split Solver Based on Reformulation of Pipelined Thomas Algorithm

NASA Technical Reports Server (NTRS)

Povitsky, A.

1998-01-01

In this research an efficient parallel algorithm for 3-D directionally split problems is developed. The proposed algorithm is based on a reformulated version of the pipelined Thomas algorithm that starts the backward step computations immediately after the completion of the forward step computations for the first portion of lines This algorithm has data available for other computational tasks while processors are idle from the Thomas algorithm. The proposed 3-D directionally split solver is based on the static scheduling of processors where local and non-local, data-dependent and data-independent computations are scheduled while processors are idle. A theoretical model of parallelization efficiency is used to define optimal parameters of the algorithm, to show an asymptotic parallelization penalty and to obtain an optimal cover of a global domain with subdomains. It is shown by computational experiments and by the theoretical model that the proposed algorithm reduces the parallelization penalty about two times over the basic algorithm for the range of the number of processors (subdomains) considered and the number of grid nodes per subdomain.
An efficient parallel algorithm for the solution of a tridiagonal linear system of equations

NASA Technical Reports Server (NTRS)

Stone, H. S.

1971-01-01

Tridiagonal linear systems of equations are solved on conventional serial machines in a time proportional to N, where N is the number of equations. The conventional algorithms do not lend themselves directly to parallel computations on computers of the ILLIAC IV class, in the sense that they appear to be inherently serial. An efficient parallel algorithm is presented in which computation time grows as log sub 2 N. The algorithm is based on recursive doubling solutions of linear recurrence relations, and can be used to solve recurrence relations of all orders.
Efficient and Flexible Computation of Many-Electron Wave Function Overlaps.

PubMed

Plasser, Felix; Ruckenbauer, Matthias; Mai, Sebastian; Oppel, Markus; Marquetand, Philipp; González, Leticia

2016-03-08

A new algorithm for the computation of the overlap between many-electron wave functions is described. This algorithm allows for the extensive use of recurring intermediates and thus provides high computational efficiency. Because of the general formalism employed, overlaps can be computed for varying wave function types, molecular orbitals, basis sets, and molecular geometries. This paves the way for efficiently computing nonadiabatic interaction terms for dynamics simulations. In addition, other application areas can be envisaged, such as the comparison of wave functions constructed at different levels of theory. Aside from explaining the algorithm and evaluating the performance, a detailed analysis of the numerical stability of wave function overlaps is carried out, and strategies for overcoming potential severe pitfalls due to displaced atoms and truncated wave functions are presented.
I/O-Efficient Scientific Computation Using TPIE

NASA Technical Reports Server (NTRS)

Vengroff, Darren Erik; Vitter, Jeffrey Scott

1996-01-01

In recent years, input/output (I/O)-efficient algorithms for a wide variety of problems have appeared in the literature. However, systems specifically designed to assist programmers in implementing such algorithms have remained scarce. TPIE is a system designed to support I/O-efficient paradigms for problems from a variety of domains, including computational geometry, graph algorithms, and scientific computation. The TPIE interface frees programmers from having to deal not only with explicit read and write calls, but also the complex memory management that must be performed for I/O-efficient computation. In this paper we discuss applications of TPIE to problems in scientific computation. We discuss algorithmic issues underlying the design and implementation of the relevant components of TPIE and present performance results of programs written to solve a series of benchmark problems using our current TPIE prototype. Some of the benchmarks we present are based on the NAS parallel benchmarks while others are of our own creation. We demonstrate that the central processing unit (CPU) overhead required to manage I/O is small and that even with just a single disk, the I/O overhead of I/O-efficient computation ranges from negligible to the same order of magnitude as CPU time. We conjecture that if we use a number of disks in parallel this overhead can be all but eliminated.

Parallel CE/SE Computations via Domain Decomposition

NASA Technical Reports Server (NTRS)

Himansu, Ananda; Jorgenson, Philip C. E.; Wang, Xiao-Yen; Chang, Sin-Chung

2000-01-01

This paper describes the parallelization strategy and achieved parallel efficiency of an explicit time-marching algorithm for solving conservation laws. The Space-Time Conservation Element and Solution Element (CE/SE) algorithm for solving the 2D and 3D Euler equations is parallelized with the aid of domain decomposition. The parallel efficiency of the resultant algorithm on a Silicon Graphics Origin 2000 parallel computer is checked.
Efficient algorithms for single-axis attitude estimation

NASA Technical Reports Server (NTRS)

Shuster, M. D.

1981-01-01

The computationally efficient algorithms determine attitude from the measurement of art lengths and dihedral angles. The dependence of these algorithms on the solution of trigonometric equations was reduced. Both single time and batch estimators are presented along with the covariance analysis of each algorithm.
Development and application of unified algorithms for problems in computational science

NASA Technical Reports Server (NTRS)

Shankar, Vijaya; Chakravarthy, Sukumar

1987-01-01

A framework is presented for developing computationally unified numerical algorithms for solving nonlinear equations that arise in modeling various problems in mathematical physics. The concept of computational unification is an attempt to encompass efficient solution procedures for computing various nonlinear phenomena that may occur in a given problem. For example, in Computational Fluid Dynamics (CFD), a unified algorithm will be one that allows for solutions to subsonic (elliptic), transonic (mixed elliptic-hyperbolic), and supersonic (hyperbolic) flows for both steady and unsteady problems. The objectives are: development of superior unified algorithms emphasizing accuracy and efficiency aspects; development of codes based on selected algorithms leading to validation; application of mature codes to realistic problems; and extension/application of CFD-based algorithms to problems in other areas of mathematical physics. The ultimate objective is to achieve integration of multidisciplinary technologies to enhance synergism in the design process through computational simulation. Specific unified algorithms for a hierarchy of gas dynamics equations and their applications to two other areas: electromagnetic scattering, and laser-materials interaction accounting for melting.
Computational efficiency for the surface renewal method

NASA Astrophysics Data System (ADS)

Kelley, Jason; Higgins, Chad

2018-04-01

Measuring surface fluxes using the surface renewal (SR) method requires programmatic algorithms for tabulation, algebraic calculation, and data quality control. A number of different methods have been published describing automated calibration of SR parameters. Because the SR method utilizes high-frequency (10 Hz+) measurements, some steps in the flux calculation are computationally expensive, especially when automating SR to perform many iterations of these calculations. Several new algorithms were written that perform the required calculations more efficiently and rapidly, and that tested for sensitivity to length of flux averaging period, ability to measure over a large range of lag timescales, and overall computational efficiency. These algorithms utilize signal processing techniques and algebraic simplifications that demonstrate simple modifications that dramatically improve computational efficiency. The results here complement efforts by other authors to standardize a robust and accurate computational SR method. Increased speed of computation time grants flexibility to implementing the SR method, opening new avenues for SR to be used in research, for applied monitoring, and in novel field deployments.
BCM: toolkit for Bayesian analysis of Computational Models using samplers.

PubMed

Thijssen, Bram; Dijkstra, Tjeerd M H; Heskes, Tom; Wessels, Lodewyk F A

2016-10-21

Computational models in biology are characterized by a large degree of uncertainty. This uncertainty can be analyzed with Bayesian statistics, however, the sampling algorithms that are frequently used for calculating Bayesian statistical estimates are computationally demanding, and each algorithm has unique advantages and disadvantages. It is typically unclear, before starting an analysis, which algorithm will perform well on a given computational model. We present BCM, a toolkit for the Bayesian analysis of Computational Models using samplers. It provides efficient, multithreaded implementations of eleven algorithms for sampling from posterior probability distributions and for calculating marginal likelihoods. BCM includes tools to simplify the process of model specification and scripts for visualizing the results. The flexible architecture allows it to be used on diverse types of biological computational models. In an example inference task using a model of the cell cycle based on ordinary differential equations, BCM is significantly more efficient than existing software packages, allowing more challenging inference problems to be solved. BCM represents an efficient one-stop-shop for computational modelers wishing to use sampler-based Bayesian statistics.
Efficient Geometric Sound Propagation Using Visibility Culling

NASA Astrophysics Data System (ADS)

Chandak, Anish

2011-07-01

Simulating propagation of sound can improve the sense of realism in interactive applications such as video games and can lead to better designs in engineering applications such as architectural acoustics. In this thesis, we present geometric sound propagation techniques which are faster than prior methods and map well to upcoming parallel multi-core CPUs. We model specular reflections by using the image-source method and model finite-edge diffraction by using the well-known Biot-Tolstoy-Medwin (BTM) model. We accelerate the computation of specular reflections by applying novel visibility algorithms, FastV and AD-Frustum, which compute visibility from a point. We accelerate finite-edge diffraction modeling by applying a novel visibility algorithm which computes visibility from a region. Our visibility algorithms are based on frustum tracing and exploit recent advances in fast ray-hierarchy intersections, data-parallel computations, and scalable, multi-core algorithms. The AD-Frustum algorithm adapts its computation to the scene complexity and allows small errors in computing specular reflection paths for higher computational efficiency. FastV and our visibility algorithm from a region are general, object-space, conservative visibility algorithms that together significantly reduce the number of image sources compared to other techniques while preserving the same accuracy. Our geometric propagation algorithms are an order of magnitude faster than prior approaches for modeling specular reflections and two to ten times faster for modeling finite-edge diffraction. Our algorithms are interactive, scale almost linearly on multi-core CPUs, and can handle large, complex, and dynamic scenes. We also compare the accuracy of our sound propagation algorithms with other methods. Once sound propagation is performed, it is desirable to listen to the propagated sound in interactive and engineering applications. We can generate smooth, artifact-free output audio signals by applying efficient audio-processing algorithms. We also present the first efficient audio-processing algorithm for scenarios with simultaneously moving source and moving receiver (MS-MR) which incurs less than 25% overhead compared to static source and moving receiver (SS-MR) or moving source and static receiver (MS-SR) scenario.
An imperialist competitive algorithm for virtual machine placement in cloud computing

NASA Astrophysics Data System (ADS)

Jamali, Shahram; Malektaji, Sepideh; Analoui, Morteza

2017-05-01

Cloud computing, the recently emerged revolution in IT industry, is empowered by virtualisation technology. In this paradigm, the user's applications run over some virtual machines (VMs). The process of selecting proper physical machines to host these virtual machines is called virtual machine placement. It plays an important role on resource utilisation and power efficiency of cloud computing environment. In this paper, we propose an imperialist competitive-based algorithm for the virtual machine placement problem called ICA-VMPLC. The base optimisation algorithm is chosen to be ICA because of its ease in neighbourhood movement, good convergence rate and suitable terminology. The proposed algorithm investigates search space in a unique manner to efficiently obtain optimal placement solution that simultaneously minimises power consumption and total resource wastage. Its final solution performance is compared with several existing methods such as grouping genetic and ant colony-based algorithms as well as bin packing heuristic. The simulation results show that the proposed method is superior to other tested algorithms in terms of power consumption, resource wastage, CPU usage efficiency and memory usage efficiency.
Image restoration for three-dimensional fluorescence microscopy using an orthonormal basis for efficient representation of depth-variant point-spread functions

PubMed Central

Patwary, Nurmohammed; Preza, Chrysanthe

2015-01-01

A depth-variant (DV) image restoration algorithm for wide field fluorescence microscopy, using an orthonormal basis decomposition of DV point-spread functions (PSFs), is investigated in this study. The efficient PSF representation is based on a previously developed principal component analysis (PCA), which is computationally intensive. We present an approach developed to reduce the number of DV PSFs required for the PCA computation, thereby making the PCA-based approach computationally tractable for thick samples. Restoration results from both synthetic and experimental images show consistency and that the proposed algorithm addresses efficiently depth-induced aberration using a small number of principal components. Comparison of the PCA-based algorithm with a previously-developed strata-based DV restoration algorithm demonstrates that the proposed method improves performance by 50% in terms of accuracy and simultaneously reduces the processing time by 64% using comparable computational resources. PMID:26504634
Conjugate Gradient Algorithms For Manipulator Simulation

NASA Technical Reports Server (NTRS)

Fijany, Amir; Scheid, Robert E.

1991-01-01

Report discusses applicability of conjugate-gradient algorithms to computation of forward dynamics of robotic manipulators. Rapid computation of forward dynamics essential to teleoperation and other advanced robotic applications. Part of continuing effort to find algorithms meeting requirements for increased computational efficiency and speed. Method used for iterative solution of systems of linear equations.
Dynamic graph cuts for efficient inference in Markov Random Fields.

PubMed

Kohli, Pushmeet; Torr, Philip H S

2007-12-01

Abstract-In this paper we present a fast new fully dynamic algorithm for the st-mincut/max-flow problem. We show how this algorithm can be used to efficiently compute MAP solutions for certain dynamically changing MRF models in computer vision such as image segmentation. Specifically, given the solution of the max-flow problem on a graph, the dynamic algorithm efficiently computes the maximum flow in a modified version of the graph. The time taken by it is roughly proportional to the total amount of change in the edge weights of the graph. Our experiments show that, when the number of changes in the graph is small, the dynamic algorithm is significantly faster than the best known static graph cut algorithm. We test the performance of our algorithm on one particular problem: the object-background segmentation problem for video. It should be noted that the application of our algorithm is not limited to the above problem, the algorithm is generic and can be used to yield similar improvements in many other cases that involve dynamic change.
A POSTERIORI ERROR ANALYSIS OF TWO STAGE COMPUTATION METHODS WITH APPLICATION TO EFFICIENT DISCRETIZATION AND THE PARAREAL ALGORITHM.

PubMed

Chaudhry, Jehanzeb Hameed; Estep, Don; Tavener, Simon; Carey, Varis; Sandelin, Jeff

2016-01-01

We consider numerical methods for initial value problems that employ a two stage approach consisting of solution on a relatively coarse discretization followed by solution on a relatively fine discretization. Examples include adaptive error control, parallel-in-time solution schemes, and efficient solution of adjoint problems for computing a posteriori error estimates. We describe a general formulation of two stage computations then perform a general a posteriori error analysis based on computable residuals and solution of an adjoint problem. The analysis accommodates various variations in the two stage computation and in formulation of the adjoint problems. We apply the analysis to compute "dual-weighted" a posteriori error estimates, to develop novel algorithms for efficient solution that take into account cancellation of error, and to the Parareal Algorithm. We test the various results using several numerical examples.
Star adaptation for two-algorithms used on serial computers

NASA Technical Reports Server (NTRS)

Howser, L. M.; Lambiotte, J. J., Jr.

1974-01-01

Two representative algorithms used on a serial computer and presently executed on the Control Data Corporation 6000 computer were adapted to execute efficiently on the Control Data STAR-100 computer. Gaussian elimination for the solution of simultaneous linear equations and the Gauss-Legendre quadrature formula for the approximation of an integral are the two algorithms discussed. A description is given of how the programs were adapted for STAR and why these adaptations were necessary to obtain an efficient STAR program. Some points to consider when adapting an algorithm for STAR are discussed. Program listings of the 6000 version coded in 6000 FORTRAN, the adapted STAR version coded in 6000 FORTRAN, and the STAR version coded in STAR FORTRAN are presented in the appendices.
The application of dynamic programming in production planning

NASA Astrophysics Data System (ADS)

Wu, Run

2017-05-01

Nowadays, with the popularity of the computers, various industries and fields are widely applying computer information technology, which brings about huge demand for a variety of application software. In order to develop software meeting various needs with most economical cost and best quality, programmers must design efficient algorithms. A superior algorithm can not only soul up one thing, but also maximize the benefits and generate the smallest overhead. As one of the common algorithms, dynamic programming algorithms are used to solving problems with some sort of optimal properties. When solving problems with a large amount of sub-problems that needs repetitive calculations, the ordinary sub-recursive method requires to consume exponential time, and dynamic programming algorithm can reduce the time complexity of the algorithm to the polynomial level, according to which we can conclude that dynamic programming algorithm is a very efficient compared to other algorithms reducing the computational complexity and enriching the computational results. In this paper, we expound the concept, basic elements, properties, core, solving steps and difficulties of the dynamic programming algorithm besides, establish the dynamic programming model of the production planning problem.
Network Community Detection based on the Physarum-inspired Computational Framework.

PubMed

Gao, Chao; Liang, Mingxin; Li, Xianghua; Zhang, Zili; Wang, Zhen; Zhou, Zhili

2016-12-13

Community detection is a crucial and essential problem in the structure analytics of complex networks, which can help us understand and predict the characteristics and functions of complex networks. Many methods, ranging from the optimization-based algorithms to the heuristic-based algorithms, have been proposed for solving such a problem. Due to the inherent complexity of identifying network structure, how to design an effective algorithm with a higher accuracy and a lower computational cost still remains an open problem. Inspired by the computational capability and positive feedback mechanism in the wake of foraging process of Physarum, which is a large amoeba-like cell consisting of a dendritic network of tube-like pseudopodia, a general Physarum-based computational framework for community detection is proposed in this paper. Based on the proposed framework, the inter-community edges can be identified from the intra-community edges in a network and the positive feedback of solving process in an algorithm can be further enhanced, which are used to improve the efficiency of original optimization-based and heuristic-based community detection algorithms, respectively. Some typical algorithms (e.g., genetic algorithm, ant colony optimization algorithm, and Markov clustering algorithm) and real-world datasets have been used to estimate the efficiency of our proposed computational framework. Experiments show that the algorithms optimized by Physarum-inspired computational framework perform better than the original ones, in terms of accuracy and computational cost. Moreover, a computational complexity analysis verifies the scalability of our framework.
Automated Development of Accurate Algorithms and Efficient Codes for Computational Aeroacoustics

NASA Technical Reports Server (NTRS)

Goodrich, John W.; Dyson, Rodger W.

1999-01-01

The simulation of sound generation and propagation in three space dimensions with realistic aircraft components is a very large time dependent computation with fine details. Simulations in open domains with embedded objects require accurate and robust algorithms for propagation, for artificial inflow and outflow boundaries, and for the definition of geometrically complex objects. The development, implementation, and validation of methods for solving these demanding problems is being done to support the NASA pillar goals for reducing aircraft noise levels. Our goal is to provide algorithms which are sufficiently accurate and efficient to produce usable results rapidly enough to allow design engineers to study the effects on sound levels of design changes in propulsion systems, and in the integration of propulsion systems with airframes. There is a lack of design tools for these purposes at this time. Our technical approach to this problem combines the development of new, algorithms with the use of Mathematica and Unix utilities to automate the algorithm development, code implementation, and validation. We use explicit methods to ensure effective implementation by domain decomposition for SPMD parallel computing. There are several orders of magnitude difference in the computational efficiencies of the algorithms which we have considered. We currently have new artificial inflow and outflow boundary conditions that are stable, accurate, and unobtrusive, with implementations that match the accuracy and efficiency of the propagation methods. The artificial numerical boundary treatments have been proven to have solutions which converge to the full open domain problems, so that the error from the boundary treatments can be driven as low as is required. The purpose of this paper is to briefly present a method for developing highly accurate algorithms for computational aeroacoustics, the use of computer automation in this process, and a brief survey of the algorithms that have resulted from this work. A review of computational aeroacoustics has recently been given by Lele.
Efficient Parallel Algorithm For Direct Numerical Simulation of Turbulent Flows

NASA Technical Reports Server (NTRS)

Moitra, Stuti; Gatski, Thomas B.

1997-01-01

A distributed algorithm for a high-order-accurate finite-difference approach to the direct numerical simulation (DNS) of transition and turbulence in compressible flows is described. This work has two major objectives. The first objective is to demonstrate that parallel and distributed-memory machines can be successfully and efficiently used to solve computationally intensive and input/output intensive algorithms of the DNS class. The second objective is to show that the computational complexity involved in solving the tridiagonal systems inherent in the DNS algorithm can be reduced by algorithm innovations that obviate the need to use a parallelized tridiagonal solver.
A highly efficient multi-core algorithm for clustering extremely large datasets

PubMed Central

2010-01-01

Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorial SNP data. Our new shared memory parallel algorithms show to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
On the development of efficient algorithms for three dimensional fluid flow

NASA Technical Reports Server (NTRS)

Maccormack, R. W.

1988-01-01

The difficulties of constructing efficient algorithms for three-dimensional flow are discussed. Reasonable candidates are analyzed and tested, and most are found to have obvious shortcomings. Yet, there is promise that an efficient class of algorithms exist between the severely time-step sized-limited explicit or approximately factored algorithms and the computationally intensive direct inversion of large sparse matrices by Gaussian elimination.
The diffusive finite state projection algorithm for efficient simulation of the stochastic reaction-diffusion master equation.

PubMed

Drawert, Brian; Lawson, Michael J; Petzold, Linda; Khammash, Mustafa

2010-02-21

We have developed a computational framework for accurate and efficient simulation of stochastic spatially inhomogeneous biochemical systems. The new computational method employs a fractional step hybrid strategy. A novel formulation of the finite state projection (FSP) method, called the diffusive FSP method, is introduced for the efficient and accurate simulation of diffusive transport. Reactions are handled by the stochastic simulation algorithm.
Efficient least angle regression for identification of linear-in-the-parameters models

PubMed Central

Beach, Thomas H.; Rezgui, Yacine

2017-01-01

Least angle regression, as a promising model selection method, differentiates itself from conventional stepwise and stagewise methods, in that it is neither too greedy nor too slow. It is closely related to L1 norm optimization, which has the advantage of low prediction variance through sacrificing part of model bias property in order to enhance model generalization capability. In this paper, we propose an efficient least angle regression algorithm for model selection for a large class of linear-in-the-parameters models with the purpose of accelerating the model selection process. The entire algorithm works completely in a recursive manner, where the correlations between model terms and residuals, the evolving directions and other pertinent variables are derived explicitly and updated successively at every subset selection step. The model coefficients are only computed when the algorithm finishes. The direct involvement of matrix inversions is thereby relieved. A detailed computational complexity analysis indicates that the proposed algorithm possesses significant computational efficiency, compared with the original approach where the well-known efficient Cholesky decomposition is involved in solving least angle regression. Three artificial and real-world examples are employed to demonstrate the effectiveness, efficiency and numerical stability of the proposed algorithm. PMID:28293140

Parallel conjugate gradient algorithms for manipulator dynamic simulation

NASA Technical Reports Server (NTRS)

Fijany, Amir; Scheld, Robert E.

1989-01-01

Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithms are guaranteed to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n sq) on a serial processor. A conjugate gradient algorithms is presented that provide greater efficiency using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which respectively use a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inversions in O(log sub 2 n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves the computational time of O(log sub 2 n) for each iteration. Simulation results for a seven degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
Fast Steerable Principal Component Analysis

PubMed Central

Zhao, Zhizhen; Shkolnisky, Yoel; Singer, Amit

2016-01-01

Cryo-electron microscopy nowadays often requires the analysis of hundreds of thousands of 2-D images as large as a few hundred pixels in each direction. Here, we introduce an algorithm that efficiently and accurately performs principal component analysis (PCA) for a large set of 2-D images, and, for each image, the set of its uniform rotations in the plane and their reflections. For a dataset consisting of n images of size L × L pixels, the computational complexity of our algorithm is O(nL3 + L4), while existing algorithms take O(nL4). The new algorithm computes the expansion coefficients of the images in a Fourier–Bessel basis efficiently using the nonuniform fast Fourier transform. We compare the accuracy and efficiency of the new algorithm with traditional PCA and existing algorithms for steerable PCA. PMID:27570801
A projected preconditioned conjugate gradient algorithm for computing many extreme eigenpairs of a Hermitian matrix [A projected preconditioned conjugate gradient algorithm for computing a large eigenspace of a Hermitian matrix

DOE PAGES

Vecharynski, Eugene; Yang, Chao; Pask, John E.

2015-02-25

Here, we present an iterative algorithm for computing an invariant subspace associated with the algebraically smallest eigenvalues of a large sparse or structured Hermitian matrix A. We are interested in the case in which the dimension of the invariant subspace is large (e.g., over several hundreds or thousands) even though it may still be small relative to the dimension of A. These problems arise from, for example, density functional theory (DFT) based electronic structure calculations for complex materials. The key feature of our algorithm is that it performs fewer Rayleigh–Ritz calculations compared to existing algorithms such as the locally optimalmore » block preconditioned conjugate gradient or the Davidson algorithm. It is a block algorithm, and hence can take advantage of efficient BLAS3 operations and be implemented with multiple levels of concurrency. We discuss a number of practical issues that must be addressed in order to implement the algorithm efficiently on a high performance computer.« less
Improved Ant Colony Clustering Algorithm and Its Performance Study

PubMed Central

Gao, Wei

2016-01-01

Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533
Efficient classical simulation of the Deutsch-Jozsa and Simon's algorithms

NASA Astrophysics Data System (ADS)

Johansson, Niklas; Larsson, Jan-Åke

2017-09-01

A long-standing aim of quantum information research is to understand what gives quantum computers their advantage. This requires separating problems that need genuinely quantum resources from those for which classical resources are enough. Two examples of quantum speed-up are the Deutsch-Jozsa and Simon's problem, both efficiently solvable on a quantum Turing machine, and both believed to lack efficient classical solutions. Here we present a framework that can simulate both quantum algorithms efficiently, solving the Deutsch-Jozsa problem with probability 1 using only one oracle query, and Simon's problem using linearly many oracle queries, just as expected of an ideal quantum computer. The presented simulation framework is in turn efficiently simulatable in a classical probabilistic Turing machine. This shows that the Deutsch-Jozsa and Simon's problem do not require any genuinely quantum resources, and that the quantum algorithms show no speed-up when compared with their corresponding classical simulation. Finally, this gives insight into what properties are needed in the two algorithms and calls for further study of oracle separation between quantum and classical computation.
Parallelization of Nullspace Algorithm for the computation of metabolic pathways

PubMed Central

Jevremović, Dimitrije; Trinh, Cong T.; Srienc, Friedrich; Sosa, Carlos P.; Boley, Daniel

2011-01-01

Elementary mode analysis is a useful metabolic pathway analysis tool in understanding and analyzing cellular metabolism, since elementary modes can represent metabolic pathways with unique and minimal sets of enzyme-catalyzed reactions of a metabolic network under steady state conditions. However, computation of the elementary modes of a genome- scale metabolic network with 100–1000 reactions is very expensive and sometimes not feasible with the commonly used serial Nullspace Algorithm. In this work, we develop a distributed memory parallelization of the Nullspace Algorithm to handle efficiently the computation of the elementary modes of a large metabolic network. We give an implementation in C++ language with the support of MPI library functions for the parallel communication. Our proposed algorithm is accompanied with an analysis of the complexity and identification of major bottlenecks during computation of all possible pathways of a large metabolic network. The algorithm includes methods to achieve load balancing among the compute-nodes and specific communication patterns to reduce the communication overhead and improve efficiency. PMID:22058581
Fast Ss-Ilm a Computationally Efficient Algorithm to Discover Socially Important Locations

NASA Astrophysics Data System (ADS)

Dokuz, A. S.; Celik, M.

2017-11-01

Socially important locations are places which are frequently visited by social media users in their social media lifetime. Discovering socially important locations provide several valuable information about user behaviours on social media networking sites. However, discovering socially important locations are challenging due to data volume and dimensions, spatial and temporal calculations, location sparseness in social media datasets, and inefficiency of current algorithms. In the literature, several studies are conducted to discover important locations, however, the proposed approaches do not work in computationally efficient manner. In this study, we propose Fast SS-ILM algorithm by modifying the algorithm of SS-ILM to mine socially important locations efficiently. Experimental results show that proposed Fast SS-ILM algorithm decreases execution time of socially important locations discovery process up to 20 %.
Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures

DTIC Science & Technology

2017-10-04

Report: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures The views, opinions and/or findings contained in this...Chapel Hill Title: Efficient Numeric and Geometric Computations using Heterogeneous Shared Memory Architectures Report Term: 0-Other Email: dm...algorithms for scientific and geometric computing by exploiting the power and performance efficiency of heterogeneous shared memory architectures . These
Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications

NASA Technical Reports Server (NTRS)

Sun, Xian-He

1997-01-01

Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as Intel Paragon, IBM SP2, and Cray Origin2OO, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is 1) developing highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, 3) incorporate newly developed algorithms into actual simulation packages. The work plan has well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) Adopting a mathematical geometry which has a better capacity to describe the fluid, (2) Using compact scheme to gain high order accuracy in numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm and Reduced Parallel Diagonal Dominant (RPDD) algorithm have been carefully studied on different parallel platforms for different applications, and a NASA simulation code developed by Man M. Rai and his colleagues has been parallelized and implemented based on data dependency analysis. These achievements are addressed in detail in the paper.
BWM*: A Novel, Provable, Ensemble-based Dynamic Programming Algorithm for Sparse Approximations of Computational Protein Design.

PubMed

Jou, Jonathan D; Jain, Swati; Georgiev, Ivelin S; Donald, Bruce R

2016-06-01

Sparse energy functions that ignore long range interactions between residue pairs are frequently used by protein design algorithms to reduce computational cost. Current dynamic programming algorithms that fully exploit the optimal substructure produced by these energy functions only compute the GMEC. This disproportionately favors the sequence of a single, static conformation and overlooks better binding sequences with multiple low-energy conformations. Provable, ensemble-based algorithms such as A* avoid this problem, but A* cannot guarantee better performance than exhaustive enumeration. We propose a novel, provable, dynamic programming algorithm called Branch-Width Minimization* (BWM*) to enumerate a gap-free ensemble of conformations in order of increasing energy. Given a branch-decomposition of branch-width w for an n-residue protein design with at most q discrete side-chain conformations per residue, BWM* returns the sparse GMEC in O([Formula: see text]) time and enumerates each additional conformation in merely O([Formula: see text]) time. We define a new measure, Total Effective Search Space (TESS), which can be computed efficiently a priori before BWM* or A* is run. We ran BWM* on 67 protein design problems and found that TESS discriminated between BWM*-efficient and A*-efficient cases with 100% accuracy. As predicted by TESS and validated experimentally, BWM* outperforms A* in 73% of the cases and computes the full ensemble or a close approximation faster than A*, enumerating each additional conformation in milliseconds. Unlike A*, the performance of BWM* can be predicted in polynomial time before running the algorithm, which gives protein designers the power to choose the most efficient algorithm for their particular design problem.
Enhancing battery efficiency for pervasive health-monitoring systems based on electronic textiles.

PubMed

Zheng, Nenggan; Wu, Zhaohui; Lin, Man; Yang, Laurence Tianruo

2010-03-01

Electronic textiles are regarded as one of the most important computation platforms for future computer-assisted health-monitoring applications. In these novel systems, multiple batteries are used in order to prolong their operational lifetime, which is a significant metric for system usability. However, due to the nonlinear features of batteries, computing systems with multiple batteries cannot achieve the same battery efficiency as those powered by a monolithic battery of equal capacity. In this paper, we propose an algorithm aiming to maximize battery efficiency globally for the computer-assisted health-care systems with multiple batteries. Based on an accurate analytical battery model, the concept of weighted battery fatigue degree is introduced and the novel battery-scheduling algorithm called predicted weighted fatigue degree least first (PWFDLF) is developed. Besides, we also discuss our attempts during search PWFDLF: a weighted round-robin (WRR) and a greedy algorithm achieving highest local battery efficiency, which reduces to the sequential discharging policy. Evaluation results show that a considerable improvement in battery efficiency can be obtained by PWFDLF under various battery configurations and current profiles compared to conventional sequential and WRR discharging policies.
A Parallel Nonrigid Registration Algorithm Based on B-Spline for Medical Images.

PubMed

Du, Xiaogang; Dang, Jianwu; Wang, Yangping; Wang, Song; Lei, Tao

2016-01-01

The nonrigid registration algorithm based on B-spline Free-Form Deformation (FFD) plays a key role and is widely applied in medical image processing due to the good flexibility and robustness. However, it requires a tremendous amount of computing time to obtain more accurate registration results especially for a large amount of medical image data. To address the issue, a parallel nonrigid registration algorithm based on B-spline is proposed in this paper. First, the Logarithm Squared Difference (LSD) is considered as the similarity metric in the B-spline registration algorithm to improve registration precision. After that, we create a parallel computing strategy and lookup tables (LUTs) to reduce the complexity of the B-spline registration algorithm. As a result, the computing time of three time-consuming steps including B-splines interpolation, LSD computation, and the analytic gradient computation of LSD, is efficiently reduced, for the B-spline registration algorithm employs the Nonlinear Conjugate Gradient (NCG) optimization method. Experimental results of registration quality and execution efficiency on the large amount of medical images show that our algorithm achieves a better registration accuracy in terms of the differences between the best deformation fields and ground truth and a speedup of 17 times over the single-threaded CPU implementation due to the powerful parallel computing ability of Graphics Processing Unit (GPU).
Algorithm-Based Fault Tolerance Integrated with Replication

NASA Technical Reports Server (NTRS)

Some, Raphael; Rennels, David

2008-01-01

In a proposed approach to programming and utilization of commercial off-the-shelf computing equipment, a combination of algorithm-based fault tolerance (ABFT) and replication would be utilized to obtain high degrees of fault tolerance without incurring excessive costs. The basic idea of the proposed approach is to integrate ABFT with replication such that the algorithmic portions of computations would be protected by ABFT, and the logical portions by replication. ABFT is an extremely efficient, inexpensive, high-coverage technique for detecting and mitigating faults in computer systems used for algorithmic computations, but does not protect against errors in logical operations surrounding algorithms.
Efficient mapping algorithms for scheduling robot inverse dynamics computation on a multiprocessor system

NASA Technical Reports Server (NTRS)

Lee, C. S. G.; Chen, C. L.

1989-01-01

Two efficient mapping algorithms for scheduling the robot inverse dynamics computation consisting of m computational modules with precedence relationship to be executed on a multiprocessor system consisting of p identical homogeneous processors with processor and communication costs to achieve minimum computation time are presented. An objective function is defined in terms of the sum of the processor finishing time and the interprocessor communication time. The minimax optimization is performed on the objective function to obtain the best mapping. This mapping problem can be formulated as a combination of the graph partitioning and the scheduling problems; both have been known to be NP-complete. Thus, to speed up the searching for a solution, two heuristic algorithms were proposed to obtain fast but suboptimal mapping solutions. The first algorithm utilizes the level and the communication intensity of the task modules to construct an ordered priority list of ready modules and the module assignment is performed by a weighted bipartite matching algorithm. For a near-optimal mapping solution, the problem can be solved by the heuristic algorithm with simulated annealing. These proposed optimization algorithms can solve various large-scale problems within a reasonable time. Computer simulations were performed to evaluate and verify the performance and the validity of the proposed mapping algorithms. Finally, experiments for computing the inverse dynamics of a six-jointed PUMA-like manipulator based on the Newton-Euler dynamic equations were implemented on an NCUBE/ten hypercube computer to verify the proposed mapping algorithms. Computer simulation and experimental results are compared and discussed.
A novel combined SLAM based on RBPF-SLAM and EIF-SLAM for mobile system sensing in a large scale environment.

PubMed

He, Bo; Zhang, Shujing; Yan, Tianhong; Zhang, Tao; Liang, Yan; Zhang, Hongjin

2011-01-01

Mobile autonomous systems are very important for marine scientific investigation and military applications. Many algorithms have been studied to deal with the computational efficiency problem required for large scale simultaneous localization and mapping (SLAM) and its related accuracy and consistency. Among these methods, submap-based SLAM is a more effective one. By combining the strength of two popular mapping algorithms, the Rao-Blackwellised particle filter (RBPF) and extended information filter (EIF), this paper presents a combined SLAM-an efficient submap-based solution to the SLAM problem in a large scale environment. RBPF-SLAM is used to produce local maps, which are periodically fused into an EIF-SLAM algorithm. RBPF-SLAM can avoid linearization of the robot model during operating and provide a robust data association, while EIF-SLAM can improve the whole computational speed, and avoid the tendency of RBPF-SLAM to be over-confident. In order to further improve the computational speed in a real time environment, a binary-tree-based decision-making strategy is introduced. Simulation experiments show that the proposed combined SLAM algorithm significantly outperforms currently existing algorithms in terms of accuracy and consistency, as well as the computing efficiency. Finally, the combined SLAM algorithm is experimentally validated in a real environment by using the Victoria Park dataset.
Sensor placement algorithm development to maximize the efficiency of acid gas removal unit for integrated gasifiction combined sycle (IGCC) power plant with CO2 capture

DOE Office of Scientific and Technical Information (OSTI.GOV)

Paul, P.; Bhattacharyya, D.; Turton, R.

2012-01-01

Future integrated gasification combined cycle (IGCC) power plants with CO{sub 2} capture will face stricter operational and environmental constraints. Accurate values of relevant states/outputs/disturbances are needed to satisfy these constraints and to maximize the operational efficiency. Unfortunately, a number of these process variables cannot be measured while a number of them can be measured, but have low precision, reliability, or signal-to-noise ratio. In this work, a sensor placement (SP) algorithm is developed for optimal selection of sensor location, number, and type that can maximize the plant efficiency and result in a desired precision of the relevant measured/unmeasured states. In thismore » work, an SP algorithm is developed for an selective, dual-stage Selexol-based acid gas removal (AGR) unit for an IGCC plant with pre-combustion CO{sub 2} capture. A comprehensive nonlinear dynamic model of the AGR unit is developed in Aspen Plus Dynamics® (APD) and used to generate a linear state-space model that is used in the SP algorithm. The SP algorithm is developed with the assumption that an optimal Kalman filter will be implemented in the plant for state and disturbance estimation. The algorithm is developed assuming steady-state Kalman filtering and steady-state operation of the plant. The control system is considered to operate based on the estimated states and thereby, captures the effects of the SP algorithm on the overall plant efficiency. The optimization problem is solved by Genetic Algorithm (GA) considering both linear and nonlinear equality and inequality constraints. Due to the very large number of candidate sets available for sensor placement and because of the long time that it takes to solve the constrained optimization problem that includes more than 1000 states, solution of this problem is computationally expensive. For reducing the computation time, parallel computing is performed using the Distributed Computing Server (DCS®) and the Parallel Computing® toolbox from Mathworks®. In this presentation, we will share our experience in setting up parallel computing using GA in the MATLAB® environment and present the overall approach for achieving higher computational efficiency in this framework.« less
Sensor placement algorithm development to maximize the efficiency of acid gas removal unit for integrated gasification combined cycle (IGCC) power plant with CO{sub 2} capture

DOE Office of Scientific and Technical Information (OSTI.GOV)

Paul, P.; Bhattacharyya, D.; Turton, R.

2012-01-01

Future integrated gasification combined cycle (IGCC) power plants with CO{sub 2} capture will face stricter operational and environmental constraints. Accurate values of relevant states/outputs/disturbances are needed to satisfy these constraints and to maximize the operational efficiency. Unfortunately, a number of these process variables cannot be measured while a number of them can be measured, but have low precision, reliability, or signal-to-noise ratio. In this work, a sensor placement (SP) algorithm is developed for optimal selection of sensor location, number, and type that can maximize the plant efficiency and result in a desired precision of the relevant measured/unmeasured states. In thismore » work, an SP algorithm is developed for an selective, dual-stage Selexol-based acid gas removal (AGR) unit for an IGCC plant with pre-combustion CO{sub 2} capture. A comprehensive nonlinear dynamic model of the AGR unit is developed in Aspen Plus Dynamics® (APD) and used to generate a linear state-space model that is used in the SP algorithm. The SP algorithm is developed with the assumption that an optimal Kalman filter will be implemented in the plant for state and disturbance estimation. The algorithm is developed assuming steady-state Kalman filtering and steady-state operation of the plant. The control system is considered to operate based on the estimated states and thereby, captures the effects of the SP algorithm on the overall plant efficiency. The optimization problem is solved by Genetic Algorithm (GA) considering both linear and nonlinear equality and inequality constraints. Due to the very large number of candidate sets available for sensor placement and because of the long time that it takes to solve the constrained optimization problem that includes more than 1000 states, solution of this problem is computationally expensive. For reducing the computation time, parallel computing is performed using the Distributed Computing Server (DCS®) and the Parallel Computing® toolbox from Mathworks®. In this presentation, we will share our experience in setting up parallel computing using GA in the MATLAB® environment and present the overall approach for achieving higher computational efficiency in this framework.« less
The efficient simulation of separated three-dimensional viscous flows using the boundary-layer equations

NASA Technical Reports Server (NTRS)

Van Dalsem, W. R.; Steger, J. L.

1985-01-01

A simple and computationally efficient algorithm for solving the unsteady three-dimensional boundary-layer equations in the time-accurate or relaxation mode is presented. Results of the new algorithm are shown to be in quantitative agreement with detailed experimental data for flow over a swept infinite wing. The separated flow over a 6:1 ellipsoid at angle of attack, and the transonic flow over a finite-wing with shock-induced 'mushroom' separation are also computed and compared with available experimental data. It is concluded that complex, separated, three-dimensional viscous layers can be economically and routinely computed using a time-relaxation boundary-layer algorithm.
Efficient dynamic simulation for multiple chain robotic mechanisms

NASA Technical Reports Server (NTRS)

Lilly, Kathryn W.; Orin, David E.

1989-01-01

An efficient O(mN) algorithm for dynamic simulation of simple closed-chain robotic mechanisms is presented, where m is the number of chains, and N is the number of degrees of freedom for each chain. It is based on computation of the operational space inertia matrix (6 x 6) for each chain as seen by the body, load, or object. Also, computation of the chain dynamics, when opened at one end, is required, and the most efficient algorithm is used for this purpose. Parallel implementation of the dynamics for each chain results in an O(N) + O(log sub 2 m+1) algorithm.
Visual saliency-based fast intracoding algorithm for high efficiency video coding

NASA Astrophysics Data System (ADS)

Zhou, Xin; Shi, Guangming; Zhou, Wei; Duan, Zhemin

2017-01-01

Intraprediction has been significantly improved in high efficiency video coding over H.264/AVC with quad-tree-based coding unit (CU) structure from size 64×64 to 8×8 and more prediction modes. However, these techniques cause a dramatic increase in computational complexity. An intracoding algorithm is proposed that consists of perceptual fast CU size decision algorithm and fast intraprediction mode decision algorithm. First, based on the visual saliency detection, an adaptive and fast CU size decision method is proposed to alleviate intraencoding complexity. Furthermore, a fast intraprediction mode decision algorithm with step halving rough mode decision method and early modes pruning algorithm is presented to selectively check the potential modes and effectively reduce the complexity of computation. Experimental results show that our proposed fast method reduces the computational complexity of the current HM to about 57% in encoding time with only 0.37% increases in BD rate. Meanwhile, the proposed fast algorithm has reasonable peak signal-to-noise ratio losses and nearly the same subjective perceptual quality.

Far-field radiation patterns of aperture antennas by the Winograd Fourier transform algorithm

NASA Technical Reports Server (NTRS)

Heisler, R.

1978-01-01

A more time-efficient algorithm for computing the discrete Fourier transform, the Winograd Fourier transform (WFT), is described. The WFT algorithm is compared with other transform algorithms. Results indicate that the WFT algorithm in antenna analysis appears to be a very successful application. Significant savings in cpu time will improve the computer turn around time and circumvent the need to resort to weekend runs.
Redundancy management for efficient fault recovery in NASA's distributed computing system

NASA Technical Reports Server (NTRS)

Malek, Miroslaw; Pandya, Mihir; Yau, Kitty

1991-01-01

The management of redundancy in computer systems was studied and guidelines were provided for the development of NASA's fault-tolerant distributed systems. Fault recovery and reconfiguration mechanisms were examined. A theoretical foundation was laid for redundancy management by efficient reconfiguration methods and algorithmic diversity. Algorithms were developed to optimize the resources for embedding of computational graphs of tasks in the system architecture and reconfiguration of these tasks after a failure has occurred. The computational structure represented by a path and the complete binary tree was considered and the mesh and hypercube architectures were targeted for their embeddings. The innovative concept of Hybrid Algorithm Technique was introduced. This new technique provides a mechanism for obtaining fault tolerance while exhibiting improved performance.
Computationally-Efficient Minimum-Time Aircraft Routes in the Presence of Winds

NASA Technical Reports Server (NTRS)

Jardin, Matthew R.

2004-01-01

A computationally efficient algorithm for minimizing the flight time of an aircraft in a variable wind field has been invented. The algorithm, referred to as Neighboring Optimal Wind Routing (NOWR), is based upon neighboring-optimal-control (NOC) concepts and achieves minimum-time paths by adjusting aircraft heading according to wind conditions at an arbitrary number of wind measurement points along the flight route. The NOWR algorithm may either be used in a fast-time mode to compute minimum- time routes prior to flight, or may be used in a feedback mode to adjust aircraft heading in real-time. By traveling minimum-time routes instead of direct great-circle (direct) routes, flights across the United States can save an average of about 7 minutes, and as much as one hour of flight time during periods of strong jet-stream winds. The neighboring optimal routes computed via the NOWR technique have been shown to be within 1.5 percent of the absolute minimum-time routes for flights across the continental United States. On a typical 450-MHz Sun Ultra workstation, the NOWR algorithm produces complete minimum-time routes in less than 40 milliseconds. This corresponds to a rate of 25 optimal routes per second. The closest comparable optimization technique runs approximately 10 times slower. Airlines currently use various trial-and-error search techniques to determine which of a set of commonly traveled routes will minimize flight time. These algorithms are too computationally expensive for use in real-time systems, or in systems where many optimal routes need to be computed in a short amount of time. Instead of operating in real-time, airlines will typically plan a trajectory several hours in advance using wind forecasts. If winds change significantly from forecasts, the resulting flights will no longer be minimum-time. The need for a computationally efficient wind-optimal routing algorithm is even greater in the case of new air-traffic-control automation concepts. For air-traffic-control automation, thousands of wind-optimal routes may need to be computed and checked for conflicts in just a few minutes. These factors motivated the need for a more efficient wind-optimal routing algorithm.
Parallel algorithms for computation of the manipulator inertia matrix

NASA Technical Reports Server (NTRS)

Amin-Javaheri, Masoud; Orin, David E.

1989-01-01

The development of an O(log2N) parallel algorithm for the manipulator inertia matrix is presented. It is based on the most efficient serial algorithm which uses the composite rigid body method. Recursive doubling is used to reformulate the linear recurrence equations which are required to compute the diagonal elements of the matrix. It results in O(log2N) levels of computation. Computation of the off-diagonal elements involves N linear recurrences of varying-size and a new method, which avoids redundant computation of position and orientation transforms for the manipulator, is developed. The O(log2N) algorithm is presented in both equation and graphic forms which clearly show the parallelism inherent in the algorithm.
Spectral compression algorithms for the analysis of very large multivariate images

DOEpatents

Keenan, Michael R.

2007-10-16

A method for spectrally compressing data sets enables the efficient analysis of very large multivariate images. The spectral compression algorithm uses a factored representation of the data that can be obtained from Principal Components Analysis or other factorization technique. Furthermore, a block algorithm can be used for performing common operations more efficiently. An image analysis can be performed on the factored representation of the data, using only the most significant factors. The spectral compression algorithm can be combined with a spatial compression algorithm to provide further computational efficiencies.
A community detection algorithm based on structural similarity

NASA Astrophysics Data System (ADS)

Guo, Xuchao; Hao, Xia; Liu, Yaqiong; Zhang, Li; Wang, Lu

2017-09-01

In order to further improve the efficiency and accuracy of community detection algorithm, a new algorithm named SSTCA (the community detection algorithm based on structural similarity with threshold) is proposed. In this algorithm, the structural similarities are taken as the weights of edges, and the threshold k is considered to remove multiple edges whose weights are less than the threshold, and improve the computational efficiency. Tests were done on the Zachary’s network, Dolphins’ social network and Football dataset by the proposed algorithm, and compared with GN and SSNCA algorithm. The results show that the new algorithm is superior to other algorithms in accuracy for the dense networks and the operating efficiency is improved obviously.
Biclustering Protein Complex Interactions with a Biclique FindingAlgorithm

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ding, Chris; Zhang, Anne Ya; Holbrook, Stephen

2006-12-01

Biclustering has many applications in text mining, web clickstream mining, and bioinformatics. When data entries are binary, the tightest biclusters become bicliques. We propose a flexible and highly efficient algorithm to compute bicliques. We first generalize the Motzkin-Straus formalism for computing the maximal clique from L{sub 1} constraint to L{sub p} constraint, which enables us to provide a generalized Motzkin-Straus formalism for computing maximal-edge bicliques. By adjusting parameters, the algorithm can favor biclusters with more rows less columns, or vice verse, thus increasing the flexibility of the targeted biclusters. We then propose an algorithm to solve the generalized Motzkin-Straus optimizationmore » problem. The algorithm is provably convergent and has a computational complexity of O(|E|) where |E| is the number of edges. It relies on a matrix vector multiplication and runs efficiently on most current computer architectures. Using this algorithm, we bicluster the yeast protein complex interaction network. We find that biclustering protein complexes at the protein level does not clearly reflect the functional linkage among protein complexes in many cases, while biclustering at protein domain level can reveal many underlying linkages. We show several new biologically significant results.« less
A synthetic visual plane algorithm for visibility computation in consideration of accuracy and efficiency

NASA Astrophysics Data System (ADS)

Yu, Jieqing; Wu, Lixin; Hu, Qingsong; Yan, Zhigang; Zhang, Shaoliang

2017-12-01

Visibility computation is of great interest to location optimization, environmental planning, ecology, and tourism. Many algorithms have been developed for visibility computation. In this paper, we propose a novel method of visibility computation, called synthetic visual plane (SVP), to achieve better performance with respect to efficiency, accuracy, or both. The method uses a global horizon, which is a synthesis of line-of-sight information of all nearer points, to determine the visibility of a point, which makes it an accurate visibility method. We used discretization of horizon to gain a good performance in efficiency. After discretization, the accuracy and efficiency of SVP depends on the scale of discretization (i.e., zone width). The method is more accurate at smaller zone widths, but this requires a longer operating time. Users must strike a balance between accuracy and efficiency at their discretion. According to our experiments, SVP is less accurate but more efficient than R2 if the zone width is set to one grid. However, SVP becomes more accurate than R2 when the zone width is set to 1/24 grid, while it continues to perform as fast or faster than R2. Although SVP performs worse than reference plane and depth map with respect to efficiency, it is superior in accuracy to these other two algorithms.
Computing border bases using mutant strategies

NASA Astrophysics Data System (ADS)

Ullah, E.; Abbas Khan, S.

2014-01-01

Border bases, a generalization of Gröbner bases, have actively been addressed during recent years due to their applicability to industrial problems. In cryptography and coding theory a useful application of border based is to solve zero-dimensional systems of polynomial equations over finite fields, which motivates us for developing optimizations of the algorithms that compute border bases. In 2006, Kehrein and Kreuzer formulated the Border Basis Algorithm (BBA), an algorithm which allows the computation of border bases that relate to a degree compatible term ordering. In 2007, J. Ding et al. introduced mutant strategies bases on finding special lower degree polynomials in the ideal. The mutant strategies aim to distinguish special lower degree polynomials (mutants) from the other polynomials and give them priority in the process of generating new polynomials in the ideal. In this paper we develop hybrid algorithms that use the ideas of J. Ding et al. involving the concept of mutants to optimize the Border Basis Algorithm for solving systems of polynomial equations over finite fields. In particular, we recall a version of the Border Basis Algorithm which is actually called the Improved Border Basis Algorithm and propose two hybrid algorithms, called MBBA and IMBBA. The new mutants variants provide us space efficiency as well as time efficiency. The efficiency of these newly developed hybrid algorithms is discussed using standard cryptographic examples.
Interaction sorting method for molecular dynamics on multi-core SIMD CPU architecture.

PubMed

Matvienko, Sergey; Alemasov, Nikolay; Fomin, Eduard

2015-02-01

Molecular dynamics (MD) is widely used in computational biology for studying binding mechanisms of molecules, molecular transport, conformational transitions, protein folding, etc. The method is computationally expensive; thus, the demand for the development of novel, much more efficient algorithms is still high. Therefore, the new algorithm designed in 2007 and called interaction sorting (IS) clearly attracted interest, as it outperformed the most efficient MD algorithms. In this work, a new IS modification is proposed which allows the algorithm to utilize SIMD processor instructions. This paper shows that the improvement provides an additional gain in performance, 9% to 45% in comparison to the original IS method.
An efficient algorithm to compute marginal posterior genotype probabilities for every member of a pedigree with loops

PubMed Central

2009-01-01

Background Marginal posterior genotype probabilities need to be computed for genetic analyses such as geneticcounseling in humans and selective breeding in animal and plant species. Methods In this paper, we describe a peeling based, deterministic, exact algorithm to compute efficiently genotype probabilities for every member of a pedigree with loops without recourse to junction-tree methods from graph theory. The efficiency in computing the likelihood by peeling comes from storing intermediate results in multidimensional tables called cutsets. Computing marginal genotype probabilities for individual i requires recomputing the likelihood for each of the possible genotypes of individual i. This can be done efficiently by storing intermediate results in two types of cutsets called anterior and posterior cutsets and reusing these intermediate results to compute the likelihood. Examples A small example is used to illustrate the theoretical concepts discussed in this paper, and marginal genotype probabilities are computed at a monogenic disease locus for every member in a real cattle pedigree. PMID:19958551
Computer architecture for efficient algorithmic executions in real-time systems: New technology for avionics systems and advanced space vehicles

NASA Technical Reports Server (NTRS)

Carroll, Chester C.; Youngblood, John N.; Saha, Aindam

1987-01-01

Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.
Computer architecture for efficient algorithmic executions in real-time systems: new technology for avionics systems and advanced space vehicles

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carroll, C.C.; Youngblood, J.N.; Saha, A.

1987-12-01

Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processingmore » elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.« less
Parallel Computing Strategies for Irregular Algorithms

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

2002-01-01

Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
Accelerated simulation of stochastic particle removal processes in particle-resolved aerosol models

DOE Office of Scientific and Technical Information (OSTI.GOV)

Curtis, J.H.; Michelotti, M.D.; Riemer, N.

2016-10-01

Stochastic particle-resolved methods have proven useful for simulating multi-dimensional systems such as composition-resolved aerosol size distributions. While particle-resolved methods have substantial benefits for highly detailed simulations, these techniques suffer from high computational cost, motivating efforts to improve their algorithmic efficiency. Here we formulate an algorithm for accelerating particle removal processes by aggregating particles of similar size into bins. We present the Binned Algorithm for particle removal processes and analyze its performance with application to the atmospherically relevant process of aerosol dry deposition. We show that the Binned Algorithm can dramatically improve the efficiency of particle removals, particularly for low removalmore » rates, and that computational cost is reduced without introducing additional error. In simulations of aerosol particle removal by dry deposition in atmospherically relevant conditions, we demonstrate about 50-times increase in algorithm efficiency.« less
Efficiency Analysis of the Parallel Implementation of the SIMPLE Algorithm on Multiprocessor Computers

NASA Astrophysics Data System (ADS)

Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.

2017-12-01

This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
Job Scheduling with Efficient Resource Monitoring in Cloud Datacenter

PubMed Central

Loganathan, Shyamala; Mukherjee, Saswati

2015-01-01

Cloud computing is an on-demand computing model, which uses virtualization technology to provide cloud resources to users in the form of virtual machines through internet. Being an adaptable technology, cloud computing is an excellent alternative for organizations for forming their own private cloud. Since the resources are limited in these private clouds maximizing the utilization of resources and giving the guaranteed service for the user are the ultimate goal. For that, efficient scheduling is needed. This research reports on an efficient data structure for resource management and resource scheduling technique in a private cloud environment and discusses a cloud model. The proposed scheduling algorithm considers the types of jobs and the resource availability in its scheduling decision. Finally, we conducted simulations using CloudSim and compared our algorithm with other existing methods, like V-MCT and priority scheduling algorithms. PMID:26473166
Job Scheduling with Efficient Resource Monitoring in Cloud Datacenter.

PubMed

Loganathan, Shyamala; Mukherjee, Saswati

2015-01-01

Cloud computing is an on-demand computing model, which uses virtualization technology to provide cloud resources to users in the form of virtual machines through internet. Being an adaptable technology, cloud computing is an excellent alternative for organizations for forming their own private cloud. Since the resources are limited in these private clouds maximizing the utilization of resources and giving the guaranteed service for the user are the ultimate goal. For that, efficient scheduling is needed. This research reports on an efficient data structure for resource management and resource scheduling technique in a private cloud environment and discusses a cloud model. The proposed scheduling algorithm considers the types of jobs and the resource availability in its scheduling decision. Finally, we conducted simulations using CloudSim and compared our algorithm with other existing methods, like V-MCT and priority scheduling algorithms.
Vibration extraction based on fast NCC algorithm and high-speed camera.

PubMed

Lei, Xiujun; Jin, Yi; Guo, Jie; Zhu, Chang'an

2015-09-20

In this study, a high-speed camera system is developed to complete the vibration measurement in real time and to overcome the mass introduced by conventional contact measurements. The proposed system consists of a notebook computer and a high-speed camera which can capture the images as many as 1000 frames per second. In order to process the captured images in the computer, the normalized cross-correlation (NCC) template tracking algorithm with subpixel accuracy is introduced. Additionally, a modified local search algorithm based on the NCC is proposed to reduce the computation time and to increase efficiency significantly. The modified algorithm can rapidly accomplish one displacement extraction 10 times faster than the traditional template matching without installing any target panel onto the structures. Two experiments were carried out under laboratory and outdoor conditions to validate the accuracy and efficiency of the system performance in practice. The results demonstrated the high accuracy and efficiency of the camera system in extracting vibrating signals.
A Parallel Nonrigid Registration Algorithm Based on B-Spline for Medical Images

PubMed Central

Wang, Yangping; Wang, Song

2016-01-01

The nonrigid registration algorithm based on B-spline Free-Form Deformation (FFD) plays a key role and is widely applied in medical image processing due to the good flexibility and robustness. However, it requires a tremendous amount of computing time to obtain more accurate registration results especially for a large amount of medical image data. To address the issue, a parallel nonrigid registration algorithm based on B-spline is proposed in this paper. First, the Logarithm Squared Difference (LSD) is considered as the similarity metric in the B-spline registration algorithm to improve registration precision. After that, we create a parallel computing strategy and lookup tables (LUTs) to reduce the complexity of the B-spline registration algorithm. As a result, the computing time of three time-consuming steps including B-splines interpolation, LSD computation, and the analytic gradient computation of LSD, is efficiently reduced, for the B-spline registration algorithm employs the Nonlinear Conjugate Gradient (NCG) optimization method. Experimental results of registration quality and execution efficiency on the large amount of medical images show that our algorithm achieves a better registration accuracy in terms of the differences between the best deformation fields and ground truth and a speedup of 17 times over the single-threaded CPU implementation due to the powerful parallel computing ability of Graphics Processing Unit (GPU). PMID:28053653

High-Performance Computing for the Electromagnetic Modeling and Simulation of Interconnects

NASA Technical Reports Server (NTRS)

Schutt-Aine, Jose E.

1996-01-01

The electromagnetic modeling of packages and interconnects plays a very important role in the design of high-speed digital circuits, and is most efficiently performed by using computer-aided design algorithms. In recent years, packaging has become a critical area in the design of high-speed communication systems and fast computers, and the importance of the software support for their development has increased accordingly. Throughout this project, our efforts have focused on the development of modeling and simulation techniques and algorithms that permit the fast computation of the electrical parameters of interconnects and the efficient simulation of their electrical performance.
A fast Fourier transform on multipoles (FFTM) algorithm for solving Helmholtz equation in acoustics analysis.

PubMed

Ong, Eng Teo; Lee, Heow Pueh; Lim, Kian Meng

2004-09-01

This article presents a fast algorithm for the efficient solution of the Helmholtz equation. The method is based on the translation theory of the multipole expansions. Here, the speedup comes from the convolution nature of the translation operators, which can be evaluated rapidly using fast Fourier transform algorithms. Also, the computations of the translation operators are accelerated by using the recursive formulas developed recently by Gumerov and Duraiswami [SIAM J. Sci. Comput. 25, 1344-1381(2003)]. It is demonstrated that the algorithm can produce good accuracy with a relatively low order of expansion. Efficiency analyses of the algorithm reveal that it has computational complexities of O(Na), where a ranges from 1.05 to 1.24. However, this method requires substantially more memory to store the translation operators as compared to the fast multipole method. Hence, despite its simplicity in implementation, this memory requirement issue may limit the application of this algorithm to solving very large-scale problems.
Regional-scale calculation of the LS factor using parallel processing

NASA Astrophysics Data System (ADS)

Liu, Kai; Tang, Guoan; Jiang, Ling; Zhu, A.-Xing; Yang, Jianyi; Song, Xiaodong

2015-05-01

With the increase of data resolution and the increasing application of USLE over large areas, the existing serial implementation of algorithms for computing the LS factor is becoming a bottleneck. In this paper, a parallel processing model based on message passing interface (MPI) is presented for the calculation of the LS factor, so that massive datasets at a regional scale can be processed efficiently. The parallel model contains algorithms for calculating flow direction, flow accumulation, drainage network, slope, slope length and the LS factor. According to the existence of data dependence, the algorithms are divided into local algorithms and global algorithms. Parallel strategy are designed according to the algorithm characters including the decomposition method for maintaining the integrity of the results, optimized workflow for reducing the time taken for exporting the unnecessary intermediate data and a buffer-communication-computation strategy for improving the communication efficiency. Experiments on a multi-node system show that the proposed parallel model allows efficient calculation of the LS factor at a regional scale with a massive dataset.
Complexity of the Quantum Adiabatic Algorithm

NASA Astrophysics Data System (ADS)

Hen, Itay

2013-03-01

The Quantum Adiabatic Algorithm (QAA) has been proposed as a mechanism for efficiently solving optimization problems on a quantum computer. Since adiabatic computation is analog in nature and does not require the design and use of quantum gates, it can be thought of as a simpler and perhaps more profound method for performing quantum computations that might also be easier to implement experimentally. While these features have generated substantial research in QAA, to date there is still a lack of solid evidence that the algorithm can outperform classical optimization algorihms. Here, we discuss several aspects of the quantum adiabatic algorithm: We analyze the efficiency of the algorithm on several ``hard'' (NP) computational problems. Studying the size dependence of the typical minimum energy gap of the Hamiltonians of these problems using quantum Monte Carlo methods, we find that while for most problems the minimum gap decreases exponentially with the size of the problem, indicating that the QAA is not more efficient than existing classical search algorithms, for other problems there is evidence to suggest that the gap may be polynomial near the phase transition. We also discuss applications of the QAA to ``real life'' problems and how they can be implemented on currently available (albeit prototypical) quantum hardware such as ``D-Wave One'', that impose serious restrictions as to which type of problems may be tested. Finally, we discuss different approaches to find improved implementations of the algorithm such as local adiabatic evolution, adaptive methods, local search in Hamiltonian space and others.
A parallel computing engine for a class of time critical processes.

PubMed

Nabhan, T M; Zomaya, A Y

1997-01-01

This paper focuses on the efficient parallel implementation of systems of numerically intensive nature over loosely coupled multiprocessor architectures. These analytical models are of significant importance to many real-time systems that have to meet severe time constants. A parallel computing engine (PCE) has been developed in this work for the efficient simplification and the near optimal scheduling of numerical models over the different cooperating processors of the parallel computer. First, the analytical system is efficiently coded in its general form. The model is then simplified by using any available information (e.g., constant parameters). A task graph representing the interconnections among the different components (or equations) is generated. The graph can then be compressed to control the computation/communication requirements. The task scheduler employs a graph-based iterative scheme, based on the simulated annealing algorithm, to map the vertices of the task graph onto a Multiple-Instruction-stream Multiple-Data-stream (MIMD) type of architecture. The algorithm uses a nonanalytical cost function that properly considers the computation capability of the processors, the network topology, the communication time, and congestion possibilities. Moreover, the proposed technique is simple, flexible, and computationally viable. The efficiency of the algorithm is demonstrated by two case studies with good results.
Efficient 3D geometric and Zernike moments computation from unstructured surface meshes.

PubMed

Pozo, José María; Villa-Uriol, Maria-Cruz; Frangi, Alejandro F

2011-03-01

This paper introduces and evaluates a fast exact algorithm and a series of faster approximate algorithms for the computation of 3D geometric moments from an unstructured surface mesh of triangles. Being based on the object surface reduces the computational complexity of these algorithms with respect to volumetric grid-based algorithms. In contrast, it can only be applied for the computation of geometric moments of homogeneous objects. This advantage and restriction is shared with other proposed algorithms based on the object boundary. The proposed exact algorithm reduces the computational complexity for computing geometric moments up to order N with respect to previously proposed exact algorithms, from N(9) to N(6). The approximate series algorithm appears as a power series on the rate between triangle size and object size, which can be truncated at any desired degree. The higher the number and quality of the triangles, the better the approximation. This approximate algorithm reduces the computational complexity to N(3). In addition, the paper introduces a fast algorithm for the computation of 3D Zernike moments from the computed geometric moments, with a computational complexity N(4), while the previously proposed algorithm is of order N(6). The error introduced by the proposed approximate algorithms is evaluated in different shapes and the cost-benefit ratio in terms of error, and computational time is analyzed for different moment orders.
Efficient parallel resolution of the simplified transport equations in mixed-dual formulation

NASA Astrophysics Data System (ADS)

Barrault, M.; Lathuilière, B.; Ramet, P.; Roman, J.

2011-03-01

A reactivity computation consists of computing the highest eigenvalue of a generalized eigenvalue problem, for which an inverse power algorithm is commonly used. Very fine modelizations are difficult to treat for our sequential solver, based on the simplified transport equations, in terms of memory consumption and computational time. A first implementation of a Lagrangian based domain decomposition method brings to a poor parallel efficiency because of an increase in the power iterations [1]. In order to obtain a high parallel efficiency, we improve the parallelization scheme by changing the location of the loop over the subdomains in the overall algorithm and by benefiting from the characteristics of the Raviart-Thomas finite element. The new parallel algorithm still allows us to locally adapt the numerical scheme (mesh, finite element order). However, it can be significantly optimized for the matching grid case. The good behavior of the new parallelization scheme is demonstrated for the matching grid case on several hundreds of nodes for computations based on a pin-by-pin discretization.
Spatial compression algorithm for the analysis of very large multivariate images

DOEpatents

Keenan, Michael R [Albuquerque, NM

2008-07-15

A method for spatially compressing data sets enables the efficient analysis of very large multivariate images. The spatial compression algorithms use a wavelet transformation to map an image into a compressed image containing a smaller number of pixels that retain the original image's information content. Image analysis can then be performed on a compressed data matrix consisting of a reduced number of significant wavelet coefficients. Furthermore, a block algorithm can be used for performing common operations more efficiently. The spatial compression algorithms can be combined with spectral compression algorithms to provide further computational efficiencies.
Simulation of biochemical reactions with time-dependent rates by the rejection-based algorithm

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thanh, Vo Hong, E-mail: vo@cosbi.eu; Priami, Corrado, E-mail: priami@cosbi.eu; Department of Mathematics, University of Trento, Trento

We address the problem of simulating biochemical reaction networks with time-dependent rates and propose a new algorithm based on our rejection-based stochastic simulation algorithm (RSSA) [Thanh et al., J. Chem. Phys. 141(13), 134116 (2014)]. The computation for selecting next reaction firings by our time-dependent RSSA (tRSSA) is computationally efficient. Furthermore, the generated trajectory is exact by exploiting the rejection-based mechanism. We benchmark tRSSA on different biological systems with varying forms of reaction rates to demonstrate its applicability and efficiency. We reveal that for nontrivial cases, the selection of reaction firings in existing algorithms introduces approximations because the integration of reactionmore » rates is very computationally demanding and simplifying assumptions are introduced. The selection of the next reaction firing by our approach is easier while preserving the exactness.« less
Accelerating global optimization of aerodynamic shapes using a new surrogate-assisted parallel genetic algorithm

NASA Astrophysics Data System (ADS)

Ebrahimi, Mehdi; Jahangirian, Alireza

2017-12-01

An efficient strategy is presented for global shape optimization of wing sections with a parallel genetic algorithm. Several computational techniques are applied to increase the convergence rate and the efficiency of the method. A variable fidelity computational evaluation method is applied in which the expensive Navier-Stokes flow solver is complemented by an inexpensive multi-layer perceptron neural network for the objective function evaluations. A population dispersion method that consists of two phases, of exploration and refinement, is developed to improve the convergence rate and the robustness of the genetic algorithm. Owing to the nature of the optimization problem, a parallel framework based on the master/slave approach is used. The outcomes indicate that the method is able to find the global optimum with significantly lower computational time in comparison to the conventional genetic algorithm.
STAR adaptation of QR algorithm. [program for solving over-determined systems of linear equations

NASA Technical Reports Server (NTRS)

Shah, S. N.

1981-01-01

The QR algorithm used on a serial computer and executed on the Control Data Corporation 6000 Computer was adapted to execute efficiently on the Control Data STAR-100 computer. How the scalar program was adapted for the STAR-100 and why these adaptations yielded an efficient STAR program is described. Program listings of the old scalar version and the vectorized SL/1 version are presented in the appendices. Execution times for the two versions applied to the same system of linear equations, are compared.
Approximate Algorithms for Computing Spatial Distance Histograms with Accuracy Guarantees

PubMed Central

Grupcev, Vladimir; Yuan, Yongke; Tu, Yi-Cheng; Huang, Jin; Chen, Shaoping; Pandit, Sagar; Weng, Michael

2014-01-01

Particle simulation has become an important research tool in many scientific and engineering fields. Data generated by such simulations impose great challenges to database storage and query processing. One of the queries against particle simulation data, the spatial distance histogram (SDH) query, is the building block of many high-level analytics, and requires quadratic time to compute using a straightforward algorithm. Previous work has developed efficient algorithms that compute exact SDHs. While beating the naive solution, such algorithms are still not practical in processing SDH queries against large-scale simulation data. In this paper, we take a different path to tackle this problem by focusing on approximate algorithms with provable error bounds. We first present a solution derived from the aforementioned exact SDH algorithm, and this solution has running time that is unrelated to the system size N. We also develop a mathematical model to analyze the mechanism that leads to errors in the basic approximate algorithm. Our model provides insights on how the algorithm can be improved to achieve higher accuracy and efficiency. Such insights give rise to a new approximate algorithm with improved time/accuracy tradeoff. Experimental results confirm our analysis. PMID:24693210
Complexity of the Quantum Adiabatic Algorithm

NASA Technical Reports Server (NTRS)

Hen, Itay

2013-01-01

The Quantum Adiabatic Algorithm (QAA) has been proposed as a mechanism for efficiently solving optimization problems on a quantum computer. Since adiabatic computation is analog in nature and does not require the design and use of quantum gates, it can be thought of as a simpler and perhaps more profound method for performing quantum computations that might also be easier to implement experimentally. While these features have generated substantial research in QAA, to date there is still a lack of solid evidence that the algorithm can outperform classical optimization algorithms.
Efficient volume computation for three-dimensional hexahedral cells

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dukowicz, J.K.

1988-02-01

Currently, algorithms for computing the volume of hexahedral cells with ''ruled'' surfaces require a minimum of 122 FLOPs (floating point operations) per cell. A new algorithm is described which reduces the operation count to 57 FLOPs per cell. copyright 1988 Academic Press, Inc.
Progress on a Taylor weak statement finite element algorithm for high-speed aerodynamic flows

NASA Technical Reports Server (NTRS)

Baker, A. J.; Freels, J. D.

1989-01-01

A new finite element numerical Computational Fluid Dynamics (CFD) algorithm has matured to the point of efficiently solving two-dimensional high speed real-gas compressible flow problems in generalized coordinates on modern vector computer systems. The algorithm employs a Taylor Weak Statement classical Galerkin formulation, a variably implicit Newton iteration, and a tensor matrix product factorization of the linear algebra Jacobian under a generalized coordinate transformation. Allowing for a general two-dimensional conservation law system, the algorithm has been exercised on the Euler and laminar forms of the Navier-Stokes equations. Real-gas fluid properties are admitted, and numerical results verify solution accuracy, efficiency, and stability over a range of test problem parameters.
Dynamic displacement measurement of large-scale structures based on the Lucas-Kanade template tracking algorithm

NASA Astrophysics Data System (ADS)

Guo, Jie; Zhu, Chang`an

2016-01-01

The development of optics and computer technologies enables the application of the vision-based technique that uses digital cameras to the displacement measurement of large-scale structures. Compared with traditional contact measurements, vision-based technique allows for remote measurement, has a non-intrusive characteristic, and does not necessitate mass introduction. In this study, a high-speed camera system is developed to complete the displacement measurement in real time. The system consists of a high-speed camera and a notebook computer. The high-speed camera can capture images at a speed of hundreds of frames per second. To process the captured images in computer, the Lucas-Kanade template tracking algorithm in the field of computer vision is introduced. Additionally, a modified inverse compositional algorithm is proposed to reduce the computing time of the original algorithm and improve the efficiency further. The modified algorithm can rapidly accomplish one displacement extraction within 1 ms without having to install any pre-designed target panel onto the structures in advance. The accuracy and the efficiency of the system in the remote measurement of dynamic displacement are demonstrated in the experiments on motion platform and sound barrier on suspension viaduct. Experimental results show that the proposed algorithm can extract accurate displacement signal and accomplish the vibration measurement of large-scale structures.
Optimization Algorithm for Kalman Filter Exploiting the Numerical Characteristics of SINS/GPS Integrated Navigation Systems.

PubMed

Hu, Shaoxing; Xu, Shike; Wang, Duhu; Zhang, Aiwu

2015-11-11

Aiming at addressing the problem of high computational cost of the traditional Kalman filter in SINS/GPS, a practical optimization algorithm with offline-derivation and parallel processing methods based on the numerical characteristics of the system is presented in this paper. The algorithm exploits the sparseness and/or symmetry of matrices to simplify the computational procedure. Thus plenty of invalid operations can be avoided by offline derivation using a block matrix technique. For enhanced efficiency, a new parallel computational mechanism is established by subdividing and restructuring calculation processes after analyzing the extracted "useful" data. As a result, the algorithm saves about 90% of the CPU processing time and 66% of the memory usage needed in a classical Kalman filter. Meanwhile, the method as a numerical approach needs no precise-loss transformation/approximation of system modules and the accuracy suffers little in comparison with the filter before computational optimization. Furthermore, since no complicated matrix theories are needed, the algorithm can be easily transplanted into other modified filters as a secondary optimization method to achieve further efficiency.
Rapid Calculation of Max-Min Fair Rates for Multi-Commodity Flows in Fat-Tree Networks

DOE PAGES

Mollah, Md Atiqul; Yuan, Xin; Pakin, Scott; ...

2017-08-29

Max-min fairness is often used in the performance modeling of interconnection networks. Existing methods to compute max-min fair rates for multi-commodity flows have high complexity and are computationally infeasible for large networks. In this paper, we show that by considering topological features, this problem can be solved efficiently for the fat-tree topology that is widely used in data centers and high performance compute clusters. Several efficient new algorithms are developed for this problem, including a parallel algorithm that can take advantage of multi-core and shared-memory architectures. Using these algorithms, we demonstrate that it is possible to find the max-min fairmore » rate allocation for multi-commodity flows in fat-tree networks that support tens of thousands of nodes. We evaluate the run-time performance of the proposed algorithms and show improvement in orders of magnitude over the previously best known method. Finally, we further demonstrate a new application of max-min fair rate allocation that is only computationally feasible using our new algorithms.« less
Rapid Calculation of Max-Min Fair Rates for Multi-Commodity Flows in Fat-Tree Networks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mollah, Md Atiqul; Yuan, Xin; Pakin, Scott

Max-min fairness is often used in the performance modeling of interconnection networks. Existing methods to compute max-min fair rates for multi-commodity flows have high complexity and are computationally infeasible for large networks. In this paper, we show that by considering topological features, this problem can be solved efficiently for the fat-tree topology that is widely used in data centers and high performance compute clusters. Several efficient new algorithms are developed for this problem, including a parallel algorithm that can take advantage of multi-core and shared-memory architectures. Using these algorithms, we demonstrate that it is possible to find the max-min fairmore » rate allocation for multi-commodity flows in fat-tree networks that support tens of thousands of nodes. We evaluate the run-time performance of the proposed algorithms and show improvement in orders of magnitude over the previously best known method. Finally, we further demonstrate a new application of max-min fair rate allocation that is only computationally feasible using our new algorithms.« less
EMILiO: a fast algorithm for genome-scale strain design.

PubMed

Yang, Laurence; Cluett, William R; Mahadevan, Radhakrishnan

2011-05-01

Systems-level design of cell metabolism is becoming increasingly important for renewable production of fuels, chemicals, and drugs. Computational models are improving in the accuracy and scope of predictions, but are also growing in complexity. Consequently, efficient and scalable algorithms are increasingly important for strain design. Previous algorithms helped to consolidate the utility of computational modeling in this field. To meet intensifying demands for high-performance strains, both the number and variety of genetic manipulations involved in strain construction are increasing. Existing algorithms have experienced combinatorial increases in computational complexity when applied toward the design of such complex strains. Here, we present EMILiO, a new algorithm that increases the scope of strain design to include reactions with individually optimized fluxes. Unlike existing approaches that would experience an explosion in complexity to solve this problem, we efficiently generated numerous alternate strain designs producing succinate, l-glutamate and l-serine. This was enabled by successive linear programming, a technique new to the area of computational strain design. Copyright © 2011 Elsevier Inc. All rights reserved.

Hierarchical layered and semantic-based image segmentation using ergodicity map

NASA Astrophysics Data System (ADS)

Yadegar, Jacob; Liu, Xiaoqing

2010-04-01

Image segmentation plays a foundational role in image understanding and computer vision. Although great strides have been made and progress achieved on automatic/semi-automatic image segmentation algorithms, designing a generic, robust, and efficient image segmentation algorithm is still challenging. Human vision is still far superior compared to computer vision, especially in interpreting semantic meanings/objects in images. We present a hierarchical/layered semantic image segmentation algorithm that can automatically and efficiently segment images into hierarchical layered/multi-scaled semantic regions/objects with contextual topological relationships. The proposed algorithm bridges the gap between high-level semantics and low-level visual features/cues (such as color, intensity, edge, etc.) through utilizing a layered/hierarchical ergodicity map, where ergodicity is computed based on a space filling fractal concept and used as a region dissimilarity measurement. The algorithm applies a highly scalable, efficient, and adaptive Peano- Cesaro triangulation/tiling technique to decompose the given image into a set of similar/homogenous regions based on low-level visual cues in a top-down manner. The layered/hierarchical ergodicity map is built through a bottom-up region dissimilarity analysis. The recursive fractal sweep associated with the Peano-Cesaro triangulation provides efficient local multi-resolution refinement to any level of detail. The generated binary decomposition tree also provides efficient neighbor retrieval mechanisms for contextual topological object/region relationship generation. Experiments have been conducted within the maritime image environment where the segmented layered semantic objects include the basic level objects (i.e. sky/land/water) and deeper level objects in the sky/land/water surfaces. Experimental results demonstrate the proposed algorithm has the capability to robustly and efficiently segment images into layered semantic objects/regions with contextual topological relationships.
An efficient non-dominated sorting method for evolutionary algorithms.

PubMed

Fang, Hongbing; Wang, Qian; Tu, Yi-Cheng; Horstemeyer, Mark F

2008-01-01

We present a new non-dominated sorting algorithm to generate the non-dominated fronts in multi-objective optimization with evolutionary algorithms, particularly the NSGA-II. The non-dominated sorting algorithm used by NSGA-II has a time complexity of O(MN(2)) in generating non-dominated fronts in one generation (iteration) for a population size N and M objective functions. Since generating non-dominated fronts takes the majority of total computational time (excluding the cost of fitness evaluations) of NSGA-II, making this algorithm faster will significantly improve the overall efficiency of NSGA-II and other genetic algorithms using non-dominated sorting. The new non-dominated sorting algorithm proposed in this study reduces the number of redundant comparisons existing in the algorithm of NSGA-II by recording the dominance information among solutions from their first comparisons. By utilizing a new data structure called the dominance tree and the divide-and-conquer mechanism, the new algorithm is faster than NSGA-II for different numbers of objective functions. Although the number of solution comparisons by the proposed algorithm is close to that of NSGA-II when the number of objectives becomes large, the total computational time shows that the proposed algorithm still has better efficiency because of the adoption of the dominance tree structure and the divide-and-conquer mechanism.
Efficient storage, computation, and exposure of computer-generated holograms by electron-beam lithography.

PubMed

Newman, D M; Hawley, R W; Goeckel, D L; Crawford, R D; Abraham, S; Gallagher, N C

1993-05-10

An efficient storage format was developed for computer-generated holograms for use in electron-beam lithography. This method employs run-length encoding and Lempel-Ziv-Welch compression and succeeds in exposing holograms that were previously infeasible owing to the hologram's tremendous pattern-data file size. These holograms also require significant computation; thus the algorithm was implemented on a parallel computer, which improved performance by 2 orders of magnitude. The decompression algorithm was integrated into the Cambridge electron-beam machine's front-end processor.Although this provides much-needed ability, some hardware enhancements will be required in the future to overcome inadequacies in the current front-end processor that result in a lengthy exposure time.
Efficient Computational Prototyping of Mixed Technology Microfluidic Components and Systems

DTIC Science & Technology

2002-08-01

AFRL-IF-RS-TR-2002-190 Final Technical Report August 2002 EFFICIENT COMPUTATIONAL PROTOTYPING OF MIXED TECHNOLOGY MICROFLUIDIC...SUBTITLE EFFICIENT COMPUTATIONAL PROTOTYPING OF MIXED TECHNOLOGY MICROFLUIDIC COMPONENTS AND SYSTEMS 6. AUTHOR(S) Narayan R. Aluru, Jacob White...Aided Design (CAD) tools for microfluidic components and systems were developed in this effort. Innovative numerical methods and algorithms for mixed
Constraint treatment techniques and parallel algorithms for multibody dynamic analysis. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Chiou, Jin-Chern

1990-01-01

Computational procedures for kinematic and dynamic analysis of three-dimensional multibody dynamic (MBD) systems are developed from the differential-algebraic equations (DAE's) viewpoint. Constraint violations during the time integration process are minimized and penalty constraint stabilization techniques and partitioning schemes are developed. The governing equations of motion, a two-stage staggered explicit-implicit numerical algorithm, are treated which takes advantage of a partitioned solution procedure. A robust and parallelizable integration algorithm is developed. This algorithm uses a two-stage staggered central difference algorithm to integrate the translational coordinates and the angular velocities. The angular orientations of bodies in MBD systems are then obtained by using an implicit algorithm via the kinematic relationship between Euler parameters and angular velocities. It is shown that the combination of the present solution procedures yields a computationally more accurate solution. To speed up the computational procedures, parallel implementation of the present constraint treatment techniques, the two-stage staggered explicit-implicit numerical algorithm was efficiently carried out. The DAE's and the constraint treatment techniques were transformed into arrowhead matrices to which Schur complement form was derived. By fully exploiting the sparse matrix structural analysis techniques, a parallel preconditioned conjugate gradient numerical algorithm is used to solve the systems equations written in Schur complement form. A software testbed was designed and implemented in both sequential and parallel computers. This testbed was used to demonstrate the robustness and efficiency of the constraint treatment techniques, the accuracy of the two-stage staggered explicit-implicit numerical algorithm, and the speed up of the Schur-complement-based parallel preconditioned conjugate gradient algorithm on a parallel computer.
A Computationally Efficient Parallel Levenberg-Marquardt Algorithm for Large-Scale Big-Data Inversion

NASA Astrophysics Data System (ADS)

Lin, Y.; O'Malley, D.; Vesselinov, V. V.

2015-12-01

Inverse modeling seeks model parameters given a set of observed state variables. However, for many practical problems due to the facts that the observed data sets are often large and model parameters are often numerous, conventional methods for solving the inverse modeling can be computationally expensive. We have developed a new, computationally-efficient Levenberg-Marquardt method for solving large-scale inverse modeling. Levenberg-Marquardt methods require the solution of a dense linear system of equations which can be prohibitively expensive to compute for large-scale inverse problems. Our novel method projects the original large-scale linear problem down to a Krylov subspace, such that the dimensionality of the measurements can be significantly reduced. Furthermore, instead of solving the linear system for every Levenberg-Marquardt damping parameter, we store the Krylov subspace computed when solving the first damping parameter and recycle it for all the following damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved by using these computational techniques. We apply this new inverse modeling method to invert for a random transitivity field. Our algorithm is fast enough to solve for the distributed model parameters (transitivity) at each computational node in the model domain. The inversion is also aided by the use regularization techniques. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). Julia is an advanced high-level scientific programing language that allows for efficient memory management and utilization of high-performance computational resources. By comparing with a Levenberg-Marquardt method using standard linear inversion techniques, our Levenberg-Marquardt method yields speed-up ratio of 15 in a multi-core computational environment and a speed-up ratio of 45 in a single-core computational environment. Therefore, our new inverse modeling method is a powerful tool for large-scale applications.
Computationally efficient algorithms for real-time attitude estimation

NASA Technical Reports Server (NTRS)

Pringle, Steven R.

1993-01-01

For many practical spacecraft applications, algorithms for determining spacecraft attitude must combine inputs from diverse sensors and provide redundancy in the event of sensor failure. A Kalman filter is suitable for this task, however, it may impose a computational burden which may be avoided by sub optimal methods. A suboptimal estimator is presented which was implemented successfully on the Delta Star spacecraft which performed a 9 month SDI flight experiment in 1989. This design sought to minimize algorithm complexity to accommodate the limitations of an 8K guidance computer. The algorithm used is interpreted in the framework of Kalman filtering and a derivation is given for the computation.
Efficient Multiple Kernel Learning Algorithms Using Low-Rank Representation.

PubMed

Niu, Wenjia; Xia, Kewen; Zu, Baokai; Bai, Jianchuan

2017-01-01

Unlike Support Vector Machine (SVM), Multiple Kernel Learning (MKL) allows datasets to be free to choose the useful kernels based on their distribution characteristics rather than a precise one. It has been shown in the literature that MKL holds superior recognition accuracy compared with SVM, however, at the expense of time consuming computations. This creates analytical and computational difficulties in solving MKL algorithms. To overcome this issue, we first develop a novel kernel approximation approach for MKL and then propose an efficient Low-Rank MKL (LR-MKL) algorithm by using the Low-Rank Representation (LRR). It is well-acknowledged that LRR can reduce dimension while retaining the data features under a global low-rank constraint. Furthermore, we redesign the binary-class MKL as the multiclass MKL based on pairwise strategy. Finally, the recognition effect and efficiency of LR-MKL are verified on the datasets Yale, ORL, LSVT, and Digit. Experimental results show that the proposed LR-MKL algorithm is an efficient kernel weights allocation method in MKL and boosts the performance of MKL largely.
Unified commutation-pruning technique for efficient computation of composite DFTs

NASA Astrophysics Data System (ADS)

Castro-Palazuelos, David E.; Medina-Melendrez, Modesto Gpe.; Torres-Roman, Deni L.; Shkvarko, Yuriy V.

2015-12-01

An efficient computation of a composite length discrete Fourier transform (DFT), as well as a fast Fourier transform (FFT) of both time and space data sequences in uncertain (non-sparse or sparse) computational scenarios, requires specific processing algorithms. Traditional algorithms typically employ some pruning methods without any commutations, which prevents them from attaining the potential computational efficiency. In this paper, we propose an alternative unified approach with automatic commutations between three computational modalities aimed at efficient computations of the pruned DFTs adapted for variable composite lengths of the non-sparse input-output data. The first modality is an implementation of the direct computation of a composite length DFT, the second one employs the second-order recursive filtering method, and the third one performs the new pruned decomposed transform. The pruned decomposed transform algorithm performs the decimation in time or space (DIT) data acquisition domain and, then, decimation in frequency (DIF). The unified combination of these three algorithms is addressed as the DFTCOMM technique. Based on the treatment of the combinational-type hypotheses testing optimization problem of preferable allocations between all feasible commuting-pruning modalities, we have found the global optimal solution to the pruning problem that always requires a fewer or, at most, the same number of arithmetic operations than other feasible modalities. The DFTCOMM method outperforms the existing competing pruning techniques in the sense of attainable savings in the number of required arithmetic operations. It requires fewer or at most the same number of arithmetic operations for its execution than any other of the competing pruning methods reported in the literature. Finally, we provide the comparison of the DFTCOMM with the recently developed sparse fast Fourier transform (SFFT) algorithmic family. We feature that, in the sensing scenarios with sparse/non-sparse data Fourier spectrum, the DFTCOMM technique manifests robustness against such model uncertainties in the sense of insensitivity for sparsity/non-sparsity restrictions and the variability of the operating parameters.
Automatic Blocking Of QR and LU Factorizations for Locality

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yi, Q; Kennedy, K; You, H

2004-03-26

QR and LU factorizations for dense matrices are important linear algebra computations that are widely used in scientific applications. To efficiently perform these computations on modern computers, the factorization algorithms need to be blocked when operating on large matrices to effectively exploit the deep cache hierarchy prevalent in today's computer memory systems. Because both QR (based on Householder transformations) and LU factorization algorithms contain complex loop structures, few compilers can fully automate the blocking of these algorithms. Though linear algebra libraries such as LAPACK provides manually blocked implementations of these algorithms, by automatically generating blocked versions of the computations, moremore » benefit can be gained such as automatic adaptation of different blocking strategies. This paper demonstrates how to apply an aggressive loop transformation technique, dependence hoisting, to produce efficient blockings for both QR and LU with partial pivoting. We present different blocking strategies that can be generated by our optimizer and compare the performance of auto-blocked versions with manually tuned versions in LAPACK, both using reference BLAS, ATLAS BLAS and native BLAS specially tuned for the underlying machine architectures.« less
A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering

ERIC Educational Resources Information Center

Chahine, Firas Safwan

2012-01-01

Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithm, but has a major…
Computationally Efficient Power Allocation Algorithm in Multicarrier-Based Cognitive Radio Networks: OFDM and FBMC Systems

NASA Astrophysics Data System (ADS)

Shaat, Musbah; Bader, Faouzi

2010-12-01

Cognitive Radio (CR) systems have been proposed to increase the spectrum utilization by opportunistically access the unused spectrum. Multicarrier communication systems are promising candidates for CR systems. Due to its high spectral efficiency, filter bank multicarrier (FBMC) can be considered as an alternative to conventional orthogonal frequency division multiplexing (OFDM) for transmission over the CR networks. This paper addresses the problem of resource allocation in multicarrier-based CR networks. The objective is to maximize the downlink capacity of the network under both total power and interference introduced to the primary users (PUs) constraints. The optimal solution has high computational complexity which makes it unsuitable for practical applications and hence a low complexity suboptimal solution is proposed. The proposed algorithm utilizes the spectrum holes in PUs bands as well as active PU bands. The performance of the proposed algorithm is investigated for OFDM and FBMC based CR systems. Simulation results illustrate that the proposed resource allocation algorithm with low computational complexity achieves near optimal performance and proves the efficiency of using FBMC in CR context.
Investigating power capping toward energy-efficient scientific applications: Investigating Power Capping toward Energy-Efficient Scientific Applications

DOE PAGES

Haidar, Azzam; Jagode, Heike; Vaccaro, Phil; ...

2018-03-22

The emergence of power efficiency as a primary constraint in processor and system design poses new challenges concerning power and energy awareness for numerical libraries and scientific applications. Power consumption also plays a major role in the design of data centers, which may house petascale or exascale-level computing systems. At these extreme scales, understanding and improving the energy efficiency of numerical libraries and their related applications becomes a crucial part of the successful implementation and operation of the computing system. In this paper, we study and investigate the practice of controlling a compute system's power usage, and we explore howmore » different power caps affect the performance of numerical algorithms with different computational intensities. Further, we determine the impact, in terms of performance and energy usage, that these caps have on a system running scientific applications. This analysis will enable us to characterize the types of algorithms that benefit most from these power management schemes. Our experiments are performed using a set of representative kernels and several popular scientific benchmarks. Lastly, we quantify a number of power and performance measurements and draw observations and conclusions that can be viewed as a roadmap to achieving energy efficiency in the design and execution of scientific algorithms.« less
Investigating power capping toward energy-efficient scientific applications: Investigating Power Capping toward Energy-Efficient Scientific Applications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Haidar, Azzam; Jagode, Heike; Vaccaro, Phil

The emergence of power efficiency as a primary constraint in processor and system design poses new challenges concerning power and energy awareness for numerical libraries and scientific applications. Power consumption also plays a major role in the design of data centers, which may house petascale or exascale-level computing systems. At these extreme scales, understanding and improving the energy efficiency of numerical libraries and their related applications becomes a crucial part of the successful implementation and operation of the computing system. In this paper, we study and investigate the practice of controlling a compute system's power usage, and we explore howmore » different power caps affect the performance of numerical algorithms with different computational intensities. Further, we determine the impact, in terms of performance and energy usage, that these caps have on a system running scientific applications. This analysis will enable us to characterize the types of algorithms that benefit most from these power management schemes. Our experiments are performed using a set of representative kernels and several popular scientific benchmarks. Lastly, we quantify a number of power and performance measurements and draw observations and conclusions that can be viewed as a roadmap to achieving energy efficiency in the design and execution of scientific algorithms.« less
Demonstration of essentiality of entanglement in a Deutsch-like quantum algorithm

NASA Astrophysics Data System (ADS)

Huang, He-Liang; Goswami, Ashutosh K.; Bao, Wan-Su; Panigrahi, Prasanta K.

2018-06-01

Quantum algorithms can be used to efficiently solve certain classically intractable problems by exploiting quantum parallelism. However, the effectiveness of quantum entanglement in quantum computing remains a question of debate. This study presents a new quantum algorithm that shows entanglement could provide advantages over both classical algorithms and quantum algo- rithms without entanglement. Experiments are implemented to demonstrate the proposed algorithm using superconducting qubits. Results show the viability of the algorithm and suggest that entanglement is essential in obtaining quantum speedup for certain problems in quantum computing. The study provides reliable and clear guidance for developing useful quantum algorithms.
An Efficient Supervised Training Algorithm for Multilayer Spiking Neural Networks

PubMed Central

Xie, Xiurui; Qu, Hong; Liu, Guisong; Zhang, Malu; Kurths, Jürgen

2016-01-01

The spiking neural networks (SNNs) are the third generation of neural networks and perform remarkably well in cognitive tasks such as pattern recognition. The spike emitting and information processing mechanisms found in biological cognitive systems motivate the application of the hierarchical structure and temporal encoding mechanism in spiking neural networks, which have exhibited strong computational capability. However, the hierarchical structure and temporal encoding approach require neurons to process information serially in space and time respectively, which reduce the training efficiency significantly. For training the hierarchical SNNs, most existing methods are based on the traditional back-propagation algorithm, inheriting its drawbacks of the gradient diffusion and the sensitivity on parameters. To keep the powerful computation capability of the hierarchical structure and temporal encoding mechanism, but to overcome the low efficiency of the existing algorithms, a new training algorithm, the Normalized Spiking Error Back Propagation (NSEBP) is proposed in this paper. In the feedforward calculation, the output spike times are calculated by solving the quadratic function in the spike response model instead of detecting postsynaptic voltage states at all time points in traditional algorithms. Besides, in the feedback weight modification, the computational error is propagated to previous layers by the presynaptic spike jitter instead of the gradient decent rule, which realizes the layer-wised training. Furthermore, our algorithm investigates the mathematical relation between the weight variation and voltage error change, which makes the normalization in the weight modification applicable. Adopting these strategies, our algorithm outperforms the traditional SNN multi-layer algorithms in terms of learning efficiency and parameter sensitivity, that are also demonstrated by the comprehensive experimental results in this paper. PMID:27044001
A novel iris localization algorithm using correlation filtering

NASA Astrophysics Data System (ADS)

Pohit, Mausumi; Sharma, Jitu

2015-06-01

Fast and efficient segmentation of iris from the eye images is a primary requirement for robust database independent iris recognition. In this paper we have presented a new algorithm for computing the inner and outer boundaries of the iris and locating the pupil centre. Pupil-iris boundary computation is based on correlation filtering approach, whereas iris-sclera boundary is determined through one dimensional intensity mapping. The proposed approach is computationally less extensive when compared with the existing algorithms like Hough transform.
A High Performance Cloud-Based Protein-Ligand Docking Prediction Algorithm

PubMed Central

Chen, Jui-Le; Yang, Chu-Sing

2013-01-01

The potential of predicting druggability for a particular disease by integrating biological and computer science technologies has witnessed success in recent years. Although the computer science technologies can be used to reduce the costs of the pharmaceutical research, the computation time of the structure-based protein-ligand docking prediction is still unsatisfied until now. Hence, in this paper, a novel docking prediction algorithm, named fast cloud-based protein-ligand docking prediction algorithm (FCPLDPA), is presented to accelerate the docking prediction algorithm. The proposed algorithm works by leveraging two high-performance operators: (1) the novel migration (information exchange) operator is designed specially for cloud-based environments to reduce the computation time; (2) the efficient operator is aimed at filtering out the worst search directions. Our simulation results illustrate that the proposed method outperforms the other docking algorithms compared in this paper in terms of both the computation time and the quality of the end result. PMID:23762864
Algorithmic Mechanism Design of Evolutionary Computation.

PubMed

Pei, Yan

2015-01-01

We consider algorithmic design, enhancement, and improvement of evolutionary computation as a mechanism design problem. All individuals or several groups of individuals can be considered as self-interested agents. The individuals in evolutionary computation can manipulate parameter settings and operations by satisfying their own preferences, which are defined by an evolutionary computation algorithm designer, rather than by following a fixed algorithm rule. Evolutionary computation algorithm designers or self-adaptive methods should construct proper rules and mechanisms for all agents (individuals) to conduct their evolution behaviour correctly in order to definitely achieve the desired and preset objective(s). As a case study, we propose a formal framework on parameter setting, strategy selection, and algorithmic design of evolutionary computation by considering the Nash strategy equilibrium of a mechanism design in the search process. The evaluation results present the efficiency of the framework. This primary principle can be implemented in any evolutionary computation algorithm that needs to consider strategy selection issues in its optimization process. The final objective of our work is to solve evolutionary computation design as an algorithmic mechanism design problem and establish its fundamental aspect by taking this perspective. This paper is the first step towards achieving this objective by implementing a strategy equilibrium solution (such as Nash equilibrium) in evolutionary computation algorithm.
Algorithmic Mechanism Design of Evolutionary Computation

PubMed Central

2015-01-01

We consider algorithmic design, enhancement, and improvement of evolutionary computation as a mechanism design problem. All individuals or several groups of individuals can be considered as self-interested agents. The individuals in evolutionary computation can manipulate parameter settings and operations by satisfying their own preferences, which are defined by an evolutionary computation algorithm designer, rather than by following a fixed algorithm rule. Evolutionary computation algorithm designers or self-adaptive methods should construct proper rules and mechanisms for all agents (individuals) to conduct their evolution behaviour correctly in order to definitely achieve the desired and preset objective(s). As a case study, we propose a formal framework on parameter setting, strategy selection, and algorithmic design of evolutionary computation by considering the Nash strategy equilibrium of a mechanism design in the search process. The evaluation results present the efficiency of the framework. This primary principle can be implemented in any evolutionary computation algorithm that needs to consider strategy selection issues in its optimization process. The final objective of our work is to solve evolutionary computation design as an algorithmic mechanism design problem and establish its fundamental aspect by taking this perspective. This paper is the first step towards achieving this objective by implementing a strategy equilibrium solution (such as Nash equilibrium) in evolutionary computation algorithm. PMID:26257777

Robust efficient video fingerprinting

NASA Astrophysics Data System (ADS)

Puri, Manika; Lubin, Jeffrey

2009-02-01

We have developed a video fingerprinting system with robustness and efficiency as the primary and secondary design criteria. In extensive testing, the system has shown robustness to cropping, letter-boxing, sub-titling, blur, drastic compression, frame rate changes, size changes and color changes, as well as to the geometric distortions often associated with camcorder capture in cinema settings. Efficiency is afforded by a novel two-stage detection process in which a fast matching process first computes a number of likely candidates, which are then passed to a second slower process that computes the overall best match with minimal false alarm probability. One key component of the algorithm is a maximally stable volume computation - a three-dimensional generalization of maximally stable extremal regions - that provides a content-centric coordinate system for subsequent hash function computation, independent of any affine transformation or extensive cropping. Other key features include an efficient bin-based polling strategy for initial candidate selection, and a final SIFT feature-based computation for final verification. We describe the algorithm and its performance, and then discuss additional modifications that can provide further improvement to efficiency and accuracy.
Non-Evolutionary Algorithms for Scheduling Dependent Tasks in Distributed Heterogeneous Computing Environments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wayne F. Boyer; Gurdeep S. Hura

2005-09-01

The Problem of obtaining an optimal matching and scheduling of interdependent tasks in distributed heterogeneous computing (DHC) environments is well known to be an NP-hard problem. In a DHC system, task execution time is dependent on the machine to which it is assigned and task precedence constraints are represented by a directed acyclic graph. Recent research in evolutionary techniques has shown that genetic algorithms usually obtain more efficient schedules that other known algorithms. We propose a non-evolutionary random scheduling (RS) algorithm for efficient matching and scheduling of inter-dependent tasks in a DHC system. RS is a succession of randomized taskmore » orderings and a heuristic mapping from task order to schedule. Randomized task ordering is effectively a topological sort where the outcome may be any possible task order for which the task precedent constraints are maintained. A detailed comparison to existing evolutionary techniques (GA and PSGA) shows the proposed algorithm is less complex than evolutionary techniques, computes schedules in less time, requires less memory and fewer tuning parameters. Simulation results show that the average schedules produced by RS are approximately as efficient as PSGA schedules for all cases studied and clearly more efficient than PSGA for certain cases. The standard formulation for the scheduling problem addressed in this paper is Rm|prec|Cmax.,« less
An algorithm of discovering signatures from DNA databases on a computer cluster.

PubMed

Lee, Hsiao Ping; Sheu, Tzu-Fang

2014-10-05

Signatures are short sequences that are unique and not similar to any other sequence in a database that can be used as the basis to identify different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entirety of databases to be loaded in the memory, thus restricting the amount of data that they can process. It makes those algorithms unable to process databases with large amounts of data. Also, those algorithms use sequential models and have slower discovery speeds, meaning that the efficiency can be improved. In this research, we are debuting the utilization of a divide-and-conquer strategy in signature discovery and have proposed a parallel signature discovery algorithm on a computer cluster. The algorithm applies the divide-and-conquer strategy to solve the problem posed to the existing algorithms where they are unable to process large databases and uses a parallel computing mechanism to effectively improve the efficiency of signature discovery. Even when run with just the memory of regular personal computers, the algorithm can still process large databases such as the human whole-genome EST database which were previously unable to be processed by the existing algorithms. The algorithm proposed in this research is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large database analysis and processing. The implementation of the proposed algorithm is available at http://www.cs.pu.edu.tw/~fang/DDCSDPrograms/DDCSD.htm.
Towards developing robust algorithms for solving partial differential equations on MIMD machines

NASA Technical Reports Server (NTRS)

Saltz, Joel H.; Naik, Vijay K.

1988-01-01

Methods for efficient computation of numerical algorithms on a wide variety of MIMD machines are proposed. These techniques reorganize the data dependency patterns to improve the processor utilization. The model problem finds the time-accurate solution to a parabolic partial differential equation discretized in space and implicitly marched forward in time. The algorithms are extensions of Jacobi and SOR. The extensions consist of iterating over a window of several timesteps, allowing efficient overlap of computation with communication. The methods increase the degree to which work can be performed while data are communicated between processors. The effect of the window size and of domain partitioning on the system performance is examined both by implementing the algorithm on a simulated multiprocessor system.
Towards developing robust algorithms for solving partial differential equations on MIMD machines

NASA Technical Reports Server (NTRS)

Saltz, J. H.; Naik, V. K.

1985-01-01

Methods for efficient computation of numerical algorithms on a wide variety of MIMD machines are proposed. These techniques reorganize the data dependency patterns to improve the processor utilization. The model problem finds the time-accurate solution to a parabolic partial differential equation discretized in space and implicitly marched forward in time. The algorithms are extensions of Jacobi and SOR. The extensions consist of iterating over a window of several timesteps, allowing efficient overlap of computation with communication. The methods increase the degree to which work can be performed while data are communicated between processors. The effect of the window size and of domain partitioning on the system performance is examined both by implementing the algorithm on a simulated multiprocessor system.
Fragmenting networks by targeting collective influencers at a mesoscopic level.

PubMed

Kobayashi, Teruyoshi; Masuda, Naoki

2016-11-25

A practical approach to protecting networks against epidemic processes such as spreading of infectious diseases, malware, and harmful viral information is to remove some influential nodes beforehand to fragment the network into small components. Because determining the optimal order to remove nodes is a computationally hard problem, various approximate algorithms have been proposed to efficiently fragment networks by sequential node removal. Morone and Makse proposed an algorithm employing the non-backtracking matrix of given networks, which outperforms various existing algorithms. In fact, many empirical networks have community structure, compromising the assumption of local tree-like structure on which the original algorithm is based. We develop an immunization algorithm by synergistically combining the Morone-Makse algorithm and coarse graining of the network in which we regard a community as a supernode. In this way, we aim to identify nodes that connect different communities at a reasonable computational cost. The proposed algorithm works more efficiently than the Morone-Makse and other algorithms on networks with community structure.
Fragmenting networks by targeting collective influencers at a mesoscopic level

NASA Astrophysics Data System (ADS)

Kobayashi, Teruyoshi; Masuda, Naoki

2016-11-01

A practical approach to protecting networks against epidemic processes such as spreading of infectious diseases, malware, and harmful viral information is to remove some influential nodes beforehand to fragment the network into small components. Because determining the optimal order to remove nodes is a computationally hard problem, various approximate algorithms have been proposed to efficiently fragment networks by sequential node removal. Morone and Makse proposed an algorithm employing the non-backtracking matrix of given networks, which outperforms various existing algorithms. In fact, many empirical networks have community structure, compromising the assumption of local tree-like structure on which the original algorithm is based. We develop an immunization algorithm by synergistically combining the Morone-Makse algorithm and coarse graining of the network in which we regard a community as a supernode. In this way, we aim to identify nodes that connect different communities at a reasonable computational cost. The proposed algorithm works more efficiently than the Morone-Makse and other algorithms on networks with community structure.
Fragmenting networks by targeting collective influencers at a mesoscopic level

PubMed Central

Kobayashi, Teruyoshi; Masuda, Naoki

2016-01-01

A practical approach to protecting networks against epidemic processes such as spreading of infectious diseases, malware, and harmful viral information is to remove some influential nodes beforehand to fragment the network into small components. Because determining the optimal order to remove nodes is a computationally hard problem, various approximate algorithms have been proposed to efficiently fragment networks by sequential node removal. Morone and Makse proposed an algorithm employing the non-backtracking matrix of given networks, which outperforms various existing algorithms. In fact, many empirical networks have community structure, compromising the assumption of local tree-like structure on which the original algorithm is based. We develop an immunization algorithm by synergistically combining the Morone-Makse algorithm and coarse graining of the network in which we regard a community as a supernode. In this way, we aim to identify nodes that connect different communities at a reasonable computational cost. The proposed algorithm works more efficiently than the Morone-Makse and other algorithms on networks with community structure. PMID:27886251
Efficient Craig Interpolation for Linear Diophantine (Dis)Equations and Linear Modular Equations

DTIC Science & Technology

2008-02-01

Craig interpolants has enabled the development of powerful hardware and software model checking techniques. Efficient algorithms are known for computing...interpolants in rational and real linear arithmetic. We focus on subsets of integer linear arithmetic. Our main results are polynomial time algorithms ...congruences), and linear diophantine disequations. We show the utility of the proposed interpolation algorithms for discovering modular/divisibility predicates
A computationally efficient parallel Levenberg-Marquardt algorithm for highly parameterized inverse model analyses

NASA Astrophysics Data System (ADS)

Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.

2016-09-01

Inverse modeling seeks model parameters given a set of observations. However, for practical problems because the number of measurements is often large and the model parameters are also numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace such that the dimensionality of the problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved using these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2-D and a random hydraulic conductivity field in 3-D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ˜101 to ˜102 in a multicore computational environment. Therefore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate to large-scale problems.
Computationally efficient method for Fourier transform of highly chirped pulses for laser and parametric amplifier modeling.

PubMed

Andrianov, Alexey; Szabo, Aron; Sergeev, Alexander; Kim, Arkady; Chvykov, Vladimir; Kalashnikov, Mikhail

2016-11-14

We developed an improved approach to calculate the Fourier transform of signals with arbitrary large quadratic phase which can be efficiently implemented in numerical simulations utilizing Fast Fourier transform. The proposed algorithm significantly reduces the computational cost of Fourier transform of a highly chirped and stretched pulse by splitting it into two separate transforms of almost transform limited pulses, thereby reducing the required grid size roughly by a factor of the pulse stretching. The application of our improved Fourier transform algorithm in the split-step method for numerical modeling of CPA and OPCPA shows excellent agreement with standard algorithms.
Function-Based Algorithms for Biological Sequences

ERIC Educational Resources Information Center

Mohanty, Pragyan Sheela P.

2015-01-01

Two problems at two different abstraction levels of computational biology are studied. At the molecular level, efficient pattern matching algorithms in DNA sequences are presented. For gene order data, an efficient data structure is presented capable of storing all gene re-orderings in a systematic manner. A common characteristic of presented…
Zero-block mode decision algorithm for H.264/AVC.

PubMed

Lee, Yu-Ming; Lin, Yinyi

2009-03-01

In the previous paper , we proposed a zero-block intermode decision algorithm for H.264 video coding based upon the number of zero-blocks of 4 x 4 DCT coefficients between the current macroblock and the co-located macroblock. The proposed algorithm can achieve significant improvement in computation, but the computation performance is limited for high bit-rate coding. To improve computation efficiency, in this paper, we suggest an enhanced zero-block decision algorithm, which uses an early zero-block detection method to compute the number of zero-blocks instead of direct DCT and quantization (DCT/Q) calculation and incorporates two adequate decision methods into semi-stationary and nonstationary regions of a video sequence. In addition, the zero-block decision algorithm is also applied to the intramode prediction in the P frame. The enhanced zero-block decision algorithm brings out a reduction of average 27% of total encoding time compared to the zero-block decision algorithm.
Performance improvement of multi-class detection using greedy algorithm for Viola-Jones cascade selection

NASA Astrophysics Data System (ADS)

Tereshin, Alexander A.; Usilin, Sergey A.; Arlazarov, Vladimir V.

2018-04-01

This paper aims to study the problem of multi-class object detection in video stream with Viola-Jones cascades. An adaptive algorithm for selecting Viola-Jones cascade based on greedy choice strategy in solution of the N-armed bandit problem is proposed. The efficiency of the algorithm on the problem of detection and recognition of the bank card logos in the video stream is shown. The proposed algorithm can be effectively used in documents localization and identification, recognition of road scene elements, localization and tracking of the lengthy objects , and for solving other problems of rigid object detection in a heterogeneous data flows. The computational efficiency of the algorithm makes it possible to use it both on personal computers and on mobile devices based on processors with low power consumption.
Sensitivity analysis of dynamic biological systems with time-delays.

PubMed

Wu, Wu Hsiung; Wang, Feng Sheng; Chang, Maw Shang

2010-10-15

Mathematical modeling has been applied to the study and analysis of complex biological systems for a long time. Some processes in biological systems, such as the gene expression and feedback control in signal transduction networks, involve a time delay. These systems are represented as delay differential equation (DDE) models. Numerical sensitivity analysis of a DDE model by the direct method requires the solutions of model and sensitivity equations with time-delays. The major effort is the computation of Jacobian matrix when computing the solution of sensitivity equations. The computation of partial derivatives of complex equations either by the analytic method or by symbolic manipulation is time consuming, inconvenient, and prone to introduce human errors. To address this problem, an automatic approach to obtain the derivatives of complex functions efficiently and accurately is necessary. We have proposed an efficient algorithm with an adaptive step size control to compute the solution and dynamic sensitivities of biological systems described by ordinal differential equations (ODEs). The adaptive direct-decoupled algorithm is extended to solve the solution and dynamic sensitivities of time-delay systems describing by DDEs. To save the human effort and avoid the human errors in the computation of partial derivatives, an automatic differentiation technique is embedded in the extended algorithm to evaluate the Jacobian matrix. The extended algorithm is implemented and applied to two realistic models with time-delays: the cardiovascular control system and the TNF-α signal transduction network. The results show that the extended algorithm is a good tool for dynamic sensitivity analysis on DDE models with less user intervention. By comparing with direct-coupled methods in theory, the extended algorithm is efficient, accurate, and easy to use for end users without programming background to do dynamic sensitivity analysis on complex biological systems with time-delays.
Binary mesh partitioning for cache-efficient visualization.

PubMed

Tchiboukdjian, Marc; Danjean, Vincent; Raffin, Bruno

2010-01-01

One important bottleneck when visualizing large data sets is the data transfer between processor and memory. Cache-aware (CA) and cache-oblivious (CO) algorithms take into consideration the memory hierarchy to design cache efficient algorithms. CO approaches have the advantage to adapt to unknown and varying memory hierarchies. Recent CA and CO algorithms developed for 3D mesh layouts significantly improve performance of previous approaches, but they lack of theoretical performance guarantees. We present in this paper a {\\schmi O}(N\\log N) algorithm to compute a CO layout for unstructured but well shaped meshes. We prove that a coherent traversal of a N-size mesh in dimension d induces less than N/B+{\\schmi O}(N/M;{1/d}) cache-misses where B and M are the block size and the cache size, respectively. Experiments show that our layout computation is faster and significantly less memory consuming than the best known CO algorithm. Performance is comparable to this algorithm for classical visualization algorithm access patterns, or better when the BSP tree produced while computing the layout is used as an acceleration data structure adjusted to the layout. We also show that cache oblivious approaches lead to significant performance increases on recent GPU architectures.
Tools for Analyzing Computing Resource Management Strategies and Algorithms for SDR Clouds

NASA Astrophysics Data System (ADS)

Marojevic, Vuk; Gomez-Miguelez, Ismael; Gelonch, Antoni

2012-09-01

Software defined radio (SDR) clouds centralize the computing resources of base stations. The computing resource pool is shared between radio operators and dynamically loads and unloads digital signal processing chains for providing wireless communications services on demand. Each new user session request particularly requires the allocation of computing resources for executing the corresponding SDR transceivers. The huge amount of computing resources of SDR cloud data centers and the numerous session requests at certain hours of a day require an efficient computing resource management. We propose a hierarchical approach, where the data center is divided in clusters that are managed in a distributed way. This paper presents a set of computing resource management tools for analyzing computing resource management strategies and algorithms for SDR clouds. We use the tools for evaluating a different strategies and algorithms. The results show that more sophisticated algorithms can achieve higher resource occupations and that a tradeoff exists between cluster size and algorithm complexity.
Massively parallel algorithms for real-time wavefront control of a dense adaptive optics system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fijany, A.; Milman, M.; Redding, D.

1994-12-31

In this paper massively parallel algorithms and architectures for real-time wavefront control of a dense adaptive optic system (SELENE) are presented. The authors have already shown that the computation of a near optimal control algorithm for SELENE can be reduced to the solution of a discrete Poisson equation on a regular domain. Although, this represents an optimal computation, due the large size of the system and the high sampling rate requirement, the implementation of this control algorithm poses a computationally challenging problem since it demands a sustained computational throughput of the order of 10 GFlops. They develop a novel algorithm,more » designated as Fast Invariant Imbedding algorithm, which offers a massive degree of parallelism with simple communication and synchronization requirements. Due to these features, this algorithm is significantly more efficient than other Fast Poisson Solvers for implementation on massively parallel architectures. The authors also discuss two massively parallel, algorithmically specialized, architectures for low-cost and optimal implementation of the Fast Invariant Imbedding algorithm.« less
Quantum Computation: Entangling with the Future

NASA Technical Reports Server (NTRS)

Jiang, Zhang

2017-01-01

Commercial applications of quantum computation have become viable due to the rapid progress of the field in the recent years. Efficient quantum algorithms are discovered to cope with the most challenging real-world problems that are too hard for classical computers. Manufactured quantum hardware has reached unprecedented precision and controllability, enabling fault-tolerant quantum computation. Here, I give a brief introduction on what principles in quantum mechanics promise its unparalleled computational power. I will discuss several important quantum algorithms that achieve exponential or polynomial speedup over any classical algorithm. Building a quantum computer is a daunting task, and I will talk about the criteria and various implementations of quantum computers. I conclude the talk with near-future commercial applications of a quantum computer.
Rapid solution of large-scale systems of equations

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.

1994-01-01

The analysis and design of complex aerospace structures requires the rapid solution of large systems of linear and nonlinear equations, eigenvalue extraction for buckling, vibration and flutter modes, structural optimization and design sensitivity calculation. Computers with multiple processors and vector capabilities can offer substantial computational advantages over traditional scalar computer for these analyses. These computers fall into two categories: shared memory computers and distributed memory computers. This presentation covers general-purpose, highly efficient algorithms for generation/assembly or element matrices, solution of systems of linear and nonlinear equations, eigenvalue and design sensitivity analysis and optimization. All algorithms are coded in FORTRAN for shared memory computers and many are adapted to distributed memory computers. The capability and numerical performance of these algorithms will be addressed.

A Stochastic Total Least Squares Solution of Adaptive Filtering Problem

PubMed Central

Ahmad, Noor Atinah

2014-01-01

An efficient and computationally linear algorithm is derived for total least squares solution of adaptive filtering problem, when both input and output signals are contaminated by noise. The proposed total least mean squares (TLMS) algorithm is designed by recursively computing an optimal solution of adaptive TLS problem by minimizing instantaneous value of weighted cost function. Convergence analysis of the algorithm is given to show the global convergence of the proposed algorithm, provided that the stepsize parameter is appropriately chosen. The TLMS algorithm is computationally simpler than the other TLS algorithms and demonstrates a better performance as compared with the least mean square (LMS) and normalized least mean square (NLMS) algorithms. It provides minimum mean square deviation by exhibiting better convergence in misalignment for unknown system identification under noisy inputs. PMID:24688412
Complexity control algorithm based on adaptive mode selection for interframe coding in high efficiency video coding

NASA Astrophysics Data System (ADS)

Chen, Gang; Yang, Bing; Zhang, Xiaoyun; Gao, Zhiyong

2017-07-01

The latest high efficiency video coding (HEVC) standard significantly increases the encoding complexity for improving its coding efficiency. Due to the limited computational capability of handheld devices, complexity constrained video coding has drawn great attention in recent years. A complexity control algorithm based on adaptive mode selection is proposed for interframe coding in HEVC. Considering the direct proportionality between encoding time and computational complexity, the computational complexity is measured in terms of encoding time. First, complexity is mapped to a target in terms of prediction modes. Then, an adaptive mode selection algorithm is proposed for the mode decision process. Specifically, the optimal mode combination scheme that is chosen through offline statistics is developed at low complexity. If the complexity budget has not been used up, an adaptive mode sorting method is employed to further improve coding efficiency. The experimental results show that the proposed algorithm achieves a very large complexity control range (as low as 10%) for the HEVC encoder while maintaining good rate-distortion performance. For the lowdelayP condition, compared with the direct resource allocation method and the state-of-the-art method, an average gain of 0.63 and 0.17 dB in BDPSNR is observed for 18 sequences when the target complexity is around 40%.
A Fast Algorithm for the Convolution of Functions with Compact Support Using Fourier Extensions

DOE PAGES

Xu, Kuan; Austin, Anthony P.; Wei, Ke

2017-12-21

In this paper, we present a new algorithm for computing the convolution of two compactly supported functions. The algorithm approximates the functions to be convolved using Fourier extensions and then uses the fast Fourier transform to efficiently compute Fourier extension approximations to the pieces of the result. Finally, the complexity of the algorithm is O(N(log N) 2), where N is the number of degrees of freedom used in each of the Fourier extensions.
A Fast Algorithm for the Convolution of Functions with Compact Support Using Fourier Extensions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xu, Kuan; Austin, Anthony P.; Wei, Ke

In this paper, we present a new algorithm for computing the convolution of two compactly supported functions. The algorithm approximates the functions to be convolved using Fourier extensions and then uses the fast Fourier transform to efficiently compute Fourier extension approximations to the pieces of the result. Finally, the complexity of the algorithm is O(N(log N) 2), where N is the number of degrees of freedom used in each of the Fourier extensions.
A Hierarchical Algorithm for Fast Debye Summation with Applications to Small Angle Scattering

PubMed Central

Gumerov, Nail A.; Berlin, Konstantin; Fushman, David; Duraiswami, Ramani

2012-01-01

Debye summation, which involves the summation of sinc functions of distances between all pair of atoms in three dimensional space, arises in computations performed in crystallography, small/wide angle X-ray scattering (SAXS/WAXS) and small angle neutron scattering (SANS). Direct evaluation of Debye summation has quadratic complexity, which results in computational bottleneck when determining crystal properties, or running structure refinement protocols that involve SAXS or SANS, even for moderately sized molecules. We present a fast approximation algorithm that efficiently computes the summation to any prescribed accuracy ε in linear time. The algorithm is similar to the fast multipole method (FMM), and is based on a hierarchical spatial decomposition of the molecule coupled with local harmonic expansions and translation of these expansions. An even more efficient implementation is possible when the scattering profile is all that is required, as in small angle scattering reconstruction (SAS) of macromolecules. We examine the relationship of the proposed algorithm to existing approximate methods for profile computations, and show that these methods may result in inaccurate profile computations, unless an error bound derived in this paper is used. Our theoretical and computational results show orders of magnitude improvement in computation complexity over existing methods, while maintaining prescribed accuracy. PMID:22707386
A lightweight QRS detector for single lead ECG signals using a max-min difference algorithm.

PubMed

Pandit, Diptangshu; Zhang, Li; Liu, Chengyu; Chattopadhyay, Samiran; Aslam, Nauman; Lim, Chee Peng

2017-06-01

Detection of the R-peak pertaining to the QRS complex of an ECG signal plays an important role for the diagnosis of a patient's heart condition. To accurately identify the QRS locations from the acquired raw ECG signals, we need to handle a number of challenges, which include noise, baseline wander, varying peak amplitudes, and signal abnormality. This research aims to address these challenges by developing an efficient lightweight algorithm for QRS (i.e., R-peak) detection from raw ECG signals. A lightweight real-time sliding window-based Max-Min Difference (MMD) algorithm for QRS detection from Lead II ECG signals is proposed. Targeting to achieve the best trade-off between computational efficiency and detection accuracy, the proposed algorithm consists of five key steps for QRS detection, namely, baseline correction, MMD curve generation, dynamic threshold computation, R-peak detection, and error correction. Five annotated databases from Physionet are used for evaluating the proposed algorithm in R-peak detection. Integrated with a feature extraction technique and a neural network classifier, the proposed ORS detection algorithm has also been extended to undertake normal and abnormal heartbeat detection from ECG signals. The proposed algorithm exhibits a high degree of robustness in QRS detection and achieves an average sensitivity of 99.62% and an average positive predictivity of 99.67%. Its performance compares favorably with those from the existing state-of-the-art models reported in the literature. In regards to normal and abnormal heartbeat detection, the proposed QRS detection algorithm in combination with the feature extraction technique and neural network classifier achieves an overall accuracy rate of 93.44% based on an empirical evaluation using the MIT-BIH Arrhythmia data set with 10-fold cross validation. In comparison with other related studies, the proposed algorithm offers a lightweight adaptive alternative for R-peak detection with good computational efficiency. The empirical results indicate that it not only yields a high accuracy rate in QRS detection, but also exhibits efficient computational complexity at the order of O(n), where n is the length of an ECG signal. Copyright © 2017 Elsevier B.V. All rights reserved.
Error and Symmetry Analysis of Misner's Algorithm for Spherical Harmonic Decomposition on a Cubic Grid

NASA Technical Reports Server (NTRS)

Fiske, David R.

2004-01-01

In an earlier paper, Misner (2004, Class. Quant. Grav., 21, S243) presented a novel algorithm for computing the spherical harmonic components of data represented on a cubic grid. I extend Misner s original analysis by making detailed error estimates of the numerical errors accrued by the algorithm, by using symmetry arguments to suggest a more efficient implementation scheme, and by explaining how the algorithm can be applied efficiently on data with explicit reflection symmetries.
Trellises and Trellis-Based Decoding Algorithms for Linear Block Codes. Part 3

NASA Technical Reports Server (NTRS)

Lin, Shu

1998-01-01

Decoding algorithms based on the trellis representation of a code (block or convolutional) drastically reduce decoding complexity. The best known and most commonly used trellis-based decoding algorithm is the Viterbi algorithm. It is a maximum likelihood decoding algorithm. Convolutional codes with the Viterbi decoding have been widely used for error control in digital communications over the last two decades. This chapter is concerned with the application of the Viterbi decoding algorithm to linear block codes. First, the Viterbi algorithm is presented. Then, optimum sectionalization of a trellis to minimize the computational complexity of a Viterbi decoder is discussed and an algorithm is presented. Some design issues for IC (integrated circuit) implementation of a Viterbi decoder are considered and discussed. Finally, a new decoding algorithm based on the principle of compare-select-add is presented. This new algorithm can be applied to both block and convolutional codes and is more efficient than the conventional Viterbi algorithm based on the add-compare-select principle. This algorithm is particularly efficient for rate 1/n antipodal convolutional codes and their high-rate punctured codes. It reduces computational complexity by one-third compared with the Viterbi algorithm.
A Novel Latin Hypercube Algorithm via Translational Propagation

PubMed Central

Pan, Guang; Ye, Pengcheng

2014-01-01

Metamodels have been widely used in engineering design to facilitate analysis and optimization of complex systems that involve computationally expensive simulation programs. The accuracy of metamodels is directly related to the experimental designs used. Optimal Latin hypercube designs are frequently used and have been shown to have good space-filling and projective properties. However, the high cost in constructing them limits their use. In this paper, a methodology for creating novel Latin hypercube designs via translational propagation and successive local enumeration algorithm (TPSLE) is developed without using formal optimization. TPSLE algorithm is based on the inspiration that a near optimal Latin Hypercube design can be constructed by a simple initial block with a few points generated by algorithm SLE as a building block. In fact, TPSLE algorithm offers a balanced trade-off between the efficiency and sampling performance. The proposed algorithm is compared to two existing algorithms and is found to be much more efficient in terms of the computation time and has acceptable space-filling and projective properties. PMID:25276844
zipHMMlib: a highly optimised HMM library exploiting repetitions in the input to speed up the forward algorithm.

PubMed

Sand, Andreas; Kristiansen, Martin; Pedersen, Christian N S; Mailund, Thomas

2013-11-22

Hidden Markov models are widely used for genome analysis as they combine ease of modelling with efficient analysis algorithms. Calculating the likelihood of a model using the forward algorithm has worst case time complexity linear in the length of the sequence and quadratic in the number of states in the model. For genome analysis, however, the length runs to millions or billions of observations, and when maximising the likelihood hundreds of evaluations are often needed. A time efficient forward algorithm is therefore a key ingredient in an efficient hidden Markov model library. We have built a software library for efficiently computing the likelihood of a hidden Markov model. The library exploits commonly occurring substrings in the input to reuse computations in the forward algorithm. In a pre-processing step our library identifies common substrings and builds a structure over the computations in the forward algorithm which can be reused. This analysis can be saved between uses of the library and is independent of concrete hidden Markov models so one preprocessing can be used to run a number of different models.Using this library, we achieve up to 78 times shorter wall-clock time for realistic whole-genome analyses with a real and reasonably complex hidden Markov model. In one particular case the analysis was performed in less than 8 minutes compared to 9.6 hours for the previously fastest library. We have implemented the preprocessing procedure and forward algorithm as a C++ library, zipHMM, with Python bindings for use in scripts. The library is available at http://birc.au.dk/software/ziphmm/.
Molecular simulation workflows as parallel algorithms: the execution engine of Copernicus, a distributed high-performance computing platform.

PubMed

Pronk, Sander; Pouya, Iman; Lundborg, Magnus; Rotskoff, Grant; Wesén, Björn; Kasson, Peter M; Lindahl, Erik

2015-06-09

Computational chemistry and other simulation fields are critically dependent on computing resources, but few problems scale efficiently to the hundreds of thousands of processors available in current supercomputers-particularly for molecular dynamics. This has turned into a bottleneck as new hardware generations primarily provide more processing units rather than making individual units much faster, which simulation applications are addressing by increasingly focusing on sampling with algorithms such as free-energy perturbation, Markov state modeling, metadynamics, or milestoning. All these rely on combining results from multiple simulations into a single observation. They are potentially powerful approaches that aim to predict experimental observables directly, but this comes at the expense of added complexity in selecting sampling strategies and keeping track of dozens to thousands of simulations and their dependencies. Here, we describe how the distributed execution framework Copernicus allows the expression of such algorithms in generic workflows: dataflow programs. Because dataflow algorithms explicitly state dependencies of each constituent part, algorithms only need to be described on conceptual level, after which the execution is maximally parallel. The fully automated execution facilitates the optimization of these algorithms with adaptive sampling, where undersampled regions are automatically detected and targeted without user intervention. We show how several such algorithms can be formulated for computational chemistry problems, and how they are executed efficiently with many loosely coupled simulations using either distributed or parallel resources with Copernicus.
SA-SOM algorithm for detecting communities in complex networks

NASA Astrophysics Data System (ADS)

Chen, Luogeng; Wang, Yanran; Huang, Xiaoming; Hu, Mengyu; Hu, Fang

2017-10-01

Currently, community detection is a hot topic. This paper, based on the self-organizing map (SOM) algorithm, introduced the idea of self-adaptation (SA) that the number of communities can be identified automatically, a novel algorithm SA-SOM of detecting communities in complex networks is proposed. Several representative real-world networks and a set of computer-generated networks by LFR-benchmark are utilized to verify the accuracy and the efficiency of this algorithm. The experimental findings demonstrate that this algorithm can identify the communities automatically, accurately and efficiently. Furthermore, this algorithm can also acquire higher values of modularity, NMI and density than the SOM algorithm does.
Efficient frequent pattern mining algorithm based on node sets in cloud computing environment

NASA Astrophysics Data System (ADS)

Billa, V. N. Vinay Kumar; Lakshmanna, K.; Rajesh, K.; Reddy, M. Praveen Kumar; Nagaraja, G.; Sudheer, K.

2017-11-01

The ultimate goal of Data Mining is to determine the hidden information which is useful in making decisions using the large databases collected by an organization. This Data Mining involves many tasks that are to be performed during the process. Mining frequent itemsets is the one of the most important tasks in case of transactional databases. These transactional databases contain the data in very large scale where the mining of these databases involves the consumption of physical memory and time in proportion to the size of the database. A frequent pattern mining algorithm is said to be efficient only if it consumes less memory and time to mine the frequent itemsets from the given large database. Having these points in mind in this thesis we proposed a system which mines frequent itemsets in an optimized way in terms of memory and time by using cloud computing as an important factor to make the process parallel and the application is provided as a service. A complete framework which uses a proven efficient algorithm called FIN algorithm. FIN algorithm works on Nodesets and POC (pre-order coding) tree. In order to evaluate the performance of the system we conduct the experiments to compare the efficiency of the same algorithm applied in a standalone manner and in cloud computing environment on a real time data set which is traffic accidents data set. The results show that the memory consumption and execution time taken for the process in the proposed system is much lesser than those of standalone system.
A hardware-oriented concurrent TZ search algorithm for High-Efficiency Video Coding

NASA Astrophysics Data System (ADS)

Doan, Nghia; Kim, Tae Sung; Rhee, Chae Eun; Lee, Hyuk-Jae

2017-12-01

High-Efficiency Video Coding (HEVC) is the latest video coding standard, in which the compression performance is double that of its predecessor, the H.264/AVC standard, while the video quality remains unchanged. In HEVC, the test zone (TZ) search algorithm is widely used for integer motion estimation because it effectively searches the good-quality motion vector with a relatively small amount of computation. However, the complex computation structure of the TZ search algorithm makes it difficult to implement it in the hardware. This paper proposes a new integer motion estimation algorithm which is designed for hardware execution by modifying the conventional TZ search to allow parallel motion estimations of all prediction unit (PU) partitions. The algorithm consists of the three phases of zonal, raster, and refinement searches. At the beginning of each phase, the algorithm obtains the search points required by the original TZ search for all PU partitions in a coding unit (CU). Then, all redundant search points are removed prior to the estimation of the motion costs, and the best search points are then selected for all PUs. Compared to the conventional TZ search algorithm, experimental results show that the proposed algorithm significantly decreases the Bjøntegaard Delta bitrate (BD-BR) by 0.84%, and it also reduces the computational complexity by 54.54%.
Quantum rendering

NASA Astrophysics Data System (ADS)

Lanzagorta, Marco O.; Gomez, Richard B.; Uhlmann, Jeffrey K.

2003-08-01

In recent years, computer graphics has emerged as a critical component of the scientific and engineering process, and it is recognized as an important computer science research area. Computer graphics are extensively used for a variety of aerospace and defense training systems and by Hollywood's special effects companies. All these applications require the computer graphics systems to produce high quality renderings of extremely large data sets in short periods of time. Much research has been done in "classical computing" toward the development of efficient methods and techniques to reduce the rendering time required for large datasets. Quantum Computing's unique algorithmic features offer the possibility of speeding up some of the known rendering algorithms currently used in computer graphics. In this paper we discuss possible implementations of quantum rendering algorithms. In particular, we concentrate on the implementation of Grover's quantum search algorithm for Z-buffering, ray-tracing, radiosity, and scene management techniques. We also compare the theoretical performance between the classical and quantum versions of the algorithms.
Node fingerprinting: an efficient heuristic for aligning biological networks.

PubMed

Radu, Alex; Charleston, Michael

2014-10-01

With the continuing increase in availability of biological data and improvements to biological models, biological network analysis has become a promising area of research. An emerging technique for the analysis of biological networks is through network alignment. Network alignment has been used to calculate genetic distance, similarities between regulatory structures, and the effect of external forces on gene expression, and to depict conditional activity of expression modules in cancer. Network alignment is algorithmically complex, and therefore we must rely on heuristics, ideally as efficient and accurate as possible. The majority of current techniques for network alignment rely on precomputed information, such as with protein sequence alignment, or on tunable network alignment parameters, which may introduce an increased computational overhead. Our presented algorithm, which we call Node Fingerprinting (NF), is appropriate for performing global pairwise network alignment without precomputation or tuning, can be fully parallelized, and is able to quickly compute an accurate alignment between two biological networks. It has performed as well as or better than existing algorithms on biological and simulated data, and with fewer computational resources. The algorithmic validation performed demonstrates the low computational resource requirements of NF.
Efficient Grammar Induction Algorithm with Parse Forests from Real Corpora

NASA Astrophysics Data System (ADS)

Kurihara, Kenichi; Kameya, Yoshitaka; Sato, Taisuke

The task of inducing grammar structures has received a great deal of attention. The reasons why researchers have studied are different; to use grammar induction as the first stage in building large treebanks or to make up better language models. However, grammar induction has inherent computational complexity. To overcome it, some grammar induction algorithms add new production rules incrementally. They refine the grammar while keeping their computational complexity low. In this paper, we propose a new efficient grammar induction algorithm. Although our algorithm is similar to algorithms which learn a grammar incrementally, our algorithm uses the graphical EM algorithm instead of the Inside-Outside algorithm. We report results of learning experiments in terms of learning speeds. The results show that our algorithm learns a grammar in constant time regardless of the size of the grammar. Since our algorithm decreases syntactic ambiguities in each step, our algorithm reduces required time for learning. This constant-time learning considerably affects learning time for larger grammars. We also reports results of evaluation of criteria to choose nonterminals. Our algorithm refines a grammar based on a nonterminal in each step. Since there can be several criteria to decide which nonterminal is the best, we evaluate them by learning experiments.
Low-cost autonomous perceptron neural network inspired by quantum computation

NASA Astrophysics Data System (ADS)

Zidan, Mohammed; Abdel-Aty, Abdel-Haleem; El-Sadek, Alaa; Zanaty, E. A.; Abdel-Aty, Mahmoud

2017-11-01

Achieving low cost learning with reliable accuracy is one of the important goals to achieve intelligent machines to save time, energy and perform learning process over limited computational resources machines. In this paper, we propose an efficient algorithm for a perceptron neural network inspired by quantum computing composite from a single neuron to classify inspirable linear applications after a single training iteration O(1). The algorithm is applied over a real world data set and the results are outer performs the other state-of-the art algorithms.
Efficiency of parallel direct optimization

NASA Technical Reports Server (NTRS)

Janies, D. A.; Wheeler, W. C.

2001-01-01

Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. c2001 The Willi Hennig Society.
A Parallel Compact Multi-Dimensional Numerical Algorithm with Aeroacoustics Applications

NASA Technical Reports Server (NTRS)

Povitsky, Alex; Morris, Philip J.

1999-01-01

In this study we propose a novel method to parallelize high-order compact numerical algorithms for the solution of three-dimensional PDEs (Partial Differential Equations) in a space-time domain. For this numerical integration most of the computer time is spent in computation of spatial derivatives at each stage of the Runge-Kutta temporal update. The most efficient direct method to compute spatial derivatives on a serial computer is a version of Gaussian elimination for narrow linear banded systems known as the Thomas algorithm. In a straightforward pipelined implementation of the Thomas algorithm processors are idle due to the forward and backward recurrences of the Thomas algorithm. To utilize processors during this time, we propose to use them for either non-local data independent computations, solving lines in the next spatial direction, or local data-dependent computations by the Runge-Kutta method. To achieve this goal, control of processor communication and computations by a static schedule is adopted. Thus, our parallel code is driven by a communication and computation schedule instead of the usual "creative, programming" approach. The obtained parallelization speed-up of the novel algorithm is about twice as much as that for the standard pipelined algorithm and close to that for the explicit DRP algorithm.

A computationally efficient parallel Levenberg-Marquardt algorithm for highly parameterized inverse model analyses

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.

Inverse modeling seeks model parameters given a set of observations. However, for practical problems because the number of measurements is often large and the model parameters are also numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally-efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace, such that the dimensionality of themore » problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved using these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2D and a random hydraulic conductivity field in 3D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ~10 1 to ~10 2 in a multi-core computational environment. Furthermore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate- to large-scale problems.« less
A computationally efficient parallel Levenberg-Marquardt algorithm for highly parameterized inverse model analyses

DOE PAGES

Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.

2016-09-01

Inverse modeling seeks model parameters given a set of observations. However, for practical problems because the number of measurements is often large and the model parameters are also numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally-efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace, such that the dimensionality of themore » problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved using these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2D and a random hydraulic conductivity field in 3D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ~10 1 to ~10 2 in a multi-core computational environment. Furthermore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate- to large-scale problems.« less
Aspects of numerical and representational methods related to the finite-difference simulation of advective and dispersive transport of freshwater in a thin brackish aquifer

USGS Publications Warehouse

Merritt, M.L.

1993-01-01

The simulation of the transport of injected freshwater in a thin brackish aquifer, overlain and underlain by confining layers containing more saline water, is shown to be influenced by the choice of the finite-difference approximation method, the algorithm for representing vertical advective and dispersive fluxes, and the values assigned to parametric coefficients that specify the degree of vertical dispersion and molecular diffusion that occurs. Computed potable water recovery efficiencies will differ depending upon the choice of algorithm and approximation method, as will dispersion coefficients estimated based on the calibration of simulations to match measured data. A comparison of centered and backward finite-difference approximation methods shows that substantially different transition zones between injected and native waters are depicted by the different methods, and computed recovery efficiencies vary greatly. Standard and experimental algorithms and a variety of values for molecular diffusivity, transverse dispersivity, and vertical scaling factor were compared in simulations of freshwater storage in a thin brackish aquifer. Computed recovery efficiencies vary considerably, and appreciable differences are observed in the distribution of injected freshwater in the various cases tested. The results demonstrate both a qualitatively different description of transport using the experimental algorithms and the interrelated influences of molecular diffusion and transverse dispersion on simulated recovery efficiency. When simulating natural aquifer flow in cross-section, flushing of the aquifer occurred for all tested coefficient choices using both standard and experimental algorithms. ?? 1993.
A Hybrid Shared-Memory Parallel Max-Tree Algorithm for Extreme Dynamic-Range Images.

PubMed

Moschini, Ugo; Meijster, Arnold; Wilkinson, Michael H F

2018-03-01

Max-trees, or component trees, are graph structures that represent the connected components of an image in a hierarchical way. Nowadays, many application fields rely on images with high-dynamic range or floating point values. Efficient sequential algorithms exist to build trees and compute attributes for images of any bit depth. However, we show that the current parallel algorithms perform poorly already with integers at bit depths higher than 16 bits per pixel. We propose a parallel method combining the two worlds of flooding and merging max-tree algorithms. First, a pilot max-tree of a quantized version of the image is built in parallel using a flooding method. Later, this structure is used in a parallel leaf-to-root approach to compute efficiently the final max-tree and to drive the merging of the sub-trees computed by the threads. We present an analysis of the performance both on simulated and actual 2D images and 3D volumes. Execution times are about better than the fastest sequential algorithm and speed-up goes up to on 64 threads.
Reducing Earth Topography Resolution for SMAP Mission Ground Tracks Using K-Means Clustering

NASA Technical Reports Server (NTRS)

Rizvi, Farheen

2013-01-01

The K-means clustering algorithm is used to reduce Earth topography resolution for the SMAP mission ground tracks. As SMAP propagates in orbit, knowledge of the radar antenna footprints on Earth is required for the antenna misalignment calibration. Each antenna footprint contains a latitude and longitude location pair on the Earth surface. There are 400 pairs in one data set for the calibration model. It is computationally expensive to calculate corresponding Earth elevation for these data pairs. Thus, the antenna footprint resolution is reduced. Similar topographical data pairs are grouped together with the K-means clustering algorithm. The resolution is reduced to the mean of each topographical cluster called the cluster centroid. The corresponding Earth elevation for each cluster centroid is assigned to the entire group. Results show that 400 data points are reduced to 60 while still maintaining algorithm performance and computational efficiency. In this work, sensitivity analysis is also performed to show a trade-off between algorithm performance versus computational efficiency as the number of cluster centroids and algorithm iterations are increased.
Effects of Computer Architecture on FFT (Fast Fourier Transform) Algorithm Performance.

DTIC Science & Technology

1983-12-01

Criteria for Efficient Implementation of FFT Algorithms," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-30, pp. 107-109, Feb...1982. Burrus, C. S. and P. W. Eschenbacher. "An In-Place, In-Order Prime Factor FFT Algorithm," IEEE Transactions on Acoustics, Speech, and Signal... Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-30, pp. 217-226, Apr. 1982. Control Data Corporation. CDC Cyber 170 Computer Systems
Designing Waveform Sets with Good Correlation and Stopband Properties for MIMO Radar via the Gradient-Based Method

PubMed Central

Tang, Liang; Zhu, Yongfeng; Fu, Qiang

2017-01-01

Waveform sets with good correlation and/or stopband properties have received extensive attention and been widely used in multiple-input multiple-output (MIMO) radar. In this paper, we aim at designing unimodular waveform sets with good correlation and stopband properties. To formulate the problem, we construct two criteria to measure the correlation and stopband properties and then establish an unconstrained problem in the frequency domain. After deducing the phase gradient and the step size, an efficient gradient-based algorithm with monotonicity is proposed to minimize the objective function directly. For the design problem without considering the correlation weights, we develop a simplified algorithm, which only requires a few fast Fourier transform (FFT) operations and is more efficient. Because both of the algorithms can be implemented via the FFT operations and the Hadamard product, they are computationally efficient and can be used to design waveform sets with a large waveform number and waveform length. Numerical experiments show that the proposed algorithms can provide better performance than the state-of-the-art algorithms in terms of the computational complexity. PMID:28468308
Designing Waveform Sets with Good Correlation and Stopband Properties for MIMO Radar via the Gradient-Based Method.

PubMed

Tang, Liang; Zhu, Yongfeng; Fu, Qiang

2017-05-01

Waveform sets with good correlation and/or stopband properties have received extensive attention and been widely used in multiple-input multiple-output (MIMO) radar. In this paper, we aim at designing unimodular waveform sets with good correlation and stopband properties. To formulate the problem, we construct two criteria to measure the correlation and stopband properties and then establish an unconstrained problem in the frequency domain. After deducing the phase gradient and the step size, an efficient gradient-based algorithm with monotonicity is proposed to minimize the objective function directly. For the design problem without considering the correlation weights, we develop a simplified algorithm, which only requires a few fast Fourier transform (FFT) operations and is more efficient. Because both of the algorithms can be implemented via the FFT operations and the Hadamard product, they are computationally efficient and can be used to design waveform sets with a large waveform number and waveform length. Numerical experiments show that the proposed algorithms can provide better performance than the state-of-the-art algorithms in terms of the computational complexity.
Use of parallel computing in mass processing of laser data

NASA Astrophysics Data System (ADS)

Będkowski, J.; Bratuś, R.; Prochaska, M.; Rzonca, A.

2015-12-01

The first part of the paper includes a description of the rules used to generate the algorithm needed for the purpose of parallel computing and also discusses the origins of the idea of research on the use of graphics processors in large scale processing of laser scanning data. The next part of the paper includes the results of an efficiency assessment performed for an array of different processing options, all of which were substantially accelerated with parallel computing. The processing options were divided into the generation of orthophotos using point clouds, coloring of point clouds, transformations, and the generation of a regular grid, as well as advanced processes such as the detection of planes and edges, point cloud classification, and the analysis of data for the purpose of quality control. Most algorithms had to be formulated from scratch in the context of the requirements of parallel computing. A few of the algorithms were based on existing technology developed by the Dephos Software Company and then adapted to parallel computing in the course of this research study. Processing time was determined for each process employed for a typical quantity of data processed, which helped confirm the high efficiency of the solutions proposed and the applicability of parallel computing to the processing of laser scanning data. The high efficiency of parallel computing yields new opportunities in the creation and organization of processing methods for laser scanning data.
Computationally Efficient Clustering of Audio-Visual Meeting Data

NASA Astrophysics Data System (ADS)

Hung, Hayley; Friedland, Gerald; Yeo, Chuohao

This chapter presents novel computationally efficient algorithms to extract semantically meaningful acoustic and visual events related to each of the participants in a group discussion using the example of business meeting recordings. The recording setup involves relatively few audio-visual sensors, comprising a limited number of cameras and microphones. We first demonstrate computationally efficient algorithms that can identify who spoke and when, a problem in speech processing known as speaker diarization. We also extract visual activity features efficiently from MPEG4 video by taking advantage of the processing that was already done for video compression. Then, we present a method of associating the audio-visual data together so that the content of each participant can be managed individually. The methods presented in this article can be used as a principal component that enables many higher-level semantic analysis tasks needed in search, retrieval, and navigation.
An efficient parallel algorithm for the calculation of canonical MP2 energies.

PubMed

Baker, Jon; Pulay, Peter

2002-09-01

We present the parallel version of a previous serial algorithm for the efficient calculation of canonical MP2 energies (Pulay, P.; Saebo, S.; Wolinski, K. Chem Phys Lett 2001, 344, 543). It is based on the Saebo-Almlöf direct-integral transformation, coupled with an efficient prescreening of the AO integrals. The parallel algorithm avoids synchronization delays by spawning a second set of slaves during the bin-sort prior to the second half-transformation. Results are presented for systems with up to 2000 basis functions. MP2 energies for molecules with 400-500 basis functions can be routinely calculated to microhartree accuracy on a small number of processors (6-8) in a matter of minutes with modern PC-based parallel computers. Copyright 2002 Wiley Periodicals, Inc. J Comput Chem 23: 1150-1156, 2002
Approximate maximum likelihood decoding of block codes

NASA Technical Reports Server (NTRS)

Greenberger, H. J.

1979-01-01

Approximate maximum likelihood decoding algorithms, based upon selecting a small set of candidate code words with the aid of the estimated probability of error of each received symbol, can give performance close to optimum with a reasonable amount of computation. By combining the best features of various algorithms and taking care to perform each step as efficiently as possible, a decoding scheme was developed which can decode codes which have better performance than those presently in use and yet not require an unreasonable amount of computation. The discussion of the details and tradeoffs of presently known efficient optimum and near optimum decoding algorithms leads, naturally, to the one which embodies the best features of all of them.
Multipole Algorithms for Molecular Dynamics Simulation on High Performance Computers.

NASA Astrophysics Data System (ADS)

Elliott, William Dewey

1995-01-01

A fundamental problem in modeling large molecular systems with molecular dynamics (MD) simulations is the underlying N-body problem of computing the interactions between all pairs of N atoms. The simplest algorithm to compute pair-wise atomic interactions scales in runtime {cal O}(N^2), making it impractical for interesting biomolecular systems, which can contain millions of atoms. Recently, several algorithms have become available that solve the N-body problem by computing the effects of all pair-wise interactions while scaling in runtime less than {cal O}(N^2). One algorithm, which scales {cal O}(N) for a uniform distribution of particles, is called the Greengard-Rokhlin Fast Multipole Algorithm (FMA). This work describes an FMA-like algorithm called the Molecular Dynamics Multipole Algorithm (MDMA). The algorithm contains several features that are new to N-body algorithms. MDMA uses new, efficient series expansion equations to compute general 1/r^{n } potentials to arbitrary accuracy. In particular, the 1/r Coulomb potential and the 1/r^6 portion of the Lennard-Jones potential are implemented. The new equations are based on multivariate Taylor series expansions. In addition, MDMA uses a cell-to-cell interaction region of cells that is closely tied to worst case error bounds. The worst case error bounds for MDMA are derived in this work also. These bounds apply to other multipole algorithms as well. Several implementation enhancements are described which apply to MDMA as well as other N-body algorithms such as FMA and tree codes. The mathematics of the cell -to-cell interactions are converted to the Fourier domain for reduced operation count and faster computation. A relative indexing scheme was devised to locate cells in the interaction region which allows efficient pre-computation of redundant information and prestorage of much of the cell-to-cell interaction. Also, MDMA was integrated into the MD program SIgMA to demonstrate the performance of the program over several simulation timesteps. One MD application described here highlights the utility of including long range contributions to Lennard-Jones potential in constant pressure simulations. Another application shows the time dependence of long range forces in a multiple time step MD simulation.
Quantum speedup of Monte Carlo methods.

PubMed

Montanaro, Ashley

2015-09-08

Monte Carlo methods use random sampling to estimate numerical quantities which are hard to compute deterministically. One important example is the use in statistical physics of rapidly mixing Markov chains to approximately compute partition functions. In this work, we describe a quantum algorithm which can accelerate Monte Carlo methods in a very general setting. The algorithm estimates the expected output value of an arbitrary randomized or quantum subroutine with bounded variance, achieving a near-quadratic speedup over the best possible classical algorithm. Combining the algorithm with the use of quantum walks gives a quantum speedup of the fastest known classical algorithms with rigorous performance bounds for computing partition functions, which use multiple-stage Markov chain Monte Carlo techniques. The quantum algorithm can also be used to estimate the total variation distance between probability distributions efficiently.
A Benders based rolling horizon algorithm for a dynamic facility location problem

DOE PAGES

Marufuzzaman,, Mohammad; Gedik, Ridvan; Roni, Mohammad S.

2016-06-28

This study presents a well-known capacitated dynamic facility location problem (DFLP) that satisfies the customer demand at a minimum cost by determining the time period for opening, closing, or retaining an existing facility in a given location. To solve this challenging NP-hard problem, this paper develops a unique hybrid solution algorithm that combines a rolling horizon algorithm with an accelerated Benders decomposition algorithm. Extensive computational experiments are performed on benchmark test instances to evaluate the hybrid algorithm’s efficiency and robustness in solving the DFLP problem. Computational results indicate that the hybrid Benders based rolling horizon algorithm consistently offers high qualitymore » feasible solutions in a much shorter computational time period than the standalone rolling horizon and accelerated Benders decomposition algorithms in the experimental range.« less
Quantum speedup of Monte Carlo methods

PubMed Central

Montanaro, Ashley

2015-01-01

Monte Carlo methods use random sampling to estimate numerical quantities which are hard to compute deterministically. One important example is the use in statistical physics of rapidly mixing Markov chains to approximately compute partition functions. In this work, we describe a quantum algorithm which can accelerate Monte Carlo methods in a very general setting. The algorithm estimates the expected output value of an arbitrary randomized or quantum subroutine with bounded variance, achieving a near-quadratic speedup over the best possible classical algorithm. Combining the algorithm with the use of quantum walks gives a quantum speedup of the fastest known classical algorithms with rigorous performance bounds for computing partition functions, which use multiple-stage Markov chain Monte Carlo techniques. The quantum algorithm can also be used to estimate the total variation distance between probability distributions efficiently. PMID:26528079
Evaluation of stochastic algorithms for financial mathematics problems from point of view of energy-efficiency

DOE Office of Scientific and Technical Information (OSTI.GOV)

Atanassov, E.; Dimitrov, D., E-mail: d.slavov@bas.bg, E-mail: emanouil@parallel.bas.bg, E-mail: gurov@bas.bg; Gurov, T.

2015-10-28

The recent developments in the area of high-performance computing are driven not only by the desire for ever higher performance but also by the rising costs of electricity. The use of various types of accelerators like GPUs, Intel Xeon Phi has become mainstream and many algorithms and applications have been ported to make use of them where available. In Financial Mathematics the question of optimal use of computational resources should also take into account the limitations on space, because in many use cases the servers are deployed close to the exchanges. In this work we evaluate various algorithms for optionmore » pricing that we have implemented for different target architectures in terms of their energy and space efficiency. Since it has been established that low-discrepancy sequences may be better than pseudorandom numbers for these types of algorithms, we also test the Sobol and Halton sequences. We present the raw results, the computed metrics and conclusions from our tests.« less
A diagonal algorithm for the method of pseudocompressibility. [for steady-state solution to incompressible Navier-Stokes equation

NASA Technical Reports Server (NTRS)

Rogers, S. E.; Kwak, D.; Chang, J. L. C.

1986-01-01

The method of pseudocompressibility has been shown to be an efficient method for obtaining a steady-state solution to the incompressible Navier-Stokes equations. Recent improvements to this method include the use of a diagonal scheme for the inversion of the equations at each iteration. The necessary transformations have been derived for the pseudocompressibility equations in generalized coordinates. The diagonal algorithm reduces the computing time necessary to obtain a steady-state solution by a factor of nearly three. Implicit viscous terms are maintained in the equations, and it has become possible to use fourth-order implicit dissipation. The steady-state solution is unchanged by the approximations resulting from the diagonalization of the equations. Computed results for flow over a two-dimensional backward-facing step and a three-dimensional cylinder mounted normal to a flat plate are presented for both the old and new algorithms. The accuracy and computing efficiency of these algorithms are compared.
Evaluation of stochastic algorithms for financial mathematics problems from point of view of energy-efficiency

NASA Astrophysics Data System (ADS)

Atanassov, E.; Dimitrov, D.; Gurov, T.

2015-10-01

The recent developments in the area of high-performance computing are driven not only by the desire for ever higher performance but also by the rising costs of electricity. The use of various types of accelerators like GPUs, Intel Xeon Phi has become mainstream and many algorithms and applications have been ported to make use of them where available. In Financial Mathematics the question of optimal use of computational resources should also take into account the limitations on space, because in many use cases the servers are deployed close to the exchanges. In this work we evaluate various algorithms for option pricing that we have implemented for different target architectures in terms of their energy and space efficiency. Since it has been established that low-discrepancy sequences may be better than pseudorandom numbers for these types of algorithms, we also test the Sobol and Halton sequences. We present the raw results, the computed metrics and conclusions from our tests.
Efficient parallel implementation of active appearance model fitting algorithm on GPU.

PubMed

Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou

2014-01-01

The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.

Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU

PubMed Central

Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou

2014-01-01

The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures. PMID:24723812
Nonlinear Dynamic Model-Based Multiobjective Sensor Network Design Algorithm for a Plant with an Estimator-Based Control System

DOE Office of Scientific and Technical Information (OSTI.GOV)

Paul, Prokash; Bhattacharyya, Debangsu; Turton, Richard

Here, a novel sensor network design (SND) algorithm is developed for maximizing process efficiency while minimizing sensor network cost for a nonlinear dynamic process with an estimator-based control system. The multiobjective optimization problem is solved following a lexicographic approach where the process efficiency is maximized first followed by minimization of the sensor network cost. The partial net present value, which combines the capital cost due to the sensor network and the operating cost due to deviation from the optimal efficiency, is proposed as an alternative objective. The unscented Kalman filter is considered as the nonlinear estimator. The large-scale combinatorial optimizationmore » problem is solved using a genetic algorithm. The developed SND algorithm is applied to an acid gas removal (AGR) unit as part of an integrated gasification combined cycle (IGCC) power plant with CO 2 capture. Due to the computational expense, a reduced order nonlinear model of the AGR process is identified and parallel computation is performed during implementation.« less
Nonlinear Dynamic Model-Based Multiobjective Sensor Network Design Algorithm for a Plant with an Estimator-Based Control System

DOE PAGES

Paul, Prokash; Bhattacharyya, Debangsu; Turton, Richard; ...

2017-06-06

Here, a novel sensor network design (SND) algorithm is developed for maximizing process efficiency while minimizing sensor network cost for a nonlinear dynamic process with an estimator-based control system. The multiobjective optimization problem is solved following a lexicographic approach where the process efficiency is maximized first followed by minimization of the sensor network cost. The partial net present value, which combines the capital cost due to the sensor network and the operating cost due to deviation from the optimal efficiency, is proposed as an alternative objective. The unscented Kalman filter is considered as the nonlinear estimator. The large-scale combinatorial optimizationmore » problem is solved using a genetic algorithm. The developed SND algorithm is applied to an acid gas removal (AGR) unit as part of an integrated gasification combined cycle (IGCC) power plant with CO 2 capture. Due to the computational expense, a reduced order nonlinear model of the AGR process is identified and parallel computation is performed during implementation.« less
Framework for computationally efficient optimal irrigation scheduling using ant colony optimization

USDA-ARS?s Scientific Manuscript database

A general optimization framework is introduced with the overall goal of reducing search space size and increasing the computational efficiency of evolutionary algorithm application for optimal irrigation scheduling. The framework achieves this goal by representing the problem in the form of a decisi...
An Accurate and Efficient Algorithm for Detection of Radio Bursts with an Unknown Dispersion Measure, for Single-dish Telescopes and Interferometers

NASA Astrophysics Data System (ADS)

Zackay, Barak; Ofek, Eran O.

2017-01-01

Astronomical radio signals are subjected to phase dispersion while traveling through the interstellar medium. To optimally detect a short-duration signal within a frequency band, we have to precisely compensate for the unknown pulse dispersion, which is a computationally demanding task. We present the “fast dispersion measure transform” algorithm for optimal detection of such signals. Our algorithm has a low theoretical complexity of 2{N}f{N}t+{N}t{N}{{Δ }}{{log}}2({N}f), where Nf, Nt, and NΔ are the numbers of frequency bins, time bins, and dispersion measure bins, respectively. Unlike previously suggested fast algorithms, our algorithm conserves the sensitivity of brute-force dedispersion. Our tests indicate that this algorithm, running on a standard desktop computer and implemented in a high-level programming language, is already faster than the state-of-the-art dedispersion codes running on graphical processing units (GPUs). We also present a variant of the algorithm that can be efficiently implemented on GPUs. The latter algorithm’s computation and data-transport requirements are similar to those of a two-dimensional fast Fourier transform, indicating that incoherent dedispersion can now be considered a nonissue while planning future surveys. We further present a fast algorithm for sensitive detection of pulses shorter than the dispersive smearing limits of incoherent dedispersion. In typical cases, this algorithm is orders of magnitude faster than enumerating dispersion measures and coherently dedispersing by convolution. We analyze the computational complexity of pulsed signal searches by radio interferometers. We conclude that, using our suggested algorithms, maximally sensitive blind searches for dispersed pulses are feasible using existing facilities. We provide an implementation of these algorithms in Python and MATLAB.
Three-dimensional photoacoustic tomography based on graphics-processing-unit-accelerated finite element method.

PubMed

Peng, Kuan; He, Ling; Zhu, Ziqiang; Tang, Jingtian; Xiao, Jiaying

2013-12-01

Compared with commonly used analytical reconstruction methods, the frequency-domain finite element method (FEM) based approach has proven to be an accurate and flexible algorithm for photoacoustic tomography. However, the FEM-based algorithm is computationally demanding, especially for three-dimensional cases. To enhance the algorithm's efficiency, in this work a parallel computational strategy is implemented in the framework of the FEM-based reconstruction algorithm using a graphic-processing-unit parallel frame named the "compute unified device architecture." A series of simulation experiments is carried out to test the accuracy and accelerating effect of the improved method. The results obtained indicate that the parallel calculation does not change the accuracy of the reconstruction algorithm, while its computational cost is significantly reduced by a factor of 38.9 with a GTX 580 graphics card using the improved method.
Fast algorithm for bilinear transforms in optics

NASA Astrophysics Data System (ADS)

Ostrovsky, Andrey S.; Martinez-Niconoff, Gabriel C.; Ramos Romero, Obdulio; Cortes, Liliana

2000-10-01

The fast algorithm for calculating the bilinear transform in the optical system is proposed. This algorithm is based on the coherent-mode representation of the cross-spectral density function of the illumination. The algorithm is computationally efficient when the illumination is partially coherent. Numerical examples are studied and compared with the theoretical results.
Development of generalized pressure velocity coupling scheme for the analysis of compressible and incompressible combusting flows

NASA Technical Reports Server (NTRS)

Chen, C. P.; Wu, S. T.

1992-01-01

The objective of this investigation has been to develop an algorithm (or algorithms) for the improvement of the accuracy and efficiency of the computer fluid dynamics (CFD) models to study the fundamental physics of combustion chamber flows, which are necessary ultimately for the design of propulsion systems such as SSME and STME. During this three year study (May 19, 1978 - May 18, 1992), a unique algorithm was developed for all speed flows. This newly developed algorithm basically consists of two pressure-based algorithms (i.e. PISOC and MFICE). This PISOC is a non-iterative scheme and the FICE is an iterative scheme where PISOC has the characteristic advantages on low and high speed flows and the modified FICE has shown its efficiency and accuracy to compute the flows in the transonic region. A new algorithm is born from a combination of these two algorithms. This newly developed algorithm has general application in both time-accurate and steady state flows, and also was tested extensively for various flow conditions, such as turbulent flows, chemically reacting flows, and multiphase flows.
Computationally efficient algorithm for high sampling-frequency operation of active noise control

NASA Astrophysics Data System (ADS)

Rout, Nirmal Kumar; Das, Debi Prasad; Panda, Ganapati

2015-05-01

In high sampling-frequency operation of active noise control (ANC) system the length of the secondary path estimate and the ANC filter are very long. This increases the computational complexity of the conventional filtered-x least mean square (FXLMS) algorithm. To reduce the computational complexity of long order ANC system using FXLMS algorithm, frequency domain block ANC algorithms have been proposed in past. These full block frequency domain ANC algorithms are associated with some disadvantages such as large block delay, quantization error due to computation of large size transforms and implementation difficulties in existing low-end DSP hardware. To overcome these shortcomings, the partitioned block ANC algorithm is newly proposed where the long length filters in ANC are divided into a number of equal partitions and suitably assembled to perform the FXLMS algorithm in the frequency domain. The complexity of this proposed frequency domain partitioned block FXLMS (FPBFXLMS) algorithm is quite reduced compared to the conventional FXLMS algorithm. It is further reduced by merging one fast Fourier transform (FFT)-inverse fast Fourier transform (IFFT) combination to derive the reduced structure FPBFXLMS (RFPBFXLMS) algorithm. Computational complexity analysis for different orders of filter and partition size are presented. Systematic computer simulations are carried out for both the proposed partitioned block ANC algorithms to show its accuracy compared to the time domain FXLMS algorithm.
Bandwidth reduction for video-on-demand broadcasting using secondary content insertion

NASA Astrophysics Data System (ADS)

Golynski, Alexander; Lopez-Ortiz, Alejandro; Poirier, Guillaume; Quimper, Claude-Guy

2005-01-01

An optimal broadcasting scheme under the presence of secondary content (i.e. advertisements) is proposed. The proposed scheme works both for movies encoded in a Constant Bit Rate (CBR) or a Variable Bit Rate (VBR) format. It is shown experimentally that secondary content in movies can make Video-on-Demand (VoD) broadcasting systems more efficient. An efficient algorithm is given to compute the optimal broadcasting schedule with secondary content, which in particular significantly improves over the best previously known algorithm for computing the optimal broadcasting schedule without secondary content.
Efficient quantum algorithm for computing n-time correlation functions.

PubMed

Pedernales, J S; Di Candia, R; Egusquiza, I L; Casanova, J; Solano, E

2014-07-11

We propose a method for computing n-time correlation functions of arbitrary spinorial, fermionic, and bosonic operators, consisting of an efficient quantum algorithm that encodes these correlations in an initially added ancillary qubit for probe and control tasks. For spinorial and fermionic systems, the reconstruction of arbitrary n-time correlation functions requires the measurement of two ancilla observables, while for bosonic variables time derivatives of the same observables are needed. Finally, we provide examples applicable to different quantum platforms in the frame of the linear response theory.
Fast template matching with polynomials.

PubMed

Omachi, Shinichiro; Omachi, Masako

2007-08-01

Template matching is widely used for many applications in image and signal processing. This paper proposes a novel template matching algorithm, called algebraic template matching. Given a template and an input image, algebraic template matching efficiently calculates similarities between the template and the partial images of the input image, for various widths and heights. The partial image most similar to the template image is detected from the input image for any location, width, and height. In the proposed algorithm, a polynomial that approximates the template image is used to match the input image instead of the template image. The proposed algorithm is effective especially when the width and height of the template image differ from the partial image to be matched. An algorithm using the Legendre polynomial is proposed for efficient approximation of the template image. This algorithm not only reduces computational costs, but also improves the quality of the approximated image. It is shown theoretically and experimentally that the computational cost of the proposed algorithm is much smaller than the existing methods.
Evaluation of the discrete vortex wake cross flow model using vector computers. Part 1: Theory and application

NASA Technical Reports Server (NTRS)

1979-01-01

The current program had the objective to modify a discrete vortex wake method to efficiently compute the aerodynamic forces and moments on high fineness ratio bodies (f approximately 10.0). The approach is to increase computational efficiency by structuring the program to take advantage of new computer vector software and by developing new algorithms when vector software can not efficiently be used. An efficient program was written and substantial savings achieved. Several test cases were run for fineness ratios up to f = 16.0 and angles of attack up to 50 degrees.
Computational complexities and storage requirements of some Riccati equation solvers

NASA Technical Reports Server (NTRS)

Utku, Senol; Garba, John A.; Ramesh, A. V.

1989-01-01

The linear optimal control problem of an nth-order time-invariant dynamic system with a quadratic performance functional is usually solved by the Hamilton-Jacobi approach. This leads to the solution of the differential matrix Riccati equation with a terminal condition. The bulk of the computation for the optimal control problem is related to the solution of this equation. There are various algorithms in the literature for solving the matrix Riccati equation. However, computational complexities and storage requirements as a function of numbers of state variables, control variables, and sensors are not available for all these algorithms. In this work, the computational complexities and storage requirements for some of these algorithms are given. These expressions show the immensity of the computational requirements of the algorithms in solving the Riccati equation for large-order systems such as the control of highly flexible space structures. The expressions are also needed to compute the speedup and efficiency of any implementation of these algorithms on concurrent machines.
A Branch-and-Bound Algorithm for Fitting Anti-Robinson Structures to Symmetric Dissimilarity Matrices.

ERIC Educational Resources Information Center

Brusco, Michael J.

2002-01-01

Developed a branch-and-bound algorithm that can be used to seriate a symmetric dissimilarity matrix by identifying a reordering of rows and columns of the matrix optimizing an anti-Robinson criterion. Computational results suggest that with respect to computational efficiency, the approach is generally competitive with dynamic programming. (SLD)
Structure-guided Protein Transition Modeling with a Probabilistic Roadmap Algorithm.

PubMed

Maximova, Tatiana; Plaku, Erion; Shehu, Amarda

2016-07-07

Proteins are macromolecules in perpetual motion, switching between structural states to modulate their function. A detailed characterization of the precise yet complex relationship between protein structure, dynamics, and function requires elucidating transitions between functionally-relevant states. Doing so challenges both wet and dry laboratories, as protein dynamics involves disparate temporal scales. In this paper we present a novel, sampling-based algorithm to compute transition paths. The algorithm exploits two main ideas. First, it leverages known structures to initialize its search and define a reduced conformation space for rapid sampling. This is key to address the insufficient sampling issue suffered by sampling-based algorithms. Second, the algorithm embeds samples in a nearest-neighbor graph where transition paths can be efficiently computed via queries. The algorithm adapts the probabilistic roadmap framework that is popular in robot motion planning. In addition to efficiently computing lowest-cost paths between any given structures, the algorithm allows investigating hypotheses regarding the order of experimentally-known structures in a transition event. This novel contribution is likely to open up new venues of research. Detailed analysis is presented on multiple-basin proteins of relevance to human disease. Multiscaling and the AMBER ff14SB force field are used to obtain energetically-credible paths at atomistic detail.
Group implicit concurrent algorithms in nonlinear structural dynamics

NASA Technical Reports Server (NTRS)

Ortiz, M.; Sotelino, E. D.

1989-01-01

During the 70's and 80's, considerable effort was devoted to developing efficient and reliable time stepping procedures for transient structural analysis. Mathematically, the equations governing this type of problems are generally stiff, i.e., they exhibit a wide spectrum in the linear range. The algorithms best suited to this type of applications are those which accurately integrate the low frequency content of the response without necessitating the resolution of the high frequency modes. This means that the algorithms must be unconditionally stable, which in turn rules out explicit integration. The most exciting possibility in the algorithms development area in recent years has been the advent of parallel computers with multiprocessing capabilities. So, this work is mainly concerned with the development of parallel algorithms in the area of structural dynamics. A primary objective is to devise unconditionally stable and accurate time stepping procedures which lend themselves to an efficient implementation in concurrent machines. Some features of the new computer architecture are summarized. A brief survey of current efforts in the area is presented. A new class of concurrent procedures, or Group Implicit algorithms is introduced and analyzed. The numerical simulation shows that GI algorithms hold considerable promise for application in coarse grain as well as medium grain parallel computers.
An index-based algorithm for fast on-line query processing of latent semantic analysis

PubMed Central

Li, Pohan; Wang, Wei

2017-01-01

Latent Semantic Analysis (LSA) is widely used for finding the documents whose semantic is similar to the query of keywords. Although LSA yield promising similar results, the existing LSA algorithms involve lots of unnecessary operations in similarity computation and candidate check during on-line query processing, which is expensive in terms of time cost and cannot efficiently response the query request especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA towards efficiently searching the similar documents to a given query. We rewrite the similarity equation of LSA combined with an intermediate value called partial similarity that is stored in a designed index called partial index. For reducing the searching space, we give an approximate form of similarity equation, and then develop an efficient algorithm for building partial index, which skips the partial similarities lower than a given threshold θ. Based on partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes corresponds to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning the candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments through comparison with LSA have been done, which demonstrate the efficiency and effectiveness of our proposed algorithm. PMID:28520747
An index-based algorithm for fast on-line query processing of latent semantic analysis.

PubMed

Zhang, Mingxi; Li, Pohan; Wang, Wei

2017-01-01

Latent Semantic Analysis (LSA) is widely used for finding the documents whose semantic is similar to the query of keywords. Although LSA yield promising similar results, the existing LSA algorithms involve lots of unnecessary operations in similarity computation and candidate check during on-line query processing, which is expensive in terms of time cost and cannot efficiently response the query request especially when the dataset becomes large. In this paper, we study the efficiency problem of on-line query processing for LSA towards efficiently searching the similar documents to a given query. We rewrite the similarity equation of LSA combined with an intermediate value called partial similarity that is stored in a designed index called partial index. For reducing the searching space, we give an approximate form of similarity equation, and then develop an efficient algorithm for building partial index, which skips the partial similarities lower than a given threshold θ. Based on partial index, we develop an efficient algorithm called ILSA for supporting fast on-line query processing. The given query is transformed into a pseudo document vector, and the similarities between query and candidate documents are computed by accumulating the partial similarities obtained from the index nodes corresponds to non-zero entries in the pseudo document vector. Compared to the LSA algorithm, ILSA reduces the time cost of on-line query processing by pruning the candidate documents that are not promising and skipping the operations that make little contribution to similarity scores. Extensive experiments through comparison with LSA have been done, which demonstrate the efficiency and effectiveness of our proposed algorithm.
A ridge tracking algorithm and error estimate for efficient computation of Lagrangian coherent structures.

PubMed

Lipinski, Doug; Mohseni, Kamran

2010-03-01

A ridge tracking algorithm for the computation and extraction of Lagrangian coherent structures (LCS) is developed. This algorithm takes advantage of the spatial coherence of LCS by tracking the ridges which form LCS to avoid unnecessary computations away from the ridges. We also make use of the temporal coherence of LCS by approximating the time dependent motion of the LCS with passive tracer particles. To justify this approximation, we provide an estimate of the difference between the motion of the LCS and that of tracer particles which begin on the LCS. In addition to the speedup in computational time, the ridge tracking algorithm uses less memory and results in smaller output files than the standard LCS algorithm. Finally, we apply our ridge tracking algorithm to two test cases, an analytically defined double gyre as well as the more complicated example of the numerical simulation of a swimming jellyfish. In our test cases, we find up to a 35 times speedup when compared with the standard LCS algorithm.

Computational path planner for product assembly in complex environments

NASA Astrophysics Data System (ADS)

Shang, Wei; Liu, Jianhua; Ning, Ruxin; Liu, Mi

2013-03-01

Assembly path planning is a crucial problem in assembly related design and manufacturing processes. Sampling based motion planning algorithms are used for computational assembly path planning. However, the performance of such algorithms may degrade much in environments with complex product structure, narrow passages or other challenging scenarios. A computational path planner for automatic assembly path planning in complex 3D environments is presented. The global planning process is divided into three phases based on the environment and specific algorithms are proposed and utilized in each phase to solve the challenging issues. A novel ray test based stochastic collision detection method is proposed to evaluate the intersection between two polyhedral objects. This method avoids fake collisions in conventional methods and degrades the geometric constraint when a part has to be removed with surface contact with other parts. A refined history based rapidly-exploring random tree (RRT) algorithm which bias the growth of the tree based on its planning history is proposed and employed in the planning phase where the path is simple but the space is highly constrained. A novel adaptive RRT algorithm is developed for the path planning problem with challenging scenarios and uncertain environment. With extending values assigned on each tree node and extending schemes applied, the tree can adapts its growth to explore complex environments more efficiently. Experiments on the key algorithms are carried out and comparisons are made between the conventional path planning algorithms and the presented ones. The comparing results show that based on the proposed algorithms, the path planner can compute assembly path in challenging complex environments more efficiently and with higher success. This research provides the references to the study of computational assembly path planning under complex environments.
PSC algorithm description

NASA Technical Reports Server (NTRS)

Nobbs, Steven G.

1995-01-01

An overview of the performance seeking control (PSC) algorithm and details of the important components of the algorithm are given. The onboard propulsion system models, the linear programming optimization, and engine control interface are described. The PSC algorithm receives input from various computers on the aircraft including the digital flight computer, digital engine control, and electronic inlet control. The PSC algorithm contains compact models of the propulsion system including the inlet, engine, and nozzle. The models compute propulsion system parameters, such as inlet drag and fan stall margin, which are not directly measurable in flight. The compact models also compute sensitivities of the propulsion system parameters to change in control variables. The engine model consists of a linear steady state variable model (SSVM) and a nonlinear model. The SSVM is updated with efficiency factors calculated in the engine model update logic, or Kalman filter. The efficiency factors are used to adjust the SSVM to match the actual engine. The propulsion system models are mathematically integrated to form an overall propulsion system model. The propulsion system model is then optimized using a linear programming optimization scheme. The goal of the optimization is determined from the selected PSC mode of operation. The resulting trims are used to compute a new operating point about which the optimization process is repeated. This process is continued until an overall (global) optimum is reached before applying the trims to the controllers.
High performance transcription factor-DNA docking with GPU computing

PubMed Central

2012-01-01

Background Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications in a number of applications, such as structure-based prediction of transcription factor binding sites and rational drug design. Protein-DNA docking is very computational demanding due to the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that the docking quality depends on the coverage of the conformational sampling space. It is therefore desirable to accelerate the computation of the docking algorithm, not only to reduce computing time, but also to improve docking quality. Methods In an attempt to accelerate the sampling process and to improve the docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte-Carlo simulation and a simulated annealing method to search through the conformational space. Algorithmic techniques were developed to improve the computation efficiency and scalability on GPU-based high performance computing systems. Results The effectiveness of our approach is tested on a non-redundant set of 75 TF-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process and thereby improving the chance of finding near-native TF-DNA complex structures. This study also suggests that further improvement in protein-DNA docking research would require efforts from two integral aspects: improvement in computation efficiency and energy function design. Conclusions We present a high performance computing approach for improving the prediction accuracy of protein-DNA docking. The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first ad hoc effort of applying GPU or GPU clusters to the protein-DNA docking problem. PMID:22759575
Single-step reinitialization and extending algorithms for level-set based multi-phase flow simulations

NASA Astrophysics Data System (ADS)

Fu, Lin; Hu, Xiangyu Y.; Adams, Nikolaus A.

2017-12-01

We propose efficient single-step formulations for reinitialization and extending algorithms, which are critical components of level-set based interface-tracking methods. The level-set field is reinitialized with a single-step (non iterative) "forward tracing" algorithm. A minimum set of cells is defined that describes the interface, and reinitialization employs only data from these cells. Fluid states are extrapolated or extended across the interface by a single-step "backward tracing" algorithm. Both algorithms, which are motivated by analogy to ray-tracing, avoid multiple block-boundary data exchanges that are inevitable for iterative reinitialization and extending approaches within a parallel-computing environment. The single-step algorithms are combined with a multi-resolution conservative sharp-interface method and validated by a wide range of benchmark test cases. We demonstrate that the proposed reinitialization method achieves second-order accuracy in conserving the volume of each phase. The interface location is invariant to reapplication of the single-step reinitialization. Generally, we observe smaller absolute errors than for standard iterative reinitialization on the same grid. The computational efficiency is higher than for the standard and typical high-order iterative reinitialization methods. We observe a 2- to 6-times efficiency improvement over the standard method for serial execution. The proposed single-step extending algorithm, which is commonly employed for assigning data to ghost cells with ghost-fluid or conservative interface interaction methods, shows about 10-times efficiency improvement over the standard method while maintaining same accuracy. Despite their simplicity, the proposed algorithms offer an efficient and robust alternative to iterative reinitialization and extending methods for level-set based multi-phase simulations.
Sparse Regression as a Sparse Eigenvalue Problem

NASA Technical Reports Server (NTRS)

Moghaddam, Baback; Gruber, Amit; Weiss, Yair; Avidan, Shai

2008-01-01

We extend the l0-norm "subspectral" algorithms for sparse-LDA [5] and sparse-PCA [6] to general quadratic costs such as MSE in linear (kernel) regression. The resulting "Sparse Least Squares" (SLS) problem is also NP-hard, by way of its equivalence to a rank-1 sparse eigenvalue problem (e.g., binary sparse-LDA [7]). Specifically, for a general quadratic cost we use a highly-efficient technique for direct eigenvalue computation using partitioned matrix inverses which leads to dramatic x103 speed-ups over standard eigenvalue decomposition. This increased efficiency mitigates the O(n4) scaling behaviour that up to now has limited the previous algorithms' utility for high-dimensional learning problems. Moreover, the new computation prioritizes the role of the less-myopic backward elimination stage which becomes more efficient than forward selection. Similarly, branch-and-bound search for Exact Sparse Least Squares (ESLS) also benefits from partitioned matrix inverse techniques. Our Greedy Sparse Least Squares (GSLS) generalizes Natarajan's algorithm [9] also known as Order-Recursive Matching Pursuit (ORMP). Specifically, the forward half of GSLS is exactly equivalent to ORMP but more efficient. By including the backward pass, which only doubles the computation, we can achieve lower MSE than ORMP. Experimental comparisons to the state-of-the-art LARS algorithm [3] show forward-GSLS is faster, more accurate and more flexible in terms of choice of regularization
Texture functions in image analysis: A computationally efficient solution

NASA Technical Reports Server (NTRS)

Cox, S. C.; Rose, J. F.

1983-01-01

A computationally efficient means for calculating texture measurements from digital images by use of the co-occurrence technique is presented. The calculation of the statistical descriptors of image texture and a solution that circumvents the need for calculating and storing a co-occurrence matrix are discussed. The results show that existing efficient algorithms for calculating sums, sums of squares, and cross products can be used to compute complex co-occurrence relationships directly from the digital image input.
Algorithms for Zonal Methods and Development of Three Dimensional Mesh Generation Procedures.

DTIC Science & Technology

1984-02-01

a r-re complete set of equations is used, but their effect is imposed by means of a right hand side forcing function, not by means of a zonal boundary...modifications of flow-simulation algorithms The explicit finite-difference code of Magnus and are discussed. Computational tests in two dimensions...used to simplify the task of grid generation without an adverse achieve computational efficiency. More recently, effect on flow-field algorithms and
Cache and energy efficient algorithms for Nussinov's RNA Folding.

PubMed

Zhao, Chunchun; Sahni, Sartaj

2017-12-06

An RNA folding/RNA secondary structure prediction algorithm determines the non-nested/pseudoknot-free structure by maximizing the number of complementary base pairs and minimizing the energy. Several implementations of Nussinov's classical RNA folding algorithm have been proposed. Our focus is to obtain run time and energy efficiency by reducing the number of cache misses. Three cache-efficient algorithms, ByRow, ByRowSegment and ByBox, for Nussinov's RNA folding are developed. Using a simple LRU cache model, we show that the Classical algorithm of Nussinov has the highest number of cache misses followed by the algorithms Transpose (Li et al.), ByRow, ByRowSegment, and ByBox (in this order). Extensive experiments conducted on four computational platforms-Xeon E5, AMD Athlon 64 X2, Intel I7 and PowerPC A2-using two programming languages-C and Java-show that our cache efficient algorithms are also efficient in terms of run time and energy. Our benchmarking shows that, depending on the computational platform and programming language, either ByRow or ByBox give best run time and energy performance. The C version of these algorithms reduce run time by as much as 97.2% and energy consumption by as much as 88.8% relative to Classical and by as much as 56.3% and 57.8% relative to Transpose. The Java versions reduce run time by as much as 98.3% relative to Classical and by as much as 75.2% relative to Transpose. Transpose achieves run time and energy efficiency at the expense of memory as it takes twice the memory required by Classical. The memory required by ByRow, ByRowSegment, and ByBox is the same as that of Classical. As a result, using the same amount of memory, the algorithms proposed by us can solve problems up to 40% larger than those solvable by Transpose.
Three-dimensional near-field MIMO array imaging using range migration techniques.

PubMed

Zhuge, Xiaodong; Yarovoy, Alexander G

2012-06-01

This paper presents a 3-D near-field imaging algorithm that is formulated for 2-D wideband multiple-input-multiple-output (MIMO) imaging array topology. The proposed MIMO range migration technique performs the image reconstruction procedure in the frequency-wavenumber domain. The algorithm is able to completely compensate the curvature of the wavefront in the near-field through a specifically defined interpolation process and provides extremely high computational efficiency by the application of the fast Fourier transform. The implementation aspects of the algorithm and the sampling criteria of a MIMO aperture are discussed. The image reconstruction performance and computational efficiency of the algorithm are demonstrated both with numerical simulations and measurements using 2-D MIMO arrays. Real-time 3-D near-field imaging can be achieved with a real-aperture array by applying the proposed MIMO range migration techniques.
Distributed Algorithm for Voronoi Partition of Wireless Sensor Networks with a Limited Sensing Range.

PubMed

He, Chenlong; Feng, Zuren; Ren, Zhigang

2018-02-03

For Wireless Sensor Networks (WSNs), the Voronoi partition of a region is a challenging problem owing to the limited sensing ability of each sensor and the distributed organization of the network. In this paper, an algorithm is proposed for each sensor having a limited sensing range to compute its limited Voronoi cell autonomously, so that the limited Voronoi partition of the entire WSN is generated in a distributed manner. Inspired by Graham's Scan (GS) algorithm used to compute the convex hull of a point set, the limited Voronoi cell of each sensor is obtained by sequentially scanning two consecutive bisectors between the sensor and its neighbors. The proposed algorithm called the Boundary Scan (BS) algorithm has a lower computational complexity than the existing Range-Constrained Voronoi Cell (RCVC) algorithm and reaches the lower bound of the computational complexity of the algorithms used to solve the problem of this kind. Moreover, it also improves the time efficiency of a key step in the Adjust-Sensing-Radius (ASR) algorithm used to compute the exact Voronoi cell. Extensive numerical simulations are performed to demonstrate the correctness and effectiveness of the BS algorithm. The distributed realization of the BS combined with a localization algorithm in WSNs is used to justify the WSN nature of the proposed algorithm.
Distributed Algorithm for Voronoi Partition of Wireless Sensor Networks with a Limited Sensing Range

PubMed Central

Feng, Zuren; Ren, Zhigang

2018-01-01

For Wireless Sensor Networks (WSNs), the Voronoi partition of a region is a challenging problem owing to the limited sensing ability of each sensor and the distributed organization of the network. In this paper, an algorithm is proposed for each sensor having a limited sensing range to compute its limited Voronoi cell autonomously, so that the limited Voronoi partition of the entire WSN is generated in a distributed manner. Inspired by Graham’s Scan (GS) algorithm used to compute the convex hull of a point set, the limited Voronoi cell of each sensor is obtained by sequentially scanning two consecutive bisectors between the sensor and its neighbors. The proposed algorithm called the Boundary Scan (BS) algorithm has a lower computational complexity than the existing Range-Constrained Voronoi Cell (RCVC) algorithm and reaches the lower bound of the computational complexity of the algorithms used to solve the problem of this kind. Moreover, it also improves the time efficiency of a key step in the Adjust-Sensing-Radius (ASR) algorithm used to compute the exact Voronoi cell. Extensive numerical simulations are performed to demonstrate the correctness and effectiveness of the BS algorithm. The distributed realization of the BS combined with a localization algorithm in WSNs is used to justify the WSN nature of the proposed algorithm. PMID:29401649
An accurate and computationally efficient algorithm for ground peak identification in large footprint waveform LiDAR data

NASA Astrophysics Data System (ADS)

Zhuang, Wei; Mountrakis, Giorgos

2014-09-01

Large footprint waveform LiDAR sensors have been widely used for numerous airborne studies. Ground peak identification in a large footprint waveform is a significant bottleneck in exploring full usage of the waveform datasets. In the current study, an accurate and computationally efficient algorithm was developed for ground peak identification, called Filtering and Clustering Algorithm (FICA). The method was evaluated on Land, Vegetation, and Ice Sensor (LVIS) waveform datasets acquired over Central NY. FICA incorporates a set of multi-scale second derivative filters and a k-means clustering algorithm in order to avoid detecting false ground peaks. FICA was tested in five different land cover types (deciduous trees, coniferous trees, shrub, grass and developed area) and showed more accurate results when compared to existing algorithms. More specifically, compared with Gaussian decomposition, the RMSE ground peak identification by FICA was 2.82 m (5.29 m for GD) in deciduous plots, 3.25 m (4.57 m for GD) in coniferous plots, 2.63 m (2.83 m for GD) in shrub plots, 0.82 m (0.93 m for GD) in grass plots, and 0.70 m (0.51 m for GD) in plots of developed areas. FICA performance was also relatively consistent under various slope and canopy coverage (CC) conditions. In addition, FICA showed better computational efficiency compared to existing methods. FICA's major computational and accuracy advantage is a result of the adopted multi-scale signal processing procedures that concentrate on local portions of the signal as opposed to the Gaussian decomposition that uses a curve-fitting strategy applied in the entire signal. The FICA algorithm is a good candidate for large-scale implementation on future space-borne waveform LiDAR sensors.
Computing Bounds on Resource Levels for Flexible Plans

NASA Technical Reports Server (NTRS)

Muscvettola, Nicola; Rijsman, David

2009-01-01

A new algorithm efficiently computes the tightest exact bound on the levels of resources induced by a flexible activity plan (see figure). Tightness of bounds is extremely important for computations involved in planning because tight bounds can save potentially exponential amounts of search (through early backtracking and detection of solutions), relative to looser bounds. The bound computed by the new algorithm, denoted the resource-level envelope, constitutes the measure of maximum and minimum consumption of resources at any time for all fixed-time schedules in the flexible plan. At each time, the envelope guarantees that there are two fixed-time instantiations one that produces the minimum level and one that produces the maximum level. Therefore, the resource-level envelope is the tightest possible resource-level bound for a flexible plan because any tighter bound would exclude the contribution of at least one fixed-time schedule. If the resource- level envelope can be computed efficiently, one could substitute looser bounds that are currently used in the inner cores of constraint-posting scheduling algorithms, with the potential for great improvements in performance. What is needed to reduce the cost of computation is an algorithm, the measure of complexity of which is no greater than a low-degree polynomial in N (where N is the number of activities). The new algorithm satisfies this need. In this algorithm, the computation of resource-level envelopes is based on a novel combination of (1) the theory of shortest paths in the temporal-constraint network for the flexible plan and (2) the theory of maximum flows for a flow network derived from the temporal and resource constraints. The measure of asymptotic complexity of the algorithm is O(N O(maxflow(N)), where O(x) denotes an amount of computing time or a number of arithmetic operations proportional to a number of the order of x and O(maxflow(N)) is the measure of complexity (and thus of cost) of a maximumflow algorithm applied to an auxiliary flow network of 2N nodes. The algorithm is believed to be efficient in practice; experimental analysis shows the practical cost of maxflow to be as low as O(N1.5). The algorithm could be enhanced following at least two approaches. In the first approach, incremental subalgorithms for the computation of the envelope could be developed. By use of temporal scanning of the events in the temporal network, it may be possible to significantly reduce the size of the networks on which it is necessary to run the maximum-flow subalgorithm, thereby significantly reducing the time required for envelope calculation. In the second approach, the practical effectiveness of resource envelopes in the inner loops of search algorithms could be tested for multi-capacity resource scheduling. This testing would include inner-loop backtracking and termination tests and variable and value-ordering heuristics that exploit the properties of resource envelopes more directly.
Efficient Training of Supervised Spiking Neural Network via Accurate Synaptic-Efficiency Adjustment Method.

PubMed

Xie, Xiurui; Qu, Hong; Yi, Zhang; Kurths, Jurgen

2017-06-01

The spiking neural network (SNN) is the third generation of neural networks and performs remarkably well in cognitive tasks, such as pattern recognition. The temporal neural encode mechanism found in biological hippocampus enables SNN to possess more powerful computation capability than networks with other encoding schemes. However, this temporal encoding approach requires neurons to process information serially on time, which reduces learning efficiency significantly. To keep the powerful computation capability of the temporal encoding mechanism and to overcome its low efficiency in the training of SNNs, a new training algorithm, the accurate synaptic-efficiency adjustment method is proposed in this paper. Inspired by the selective attention mechanism of the primate visual system, our algorithm selects only the target spike time as attention areas, and ignores voltage states of the untarget ones, resulting in a significant reduction of training time. Besides, our algorithm employs a cost function based on the voltage difference between the potential of the output neuron and the firing threshold of the SNN, instead of the traditional precise firing time distance. A normalized spike-timing-dependent-plasticity learning window is applied to assigning this error to different synapses for instructing their training. Comprehensive simulations are conducted to investigate the learning properties of our algorithm, with input neurons emitting both single spike and multiple spikes. Simulation results indicate that our algorithm possesses higher learning performance than the existing other methods and achieves the state-of-the-art efficiency in the training of SNN.
Efficient enumeration of monocyclic chemical graphs with given path frequencies

PubMed Central

2014-01-01

Background The enumeration of chemical graphs (molecular graphs) satisfying given constraints is one of the fundamental problems in chemoinformatics and bioinformatics because it leads to a variety of useful applications including structure determination and development of novel chemical compounds. Results We consider the problem of enumerating chemical graphs with monocyclic structure (a graph structure that contains exactly one cycle) from a given set of feature vectors, where a feature vector represents the frequency of the prescribed paths in a chemical compound to be constructed and the set is specified by a pair of upper and lower feature vectors. To enumerate all tree-like (acyclic) chemical graphs from a given set of feature vectors, Shimizu et al. and Suzuki et al. proposed efficient branch-and-bound algorithms based on a fast tree enumeration algorithm. In this study, we devise a novel method for extending these algorithms to enumeration of chemical graphs with monocyclic structure by designing a fast algorithm for testing uniqueness. The results of computational experiments reveal that the computational efficiency of the new algorithm is as good as those for enumeration of tree-like chemical compounds. Conclusions We succeed in expanding the class of chemical graphs that are able to be enumerated efficiently. PMID:24955135
An Improved Clustering Algorithm of Tunnel Monitoring Data for Cloud Computing

PubMed Central

Zhong, Luo; Tang, KunHao; Li, Lin; Yang, Guang; Ye, JingJing

2014-01-01

With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce become more and more complex. It results in the fact that the traditional clustering algorithm cannot handle the mass data of the tunnel. To solve this problem, an improved parallel clustering algorithm based on k-means has been proposed. It is a clustering algorithm using the MapReduce within cloud computing that deals with data. It not only has the advantage of being used to deal with mass data but also is more efficient. Moreover, it is able to compute the average dissimilarity degree of each cluster in order to clean the abnormal data. PMID:24982971
Overcoming Geometry-Induced Stiffness with IMplicit-Explicit (IMEX) Runge-Kutta Algorithms on Unstructured Grids with Applications to CEM, CFD, and CAA

NASA Technical Reports Server (NTRS)

Kanevsky, Alex

2004-01-01

My goal is to develop and implement efficient, accurate, and robust Implicit-Explicit Runge-Kutta (IMEX RK) methods [9] for overcoming geometry-induced stiffness with applications to computational electromagnetics (CEM), computational fluid dynamics (CFD) and computational aeroacoustics (CAA). IMEX algorithms solve the non-stiff portions of the domain using explicit methods, and isolate and solve the more expensive stiff portions using implicit methods. Current algorithms in CEM can only simulate purely harmonic (up to lOGHz plane wave) EM scattering by fighter aircraft, which are assumed to be pure metallic shells, and cannot handle the inclusion of coatings, penetration into and radiation out of the aircraft. Efficient MEX RK methods could potentially increase current CEM capabilities by 1-2 orders of magnitude, allowing scientists and engineers to attack more challenging and realistic problems.
Cut set-based risk and reliability analysis for arbitrarily interconnected networks

DOEpatents

Wyss, Gregory D.

2000-01-01

Method for computing all-terminal reliability for arbitrarily interconnected networks such as the United States public switched telephone network. The method includes an efficient search algorithm to generate minimal cut sets for nonhierarchical networks directly from the network connectivity diagram. Efficiency of the search algorithm stems in part from its basis on only link failures. The method also includes a novel quantification scheme that likewise reduces computational effort associated with assessing network reliability based on traditional risk importance measures. Vast reductions in computational effort are realized since combinatorial expansion and subsequent Boolean reduction steps are eliminated through analysis of network segmentations using a technique of assuming node failures to occur on only one side of a break in the network, and repeating the technique for all minimal cut sets generated with the search algorithm. The method functions equally well for planar and non-planar networks.
A High-Performance Genetic Algorithm: Using Traveling Salesman Problem as a Case

PubMed Central

Tsai, Chun-Wei; Tseng, Shih-Pang; Yang, Chu-Sing

2014-01-01

This paper presents a simple but efficient algorithm for reducing the computation time of genetic algorithm (GA) and its variants. The proposed algorithm is motivated by the observation that genes common to all the individuals of a GA have a high probability of surviving the evolution and ending up being part of the final solution; as such, they can be saved away to eliminate the redundant computations at the later generations of a GA. To evaluate the performance of the proposed algorithm, we use it not only to solve the traveling salesman problem but also to provide an extensive analysis on the impact it may have on the quality of the end result. Our experimental results indicate that the proposed algorithm can significantly reduce the computation time of GA and GA-based algorithms while limiting the degradation of the quality of the end result to a very small percentage compared to traditional GA. PMID:24892038
A high-performance genetic algorithm: using traveling salesman problem as a case.

PubMed

Tsai, Chun-Wei; Tseng, Shih-Pang; Chiang, Ming-Chao; Yang, Chu-Sing; Hong, Tzung-Pei

2014-01-01

This paper presents a simple but efficient algorithm for reducing the computation time of genetic algorithm (GA) and its variants. The proposed algorithm is motivated by the observation that genes common to all the individuals of a GA have a high probability of surviving the evolution and ending up being part of the final solution; as such, they can be saved away to eliminate the redundant computations at the later generations of a GA. To evaluate the performance of the proposed algorithm, we use it not only to solve the traveling salesman problem but also to provide an extensive analysis on the impact it may have on the quality of the end result. Our experimental results indicate that the proposed algorithm can significantly reduce the computation time of GA and GA-based algorithms while limiting the degradation of the quality of the end result to a very small percentage compared to traditional GA.

Cross-indexing of binary SIFT codes for large-scale image search.

PubMed

Liu, Zhen; Li, Houqiang; Zhang, Liyan; Zhou, Wengang; Tian, Qi

2014-05-01

In recent years, there has been growing interest in mapping visual features into compact binary codes for applications on large-scale image collections. Encoding high-dimensional data as compact binary codes reduces the memory cost for storage. Besides, it benefits the computational efficiency since the computation of similarity can be efficiently measured by Hamming distance. In this paper, we propose a novel flexible scale invariant feature transform (SIFT) binarization (FSB) algorithm for large-scale image search. The FSB algorithm explores the magnitude patterns of SIFT descriptor. It is unsupervised and the generated binary codes are demonstrated to be dispreserving. Besides, we propose a new searching strategy to find target features based on the cross-indexing in the binary SIFT space and original SIFT space. We evaluate our approach on two publicly released data sets. The experiments on large-scale partial duplicate image retrieval system demonstrate the effectiveness and efficiency of the proposed algorithm.
Gradient-based Optimization for Poroelastic and Viscoelastic MR Elastography

PubMed Central

Tan, Likun; McGarry, Matthew D.J.; Van Houten, Elijah E.W.; Ji, Ming; Solamen, Ligin; Weaver, John B.

2017-01-01

We describe an efficient gradient computation for solving inverse problems arising in magnetic resonance elastography (MRE). The algorithm can be considered as a generalized ‘adjoint method’ based on a Lagrangian formulation. One requirement for the classic adjoint method is assurance of the self-adjoint property of the stiffness matrix in the elasticity problem. In this paper, we show this property is no longer a necessary condition in our algorithm, but the computational performance can be as efficient as the classic method, which involves only two forward solutions and is independent of the number of parameters to be estimated. The algorithm is developed and implemented in material property reconstructions using poroelastic and viscoelastic modeling. Various gradient- and Hessian-based optimization techniques have been tested on simulation, phantom and in vivo brain data. The numerical results show the feasibility and the efficiency of the proposed scheme for gradient calculation. PMID:27608454
Efficient Exact Inference With Loss Augmented Objective in Structured Learning.

PubMed

Bauer, Alexander; Nakajima, Shinichi; Muller, Klaus-Robert

2016-08-19

Structural support vector machine (SVM) is an elegant approach for building complex and accurate models with structured outputs. However, its applicability relies on the availability of efficient inference algorithms--the state-of-the-art training algorithms repeatedly perform inference to compute a subgradient or to find the most violating configuration. In this paper, we propose an exact inference algorithm for maximizing nondecomposable objectives due to special type of a high-order potential having a decomposable internal structure. As an important application, our method covers the loss augmented inference, which enables the slack and margin scaling formulations of structural SVM with a variety of dissimilarity measures, e.g., Hamming loss, precision and recall, Fβ-loss, intersection over union, and many other functions that can be efficiently computed from the contingency table. We demonstrate the advantages of our approach in natural language parsing and sequence segmentation applications.
Parallel discontinuous Galerkin FEM for computing hyperbolic conservation law on unstructured grids

NASA Astrophysics Data System (ADS)

Ma, Xinrong; Duan, Zhijian

2018-04-01

High-order resolution Discontinuous Galerkin finite element methods (DGFEM) has been known as a good method for solving Euler equations and Navier-Stokes equations on unstructured grid, but it costs too much computational resources. An efficient parallel algorithm was presented for solving the compressible Euler equations. Moreover, the multigrid strategy based on three-stage three-order TVD Runge-Kutta scheme was used in order to improve the computational efficiency of DGFEM and accelerate the convergence of the solution of unsteady compressible Euler equations. In order to make each processor maintain load balancing, the domain decomposition method was employed. Numerical experiment performed for the inviscid transonic flow fluid problems around NACA0012 airfoil and M6 wing. The results indicated that our parallel algorithm can improve acceleration and efficiency significantly, which is suitable for calculating the complex flow fluid.
Marc Henry de Frahan | NREL

Science.gov Websites

Computing Project, Marc develops high-fidelity turbulence models to enhance simulation accuracy and efficient numerical algorithms for future high performance computing hardware architectures. Research Interests High performance computing High order numerical methods for computational fluid dynamics Fluid
Total variation-based neutron computed tomography

NASA Astrophysics Data System (ADS)

Barnard, Richard C.; Bilheux, Hassina; Toops, Todd; Nafziger, Eric; Finney, Charles; Splitter, Derek; Archibald, Rick

2018-05-01

We perform the neutron computed tomography reconstruction problem via an inverse problem formulation with a total variation penalty. In the case of highly under-resolved angular measurements, the total variation penalty suppresses high-frequency artifacts which appear in filtered back projections. In order to efficiently compute solutions for this problem, we implement a variation of the split Bregman algorithm; due to the error-forgetting nature of the algorithm, the computational cost of updating can be significantly reduced via very inexact approximate linear solvers. We present the effectiveness of the algorithm in the significantly low-angular sampling case using synthetic test problems as well as data obtained from a high flux neutron source. The algorithm removes artifacts and can even roughly capture small features when an extremely low number of angles are used.
Biological network motif detection and evaluation

PubMed Central

2011-01-01

Background Molecular level of biological data can be constructed into system level of data as biological networks. Network motifs are defined as over-represented small connected subgraphs in networks and they have been used for many biological applications. Since network motif discovery involves computationally challenging processes, previous algorithms have focused on computational efficiency. However, we believe that the biological quality of network motifs is also very important. Results We define biological network motifs as biologically significant subgraphs and traditional network motifs are differentiated as structural network motifs in this paper. We develop five algorithms, namely, EDGEGO-BNM, EDGEBETWEENNESS-BNM, NMF-BNM, NMFGO-BNM and VOLTAGE-BNM, for efficient detection of biological network motifs, and introduce several evaluation measures including motifs included in complex, motifs included in functional module and GO term clustering score in this paper. Experimental results show that EDGEGO-BNM and EDGEBETWEENNESS-BNM perform better than existing algorithms and all of our algorithms are applicable to find structural network motifs as well. Conclusion We provide new approaches to finding network motifs in biological networks. Our algorithms efficiently detect biological network motifs and further improve existing algorithms to find high quality structural network motifs, which would be impossible using existing algorithms. The performances of the algorithms are compared based on our new evaluation measures in biological contexts. We believe that our work gives some guidelines of network motifs research for the biological networks. PMID:22784624
Efficient iterative image reconstruction algorithm for dedicated breast CT

NASA Astrophysics Data System (ADS)

Antropova, Natalia; Sanchez, Adrian; Reiser, Ingrid S.; Sidky, Emil Y.; Boone, John; Pan, Xiaochuan

2016-03-01

Dedicated breast computed tomography (bCT) is currently being studied as a potential screening method for breast cancer. The X-ray exposure is set low to achieve an average glandular dose comparable to that of mammography, yielding projection data that contains high levels of noise. Iterative image reconstruction (IIR) algorithms may be well-suited for the system since they potentially reduce the effects of noise in the reconstructed images. However, IIR outcomes can be difficult to control since the algorithm parameters do not directly correspond to the image properties. Also, IIR algorithms are computationally demanding and have optimal parameter settings that depend on the size and shape of the breast and positioning of the patient. In this work, we design an efficient IIR algorithm with meaningful parameter specifications and that can be used on a large, diverse sample of bCT cases. The flexibility and efficiency of this method comes from having the final image produced by a linear combination of two separately reconstructed images - one containing gray level information and the other with enhanced high frequency components. Both of the images result from few iterations of separate IIR algorithms. The proposed algorithm depends on two parameters both of which have a well-defined impact on image quality. The algorithm is applied to numerous bCT cases from a dedicated bCT prototype system developed at University of California, Davis.
Cloud computing-based TagSNP selection algorithm for human genome data.

PubMed

Hung, Che-Lun; Chen, Wen-Pei; Hua, Guan-Jie; Zheng, Huiru; Tsai, Suh-Jen Jane; Lin, Yaw-Ling

2015-01-05

Single nucleotide polymorphisms (SNPs) play a fundamental role in human genetic variation and are used in medical diagnostics, phylogeny construction, and drug design. They provide the highest-resolution genetic fingerprint for identifying disease associations and human features. Haplotypes are regions of linked genetic variants that are closely spaced on the genome and tend to be inherited together. Genetics research has revealed SNPs within certain haplotype blocks that introduce few distinct common haplotypes into most of the population. Haplotype block structures are used in association-based methods to map disease genes. In this paper, we propose an efficient algorithm for identifying haplotype blocks in the genome. In chromosomal haplotype data retrieved from the HapMap project website, the proposed algorithm identified longer haplotype blocks than an existing algorithm. To enhance its performance, we extended the proposed algorithm into a parallel algorithm that copies data in parallel via the Hadoop MapReduce framework. The proposed MapReduce-paralleled combinatorial algorithm performed well on real-world data obtained from the HapMap dataset; the improvement in computational efficiency was proportional to the number of processors used.
Cloud Computing-Based TagSNP Selection Algorithm for Human Genome Data

PubMed Central

Hung, Che-Lun; Chen, Wen-Pei; Hua, Guan-Jie; Zheng, Huiru; Tsai, Suh-Jen Jane; Lin, Yaw-Ling

2015-01-01

Single nucleotide polymorphisms (SNPs) play a fundamental role in human genetic variation and are used in medical diagnostics, phylogeny construction, and drug design. They provide the highest-resolution genetic fingerprint for identifying disease associations and human features. Haplotypes are regions of linked genetic variants that are closely spaced on the genome and tend to be inherited together. Genetics research has revealed SNPs within certain haplotype blocks that introduce few distinct common haplotypes into most of the population. Haplotype block structures are used in association-based methods to map disease genes. In this paper, we propose an efficient algorithm for identifying haplotype blocks in the genome. In chromosomal haplotype data retrieved from the HapMap project website, the proposed algorithm identified longer haplotype blocks than an existing algorithm. To enhance its performance, we extended the proposed algorithm into a parallel algorithm that copies data in parallel via the Hadoop MapReduce framework. The proposed MapReduce-paralleled combinatorial algorithm performed well on real-world data obtained from the HapMap dataset; the improvement in computational efficiency was proportional to the number of processors used. PMID:25569088
Polynomial-time quantum algorithm for the simulation of chemical dynamics

PubMed Central

Kassal, Ivan; Jordan, Stephen P.; Love, Peter J.; Mohseni, Masoud; Aspuru-Guzik, Alán

2008-01-01

The computational cost of exact methods for quantum simulation using classical computers grows exponentially with system size. As a consequence, these techniques can be applied only to small systems. By contrast, we demonstrate that quantum computers could exactly simulate chemical reactions in polynomial time. Our algorithm uses the split-operator approach and explicitly simulates all electron-nuclear and interelectronic interactions in quadratic time. Surprisingly, this treatment is not only more accurate than the Born–Oppenheimer approximation but faster and more efficient as well, for all reactions with more than about four atoms. This is the case even though the entire electronic wave function is propagated on a grid with appropriately short time steps. Although the preparation and measurement of arbitrary states on a quantum computer is inefficient, here we demonstrate how to prepare states of chemical interest efficiently. We also show how to efficiently obtain chemically relevant observables, such as state-to-state transition probabilities and thermal reaction rates. Quantum computers using these techniques could outperform current classical computers with 100 qubits. PMID:19033207
The new and computationally efficient MIL-SOM algorithm: potential benefits for visualization and analysis of a large-scale high-dimensional clinically acquired geographic data.

PubMed

Oyana, Tonny J; Achenie, Luke E K; Heo, Joon

2012-01-01

The objective of this paper is to introduce an efficient algorithm, namely, the mathematically improved learning-self organizing map (MIL-SOM) algorithm, which speeds up the self-organizing map (SOM) training process. In the proposed MIL-SOM algorithm, the weights of Kohonen's SOM are based on the proportional-integral-derivative (PID) controller. Thus, in a typical SOM learning setting, this improvement translates to faster convergence. The basic idea is primarily motivated by the urgent need to develop algorithms with the competence to converge faster and more efficiently than conventional techniques. The MIL-SOM algorithm is tested on four training geographic datasets representing biomedical and disease informatics application domains. Experimental results show that the MIL-SOM algorithm provides a competitive, better updating procedure and performance, good robustness, and it runs faster than Kohonen's SOM.
The New and Computationally Efficient MIL-SOM Algorithm: Potential Benefits for Visualization and Analysis of a Large-Scale High-Dimensional Clinically Acquired Geographic Data

PubMed Central

Oyana, Tonny J.; Achenie, Luke E. K.; Heo, Joon

2012-01-01

The objective of this paper is to introduce an efficient algorithm, namely, the mathematically improved learning-self organizing map (MIL-SOM) algorithm, which speeds up the self-organizing map (SOM) training process. In the proposed MIL-SOM algorithm, the weights of Kohonen's SOM are based on the proportional-integral-derivative (PID) controller. Thus, in a typical SOM learning setting, this improvement translates to faster convergence. The basic idea is primarily motivated by the urgent need to develop algorithms with the competence to converge faster and more efficiently than conventional techniques. The MIL-SOM algorithm is tested on four training geographic datasets representing biomedical and disease informatics application domains. Experimental results show that the MIL-SOM algorithm provides a competitive, better updating procedure and performance, good robustness, and it runs faster than Kohonen's SOM. PMID:22481977
QRev—Software for computation and quality assurance of acoustic doppler current profiler moving-boat streamflow measurements—Technical manual for version 2.8

USGS Publications Warehouse

Mueller, David S.

2016-06-21

The software program, QRev applies common and consistent computational algorithms combined with automated filtering and quality assessment of the data to improve the quality and efficiency of streamflow measurements and helps ensure that U.S. Geological Survey streamflow measurements are consistent, accurate, and independent of the manufacturer of the instrument used to make the measurement. Software from different manufacturers uses different algorithms for various aspects of the data processing and discharge computation. The algorithms used by QRev to filter data, interpolate data, and compute discharge are documented and compared to the algorithms used in the manufacturers’ software. QRev applies consistent algorithms and creates a data structure that is independent of the data source. QRev saves an extensible markup language (XML) file that can be imported into databases or electronic field notes software. This report is the technical manual for version 2.8 of QRev.
Ndarts

NASA Technical Reports Server (NTRS)

Jain, Abhinandan

2011-01-01

Ndarts software provides algorithms for computing quantities associated with the dynamics of articulated, rigid-link, multibody systems. It is designed as a general-purpose dynamics library that can be used for the modeling of robotic platforms, space vehicles, molecular dynamics, and other such applications. The architecture and algorithms in Ndarts are based on the Spatial Operator Algebra (SOA) theory for computational multibody and robot dynamics developed at JPL. It uses minimal, internal coordinate models. The algorithms are low-order, recursive scatter/ gather algorithms. In comparison with the earlier Darts++ software, this version has a more general and cleaner design needed to support a larger class of computational dynamics needs. It includes a frames infrastructure, allows algorithms to operate on subgraphs of the system, and implements lazy and deferred computation for better efficiency. Dynamics modeling modules such as Ndarts are core building blocks of control and simulation software for space, robotic, mechanism, bio-molecular, and material systems modeling.
Demonstration of a small programmable quantum computer with atomic qubits.

PubMed

Debnath, S; Linke, N M; Figgatt, C; Landsman, K A; Wright, K; Monroe, C

2016-08-04

Quantum computers can solve certain problems more efficiently than any possible conventional computer. Small quantum algorithms have been demonstrated on multiple quantum computing platforms, many specifically tailored in hardware to implement a particular algorithm or execute a limited number of computational paths. Here we demonstrate a five-qubit trapped-ion quantum computer that can be programmed in software to implement arbitrary quantum algorithms by executing any sequence of universal quantum logic gates. We compile algorithms into a fully connected set of gate operations that are native to the hardware and have a mean fidelity of 98 per cent. Reconfiguring these gate sequences provides the flexibility to implement a variety of algorithms without altering the hardware. As examples, we implement the Deutsch-Jozsa and Bernstein-Vazirani algorithms with average success rates of 95 and 90 per cent, respectively. We also perform a coherent quantum Fourier transform on five trapped-ion qubits for phase estimation and period finding with average fidelities of 62 and 84 per cent, respectively. This small quantum computer can be scaled to larger numbers of qubits within a single register, and can be further expanded by connecting several such modules through ion shuttling or photonic quantum channels.
Demonstration of a small programmable quantum computer with atomic qubits

NASA Astrophysics Data System (ADS)

Debnath, S.; Linke, N. M.; Figgatt, C.; Landsman, K. A.; Wright, K.; Monroe, C.

2016-08-01

Quantum computers can solve certain problems more efficiently than any possible conventional computer. Small quantum algorithms have been demonstrated on multiple quantum computing platforms, many specifically tailored in hardware to implement a particular algorithm or execute a limited number of computational paths. Here we demonstrate a five-qubit trapped-ion quantum computer that can be programmed in software to implement arbitrary quantum algorithms by executing any sequence of universal quantum logic gates. We compile algorithms into a fully connected set of gate operations that are native to the hardware and have a mean fidelity of 98 per cent. Reconfiguring these gate sequences provides the flexibility to implement a variety of algorithms without altering the hardware. As examples, we implement the Deutsch-Jozsa and Bernstein-Vazirani algorithms with average success rates of 95 and 90 per cent, respectively. We also perform a coherent quantum Fourier transform on five trapped-ion qubits for phase estimation and period finding with average fidelities of 62 and 84 per cent, respectively. This small quantum computer can be scaled to larger numbers of qubits within a single register, and can be further expanded by connecting several such modules through ion shuttling or photonic quantum channels.
76 FR 45786 - Advanced Scientific Computing Advisory Committee; Meeting

Federal Register 2010, 2011, 2012, 2013, 2014

2011-08-01

... updates. EU Data Initiative. HPC & EERE Wind Program. Early Career Research on Energy Efficient Interconnect for Exascale Computing. Separating Algorithm and Implentation. Update on ASCR exascale planning...
Efficient image compression algorithm for computer-animated images

NASA Astrophysics Data System (ADS)

Yfantis, Evangelos A.; Au, Matthew Y.; Miel, G.

1992-10-01

An image compression algorithm is described. The algorithm is an extension of the run-length image compression algorithm and its implementation is relatively easy. This algorithm was implemented and compared with other existing popular compression algorithms and with the Lempel-Ziv (LZ) coding. The Lempel-Ziv algorithm is available as a utility in the UNIX operating system and is also referred to as the UNIX uncompress. Sometimes our algorithm is best in terms of saving memory space, and sometimes one of the competing algorithms is best. The algorithm is lossless, and the intent is for the algorithm to be used in computer graphics animated images. Comparisons made with the LZ algorithm indicate that the decompression time using our algorithm is faster than that using the LZ algorithm. Once the data are in memory, a relatively simple and fast transformation is applied to uncompress the file.
Duality quantum algorithm efficiently simulates open quantum systems

PubMed Central

Wei, Shi-Jie; Ruan, Dong; Long, Gui-Lu

2016-01-01

Because of inevitable coupling with the environment, nearly all practical quantum systems are open system, where the evolution is not necessarily unitary. In this paper, we propose a duality quantum algorithm for simulating Hamiltonian evolution of an open quantum system. In contrast to unitary evolution in a usual quantum computer, the evolution operator in a duality quantum computer is a linear combination of unitary operators. In this duality quantum algorithm, the time evolution of the open quantum system is realized by using Kraus operators which is naturally implemented in duality quantum computer. This duality quantum algorithm has two distinct advantages compared to existing quantum simulation algorithms with unitary evolution operations. Firstly, the query complexity of the algorithm is O(d3) in contrast to O(d4) in existing unitary simulation algorithm, where d is the dimension of the open quantum system. Secondly, By using a truncated Taylor series of the evolution operators, this duality quantum algorithm provides an exponential improvement in precision compared with previous unitary simulation algorithm. PMID:27464855

Spacecraft Angular State Estimation After Sensor Failure

NASA Technical Reports Server (NTRS)

Bauer, Frank (Technical Monitor); BarItzhack, Itzhack Y.; Harman, Richard R.

2002-01-01

This work describes two algorithms for computing the angular rate and attitude in case of a gyro failure in a spacecraft (SC) with a special mission profile. The source of the problem is presented, two algorithms are suggested, an observability study is carried out, and the efficiency of the algorithms is demonstrated.
Fast GPU-based computation of spatial multigrid multiframe LMEM for PET.

PubMed

Nassiri, Moulay Ali; Carrier, Jean-François; Després, Philippe

2015-09-01

Significant efforts were invested during the last decade to accelerate PET list-mode reconstructions, notably with GPU devices. However, the computation time per event is still relatively long, and the list-mode efficiency on the GPU is well below the histogram-mode efficiency. Since list-mode data are not arranged in any regular pattern, costly accesses to the GPU global memory can hardly be optimized and geometrical symmetries cannot be used. To overcome obstacles that limit the acceleration of reconstruction from list-mode on the GPU, a multigrid and multiframe approach of an expectation-maximization algorithm was developed. The reconstruction process is started during data acquisition, and calculations are executed concurrently on the GPU and the CPU, while the system matrix is computed on-the-fly. A new convergence criterion also was introduced, which is computationally more efficient on the GPU. The implementation was tested on a Tesla C2050 GPU device for a Gemini GXL PET system geometry. The results show that the proposed algorithm (multigrid and multiframe list-mode expectation-maximization, MGMF-LMEM) converges to the same solution as the LMEM algorithm more than three times faster. The execution time of the MGMF-LMEM algorithm was 1.1 s per million of events on the Tesla C2050 hardware used, for a reconstructed space of 188 x 188 x 57 voxels of 2 x 2 x 3.15 mm3. For 17- and 22-mm simulated hot lesions, the MGMF-LMEM algorithm led on the first iteration to contrast recovery coefficients (CRC) of more than 75 % of the maximum CRC while achieving a minimum in the relative mean square error. Therefore, the MGMF-LMEM algorithm can be used as a one-pass method to perform real-time reconstructions for low-count acquisitions, as in list-mode gated studies. The computation time for one iteration and 60 millions of events was approximately 66 s.
Combinatorial-topological framework for the analysis of global dynamics.

PubMed

Bush, Justin; Gameiro, Marcio; Harker, Shaun; Kokubu, Hiroshi; Mischaikow, Konstantin; Obayashi, Ippei; Pilarczyk, Paweł

2012-12-01

We discuss an algorithmic framework based on efficient graph algorithms and algebraic-topological computational tools. The framework is aimed at automatic computation of a database of global dynamics of a given m-parameter semidynamical system with discrete time on a bounded subset of the n-dimensional phase space. We introduce the mathematical background, which is based upon Conley's topological approach to dynamics, describe the algorithms for the analysis of the dynamics using rectangular grids both in phase space and parameter space, and show two sample applications.
Combinatorial-topological framework for the analysis of global dynamics

NASA Astrophysics Data System (ADS)

Bush, Justin; Gameiro, Marcio; Harker, Shaun; Kokubu, Hiroshi; Mischaikow, Konstantin; Obayashi, Ippei; Pilarczyk, Paweł

2012-12-01

We discuss an algorithmic framework based on efficient graph algorithms and algebraic-topological computational tools. The framework is aimed at automatic computation of a database of global dynamics of a given m-parameter semidynamical system with discrete time on a bounded subset of the n-dimensional phase space. We introduce the mathematical background, which is based upon Conley's topological approach to dynamics, describe the algorithms for the analysis of the dynamics using rectangular grids both in phase space and parameter space, and show two sample applications.
Approximate, computationally efficient online learning in Bayesian spiking neurons.

PubMed

Kuhlmann, Levin; Hauser-Raspe, Michael; Manton, Jonathan H; Grayden, David B; Tapson, Jonathan; van Schaik, André

2014-03-01

Bayesian spiking neurons (BSNs) provide a probabilistic interpretation of how neurons perform inference and learning. Online learning in BSNs typically involves parameter estimation based on maximum-likelihood expectation-maximization (ML-EM) which is computationally slow and limits the potential of studying networks of BSNs. An online learning algorithm, fast learning (FL), is presented that is more computationally efficient than the benchmark ML-EM for a fixed number of time steps as the number of inputs to a BSN increases (e.g., 16.5 times faster run times for 20 inputs). Although ML-EM appears to converge 2.0 to 3.6 times faster than FL, the computational cost of ML-EM means that ML-EM takes longer to simulate to convergence than FL. FL also provides reasonable convergence performance that is robust to initialization of parameter estimates that are far from the true parameter values. However, parameter estimation depends on the range of true parameter values. Nevertheless, for a physiologically meaningful range of parameter values, FL gives very good average estimation accuracy, despite its approximate nature. The FL algorithm therefore provides an efficient tool, complementary to ML-EM, for exploring BSN networks in more detail in order to better understand their biological relevance. Moreover, the simplicity of the FL algorithm means it can be easily implemented in neuromorphic VLSI such that one can take advantage of the energy-efficient spike coding of BSNs.
Multiresolution molecular mechanics: Implementation and efficiency

NASA Astrophysics Data System (ADS)

Biyikli, Emre; To, Albert C.

2017-01-01

Atomistic/continuum coupling methods combine accurate atomistic methods and efficient continuum methods to simulate the behavior of highly ordered crystalline systems. Coupled methods utilize the advantages of both approaches to simulate systems at a lower computational cost, while retaining the accuracy associated with atomistic methods. Many concurrent atomistic/continuum coupling methods have been proposed in the past; however, their true computational efficiency has not been demonstrated. The present work presents an efficient implementation of a concurrent coupling method called the Multiresolution Molecular Mechanics (MMM) for serial, parallel, and adaptive analysis. First, we present the features of the software implemented along with the associated technologies. The scalability of the software implementation is demonstrated, and the competing effects of multiscale modeling and parallelization are discussed. Then, the algorithms contributing to the efficiency of the software are presented. These include algorithms for eliminating latent ghost atoms from calculations and measurement-based dynamic balancing of parallel workload. The efficiency improvements made by these algorithms are demonstrated by benchmark tests. The efficiency of the software is found to be on par with LAMMPS, a state-of-the-art Molecular Dynamics (MD) simulation code, when performing full atomistic simulations. Speed-up of the MMM method is shown to be directly proportional to the reduction of the number of the atoms visited in force computation. Finally, an adaptive MMM analysis on a nanoindentation problem, containing over a million atoms, is performed, yielding an improvement of 6.3-8.5 times in efficiency, over the full atomistic MD method. For the first time, the efficiency of a concurrent atomistic/continuum coupling method is comprehensively investigated and demonstrated.
Efficient characterisation of large deviations using population dynamics

NASA Astrophysics Data System (ADS)

Brewer, Tobias; Clark, Stephen R.; Bradford, Russell; Jack, Robert L.

2018-05-01

We consider population dynamics as implemented by the cloning algorithm for analysis of large deviations of time-averaged quantities. We use the simple symmetric exclusion process with periodic boundary conditions as a prototypical example and investigate the convergence of the results with respect to the algorithmic parameters, focussing on the dynamical phase transition between homogeneous and inhomogeneous states, where convergence is relatively difficult to achieve. We discuss how the performance of the algorithm can be optimised, and how it can be efficiently exploited on parallel computing platforms.
Efficient Classical Algorithm for Boson Sampling with Partially Distinguishable Photons

NASA Astrophysics Data System (ADS)

Renema, J. J.; Menssen, A.; Clements, W. R.; Triginer, G.; Kolthammer, W. S.; Walmsley, I. A.

2018-06-01

We demonstrate how boson sampling with photons of partial distinguishability can be expressed in terms of interference of fewer photons. We use this observation to propose a classical algorithm to simulate the output of a boson sampler fed with photons of partial distinguishability. We find conditions for which this algorithm is efficient, which gives a lower limit on the required indistinguishability to demonstrate a quantum advantage. Under these conditions, adding more photons only polynomially increases the computational cost to simulate a boson sampling experiment.
Trellises and Trellis-Based Decoding Algorithms for Linear Block Codes. Part 3; A Recursive Maximum Likelihood Decoding

NASA Technical Reports Server (NTRS)

Lin, Shu; Fossorier, Marc

1998-01-01

The Viterbi algorithm is indeed a very simple and efficient method of implementing the maximum likelihood decoding. However, if we take advantage of the structural properties in a trellis section, other efficient trellis-based decoding algorithms can be devised. Recently, an efficient trellis-based recursive maximum likelihood decoding (RMLD) algorithm for linear block codes has been proposed. This algorithm is more efficient than the conventional Viterbi algorithm in both computation and hardware requirements. Most importantly, the implementation of this algorithm does not require the construction of the entire code trellis, only some special one-section trellises of relatively small state and branch complexities are needed for constructing path (or branch) metric tables recursively. At the end, there is only one table which contains only the most likely code-word and its metric for a given received sequence r = (r(sub 1), r(sub 2),...,r(sub n)). This algorithm basically uses the divide and conquer strategy. Furthermore, it allows parallel/pipeline processing of received sequences to speed up decoding.
An efficient parallel algorithm: Poststack and prestack Kirchhoff 3D depth migration using flexi-depth iterations

NASA Astrophysics Data System (ADS)

Rastogi, Richa; Srivastava, Abhishek; Khonde, Kiran; Sirasala, Kirannmayi M.; Londhe, Ashutosh; Chavhan, Hitesh

2015-07-01

This paper presents an efficient parallel 3D Kirchhoff depth migration algorithm suitable for current class of multicore architecture. The fundamental Kirchhoff depth migration algorithm exhibits inherent parallelism however, when it comes to 3D data migration, as the data size increases the resource requirement of the algorithm also increases. This challenges its practical implementation even on current generation high performance computing systems. Therefore a smart parallelization approach is essential to handle 3D data for migration. The most compute intensive part of Kirchhoff depth migration algorithm is the calculation of traveltime tables due to its resource requirements such as memory/storage and I/O. In the current research work, we target this area and develop a competent parallel algorithm for post and prestack 3D Kirchhoff depth migration, using hybrid MPI+OpenMP programming techniques. We introduce a concept of flexi-depth iterations while depth migrating data in parallel imaging space, using optimized traveltime table computations. This concept provides flexibility to the algorithm by migrating data in a number of depth iterations, which depends upon the available node memory and the size of data to be migrated during runtime. Furthermore, it minimizes the requirements of storage, I/O and inter-node communication, thus making it advantageous over the conventional parallelization approaches. The developed parallel algorithm is demonstrated and analysed on Yuva II, a PARAM series of supercomputers. Optimization, performance and scalability experiment results along with the migration outcome show the effectiveness of the parallel algorithm.
Rapid execution of fan beam image reconstruction algorithms using efficient computational techniques and special-purpose processors

NASA Astrophysics Data System (ADS)

Gilbert, B. K.; Robb, R. A.; Chu, A.; Kenue, S. K.; Lent, A. H.; Swartzlander, E. E., Jr.

1981-02-01

Rapid advances during the past ten years of several forms of computer-assisted tomography (CT) have resulted in the development of numerous algorithms to convert raw projection data into cross-sectional images. These reconstruction algorithms are either 'iterative,' in which a large matrix algebraic equation is solved by successive approximation techniques; or 'closed form'. Continuing evolution of the closed form algorithms has allowed the newest versions to produce excellent reconstructed images in most applications. This paper will review several computer software and special-purpose digital hardware implementations of closed form algorithms, either proposed during the past several years by a number of workers or actually implemented in commercial or research CT scanners. The discussion will also cover a number of recently investigated algorithmic modifications which reduce the amount of computation required to execute the reconstruction process, as well as several new special-purpose digital hardware implementations under development in laboratories at the Mayo Clinic.
A Decentralized Eigenvalue Computation Method for Spectrum Sensing Based on Average Consensus

NASA Astrophysics Data System (ADS)

Mohammadi, Jafar; Limmer, Steffen; Stańczak, Sławomir

2016-07-01

This paper considers eigenvalue estimation for the decentralized inference problem for spectrum sensing. We propose a decentralized eigenvalue computation algorithm based on the power method, which is referred to as generalized power method GPM; it is capable of estimating the eigenvalues of a given covariance matrix under certain conditions. Furthermore, we have developed a decentralized implementation of GPM by splitting the iterative operations into local and global computation tasks. The global tasks require data exchange to be performed among the nodes. For this task, we apply an average consensus algorithm to efficiently perform the global computations. As a special case, we consider a structured graph that is a tree with clusters of nodes at its leaves. For an accelerated distributed implementation, we propose to use computation over multiple access channel (CoMAC) as a building block of the algorithm. Numerical simulations are provided to illustrate the performance of the two algorithms.
Spectral Regularization Algorithms for Learning Large Incomplete Matrices.

PubMed

Mazumder, Rahul; Hastie, Trevor; Tibshirani, Robert

2010-03-01

We use convex relaxation techniques to provide a sequence of regularized low-rank solutions for large-scale matrix completion problems. Using the nuclear norm as a regularizer, we provide a simple and very efficient convex algorithm for minimizing the reconstruction error subject to a bound on the nuclear norm. Our algorithm Soft-Impute iteratively replaces the missing elements with those obtained from a soft-thresholded SVD. With warm starts this allows us to efficiently compute an entire regularization path of solutions on a grid of values of the regularization parameter. The computationally intensive part of our algorithm is in computing a low-rank SVD of a dense matrix. Exploiting the problem structure, we show that the task can be performed with a complexity linear in the matrix dimensions. Our semidefinite-programming algorithm is readily scalable to large matrices: for example it can obtain a rank-80 approximation of a 10(6) × 10(6) incomplete matrix with 10(5) observed entries in 2.5 hours, and can fit a rank 40 approximation to the full Netflix training set in 6.6 hours. Our methods show very good performance both in training and test error when compared to other competitive state-of-the art techniques.
Spectral Regularization Algorithms for Learning Large Incomplete Matrices

PubMed Central

Mazumder, Rahul; Hastie, Trevor; Tibshirani, Robert

2010-01-01

We use convex relaxation techniques to provide a sequence of regularized low-rank solutions for large-scale matrix completion problems. Using the nuclear norm as a regularizer, we provide a simple and very efficient convex algorithm for minimizing the reconstruction error subject to a bound on the nuclear norm. Our algorithm Soft-Impute iteratively replaces the missing elements with those obtained from a soft-thresholded SVD. With warm starts this allows us to efficiently compute an entire regularization path of solutions on a grid of values of the regularization parameter. The computationally intensive part of our algorithm is in computing a low-rank SVD of a dense matrix. Exploiting the problem structure, we show that the task can be performed with a complexity linear in the matrix dimensions. Our semidefinite-programming algorithm is readily scalable to large matrices: for example it can obtain a rank-80 approximation of a 106 × 106 incomplete matrix with 105 observed entries in 2.5 hours, and can fit a rank 40 approximation to the full Netflix training set in 6.6 hours. Our methods show very good performance both in training and test error when compared to other competitive state-of-the art techniques. PMID:21552465
Scalable domain decomposition solvers for stochastic PDEs in high performance computing

DOE PAGES

Desai, Ajit; Khalil, Mohammad; Pettit, Chris; ...

2017-09-21

Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less
Scalable domain decomposition solvers for stochastic PDEs in high performance computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Desai, Ajit; Khalil, Mohammad; Pettit, Chris

Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less
An efficient dynamic load balancing algorithm

NASA Astrophysics Data System (ADS)

Lagaros, Nikos D.

2014-01-01

In engineering problems, randomness and uncertainties are inherent. Robust design procedures, formulated in the framework of multi-objective optimization, have been proposed in order to take into account sources of randomness and uncertainty. These design procedures require orders of magnitude more computational effort than conventional analysis or optimum design processes since a very large number of finite element analyses is required to be dealt. It is therefore an imperative need to exploit the capabilities of computing resources in order to deal with this kind of problems. In particular, parallel computing can be implemented at the level of metaheuristic optimization, by exploiting the physical parallelization feature of the nondominated sorting evolution strategies method, as well as at the level of repeated structural analyses required for assessing the behavioural constraints and for calculating the objective functions. In this study an efficient dynamic load balancing algorithm for optimum exploitation of available computing resources is proposed and, without loss of generality, is applied for computing the desired Pareto front. In such problems the computation of the complete Pareto front with feasible designs only, constitutes a very challenging task. The proposed algorithm achieves linear speedup factors and almost 100% speedup factor values with reference to the sequential procedure.
Edge-Based Efficient Search over Encrypted Data Mobile Cloud Storage

PubMed Central

Liu, Fang; Cai, Zhiping; Xiao, Nong; Zhao, Ziming

2018-01-01

Smart sensor-equipped mobile devices sense, collect, and process data generated by the edge network to achieve intelligent control, but such mobile devices usually have limited storage and computing resources. Mobile cloud storage provides a promising solution owing to its rich storage resources, great accessibility, and low cost. But it also brings a risk of information leakage. The encryption of sensitive data is the basic step to resist the risk. However, deploying a high complexity encryption and decryption algorithm on mobile devices will greatly increase the burden of terminal operation and the difficulty to implement the necessary privacy protection algorithm. In this paper, we propose ENSURE (EfficieNt and SecURE), an efficient and secure encrypted search architecture over mobile cloud storage. ENSURE is inspired by edge computing. It allows mobile devices to offload the computation intensive task onto the edge server to achieve a high efficiency. Besides, to protect data security, it reduces the information acquisition of untrusted cloud by hiding the relevance between query keyword and search results from the cloud. Experiments on a real data set show that ENSURE reduces the computation time by 15% to 49% and saves the energy consumption by 38% to 69% per query. PMID:29652810
Edge-Based Efficient Search over Encrypted Data Mobile Cloud Storage.

PubMed

Guo, Yeting; Liu, Fang; Cai, Zhiping; Xiao, Nong; Zhao, Ziming

2018-04-13

Smart sensor-equipped mobile devices sense, collect, and process data generated by the edge network to achieve intelligent control, but such mobile devices usually have limited storage and computing resources. Mobile cloud storage provides a promising solution owing to its rich storage resources, great accessibility, and low cost. But it also brings a risk of information leakage. The encryption of sensitive data is the basic step to resist the risk. However, deploying a high complexity encryption and decryption algorithm on mobile devices will greatly increase the burden of terminal operation and the difficulty to implement the necessary privacy protection algorithm. In this paper, we propose ENSURE (EfficieNt and SecURE), an efficient and secure encrypted search architecture over mobile cloud storage. ENSURE is inspired by edge computing. It allows mobile devices to offload the computation intensive task onto the edge server to achieve a high efficiency. Besides, to protect data security, it reduces the information acquisition of untrusted cloud by hiding the relevance between query keyword and search results from the cloud. Experiments on a real data set show that ENSURE reduces the computation time by 15% to 49% and saves the energy consumption by 38% to 69% per query.
An efficient sparse matrix multiplication scheme for the CYBER 205 computer

NASA Technical Reports Server (NTRS)

Lambiotte, Jules J., Jr.

1988-01-01

This paper describes the development of an efficient algorithm for computing the product of a matrix and vector on a CYBER 205 vector computer. The desire to provide software which allows the user to choose between the often conflicting goals of minimizing central processing unit (CPU) time or storage requirements has led to a diagonal-based algorithm in which one of four types of storage is selected for each diagonal. The candidate storage types employed were chosen to be efficient on the CYBER 205 for diagonals which have nonzero structure which is dense, moderately sparse, very sparse and short, or very sparse and long; however, for many densities, no diagonal type is most efficient with respect to both resource requirements, and a trade-off must be made. For each diagonal, an initialization subroutine estimates the CPU time and storage required for each storage type based on results from previously performed numerical experimentation. These requirements are adjusted by weights provided by the user which reflect the relative importance the user places on the two resources. The adjusted resource requirements are then compared to select the most efficient storage and computational scheme.

A fast and automatic fusion algorithm for unregistered multi-exposure image sequence

NASA Astrophysics Data System (ADS)

Liu, Yan; Yu, Feihong

2014-09-01

Human visual system (HVS) can visualize all the brightness levels of the scene through visual adaptation. However, the dynamic range of most commercial digital cameras and display devices are smaller than the dynamic range of human eye. This implies low dynamic range (LDR) images captured by normal digital camera may lose image details. We propose an efficient approach to high dynamic (HDR) image fusion that copes with image displacement and image blur degradation in a computationally efficient manner, which is suitable for implementation on mobile devices. The various image registration algorithms proposed in the previous literatures are unable to meet the efficiency and performance requirements in the application of mobile devices. In this paper, we selected Oriented Brief (ORB) detector to extract local image structures. The descriptor selected in multi-exposure image fusion algorithm has to be fast and robust to illumination variations and geometric deformations. ORB descriptor is the best candidate in our algorithm. Further, we perform an improved RANdom Sample Consensus (RANSAC) algorithm to reject incorrect matches. For the fusion of images, a new approach based on Stationary Wavelet Transform (SWT) is used. The experimental results demonstrate that the proposed algorithm generates high quality images at low computational cost. Comparisons with a number of other feature matching methods show that our method gets better performance.
Small target detection using objectness and saliency

NASA Astrophysics Data System (ADS)

Zhang, Naiwen; Xiao, Yang; Fang, Zhiwen; Yang, Jian; Wang, Li; Li, Tao

2017-10-01

We are motived by the need for generic object detection algorithm which achieves high recall for small targets in complex scenes with acceptable computational efficiency. We propose a novel object detection algorithm, which has high localization quality with acceptable computational cost. Firstly, we obtain the objectness map as in BING[1] and use NMS to get the top N points. Then, k-means algorithm is used to cluster them into K classes according to their location. We set the center points of the K classes as seed points. For each seed point, an object potential region is extracted. Finally, a fast salient object detection algorithm[2] is applied to the object potential regions to highlight objectlike pixels, and a series of efficient post-processing operations are proposed to locate the targets. Our method runs at 5 FPS on 1000*1000 images, and significantly outperforms previous methods on small targets in cluttered background.
Unconventional Hamilton-type variational principle in phase space and symplectic algorithm

NASA Astrophysics Data System (ADS)

Luo, En; Huang, Weijiang; Zhang, Hexin

2003-06-01

By a novel approach proposed by Luo, the unconventional Hamilton-type variational principle in phase space for elastodynamics of multidegree-of-freedom system is established in this paper. It not only can fully characterize the initial-value problem of this dynamic, but also has a natural symplectic structure. Based on this variational principle, a symplectic algorithm which is called a symplectic time-subdomain method is proposed. A non-difference scheme is constructed by applying Lagrange interpolation polynomial to the time subdomain. Furthermore, it is also proved that the presented symplectic algorithm is an unconditionally stable one. From the results of the two numerical examples of different types, it can be seen that the accuracy and the computational efficiency of the new method excel obviously those of widely used Wilson-θ and Newmark-β methods. Therefore, this new algorithm is a highly efficient one with better computational performance.
Quantum Chemistry on Quantum Computers: A Polynomial-Time Quantum Algorithm for Constructing the Wave Functions of Open-Shell Molecules.

PubMed

Sugisaki, Kenji; Yamamoto, Satoru; Nakazawa, Shigeaki; Toyota, Kazuo; Sato, Kazunobu; Shiomi, Daisuke; Takui, Takeji

2016-08-18

Quantum computers are capable to efficiently perform full configuration interaction (FCI) calculations of atoms and molecules by using the quantum phase estimation (QPE) algorithm. Because the success probability of the QPE depends on the overlap between approximate and exact wave functions, efficient methods to prepare accurate initial guess wave functions enough to have sufficiently large overlap with the exact ones are highly desired. Here, we propose a quantum algorithm to construct the wave function consisting of one configuration state function, which is suitable for the initial guess wave function in QPE-based FCI calculations of open-shell molecules, based on the addition theorem of angular momentum. The proposed quantum algorithm enables us to prepare the wave function consisting of an exponential number of Slater determinants only by a polynomial number of quantum operations.
An adaptive inverse kinematics algorithm for robot manipulators

NASA Technical Reports Server (NTRS)

Colbaugh, R.; Glass, K.; Seraji, H.

1990-01-01

An adaptive algorithm for solving the inverse kinematics problem for robot manipulators is presented. The algorithm is derived using model reference adaptive control (MRAC) theory and is computationally efficient for online applications. The scheme requires no a priori knowledge of the kinematics of the robot if Cartesian end-effector sensing is available, and it requires knowledge of only the forward kinematics if joint position sensing is used. Computer simulation results are given for the redundant seven-DOF robotics research arm, demonstrating that the proposed algorithm yields accurate joint angle trajectories for a given end-effector position/orientation trajectory.
Ion flux through membrane channels--an enhanced algorithm for the Poisson-Nernst-Planck model.

PubMed

Dyrka, Witold; Augousti, Andy T; Kotulska, Malgorzata

2008-09-01

A novel algorithmic scheme for numerical solution of the 3D Poisson-Nernst-Planck model is proposed. The algorithmic improvements are universal and independent of the detailed physical model. They include three major steps: an adjustable gradient-based step value, an adjustable relaxation coefficient, and an optimized segmentation of the modeled space. The enhanced algorithm significantly accelerates the speed of computation and reduces the computational demands. The theoretical model was tested on a regular artificial channel and validated on a real protein channel-alpha-hemolysin, proving its efficiency. (c) 2008 Wiley Periodicals, Inc.
Monkey search algorithm for ECE components partitioning

NASA Astrophysics Data System (ADS)

Kuliev, Elmar; Kureichik, Vladimir; Kureichik, Vladimir, Jr.

2018-05-01

The paper considers one of the important design problems – a partitioning of electronic computer equipment (ECE) components (blocks). It belongs to the NP-hard class of problems and has a combinatorial and logic nature. In the paper, a partitioning problem formulation can be found as a partition of graph into parts. To solve the given problem, the authors suggest using a bioinspired approach based on a monkey search algorithm. Based on the developed software, computational experiments were carried out that show the algorithm efficiency, as well as its recommended settings for obtaining more effective solutions in comparison with a genetic algorithm.
Efficient Constant-Time Complexity Algorithm for Stochastic Simulation of Large Reaction Networks.

PubMed

Thanh, Vo Hong; Zunino, Roberto; Priami, Corrado

2017-01-01

Exact stochastic simulation is an indispensable tool for a quantitative study of biochemical reaction networks. The simulation realizes the time evolution of the model by randomly choosing a reaction to fire and update the system state according to a probability that is proportional to the reaction propensity. Two computationally expensive tasks in simulating large biochemical networks are the selection of next reaction firings and the update of reaction propensities due to state changes. We present in this work a new exact algorithm to optimize both of these simulation bottlenecks. Our algorithm employs the composition-rejection on the propensity bounds of reactions to select the next reaction firing. The selection of next reaction firings is independent of the number reactions while the update of propensities is skipped and performed only when necessary. It therefore provides a favorable scaling for the computational complexity in simulating large reaction networks. We benchmark our new algorithm with the state of the art algorithms available in literature to demonstrate its applicability and efficiency.
Comparison of OPC job prioritization schemes to generate data for mask manufacturing

NASA Astrophysics Data System (ADS)

Lewis, Travis; Veeraraghavan, Vijay; Jantzen, Kenneth; Kim, Stephen; Park, Minyoung; Russell, Gordon; Simmons, Mark

2015-03-01

Delivering mask ready OPC corrected data to the mask shop on-time is critical for a foundry to meet the cycle time commitment for a new product. With current OPC compute resource sharing technology, different job scheduling algorithms are possible, such as, priority based resource allocation and fair share resource allocation. In order to maximize computer cluster efficiency, minimize the cost of the data processing and deliver data on schedule, the trade-offs of each scheduling algorithm need to be understood. Using actual production jobs, each of the scheduling algorithms will be tested in a production tape-out environment. Each scheduling algorithm will be judged on its ability to deliver data on schedule and the trade-offs associated with each method will be analyzed. It is now possible to introduce advance scheduling algorithms to the OPC data processing environment to meet the goals of on-time delivery of mask ready OPC data while maximizing efficiency and reducing cost.
An adaptive multi-level simulation algorithm for stochastic biological systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lester, C., E-mail: lesterc@maths.ox.ac.uk; Giles, M. B.; Baker, R. E.

2015-01-14

Discrete-state, continuous-time Markov models are widely used in the modeling of biochemical reaction networks. Their complexity often precludes analytic solution, and we rely on stochastic simulation algorithms (SSA) to estimate system statistics. The Gillespie algorithm is exact, but computationally costly as it simulates every single reaction. As such, approximate stochastic simulation algorithms such as the tau-leap algorithm are often used. Potentially computationally more efficient, the system statistics generated suffer from significant bias unless tau is relatively small, in which case the computational time can be comparable to that of the Gillespie algorithm. The multi-level method [Anderson and Higham, “Multi-level Montemore » Carlo for continuous time Markov chains, with applications in biochemical kinetics,” SIAM Multiscale Model. Simul. 10(1), 146–179 (2012)] tackles this problem. A base estimator is computed using many (cheap) sample paths at low accuracy. The bias inherent in this estimator is then reduced using a number of corrections. Each correction term is estimated using a collection of paired sample paths where one path of each pair is generated at a higher accuracy compared to the other (and so more expensive). By sharing random variables between these paired paths, the variance of each correction estimator can be reduced. This renders the multi-level method very efficient as only a relatively small number of paired paths are required to calculate each correction term. In the original multi-level method, each sample path is simulated using the tau-leap algorithm with a fixed value of τ. This approach can result in poor performance when the reaction activity of a system changes substantially over the timescale of interest. By introducing a novel adaptive time-stepping approach where τ is chosen according to the stochastic behaviour of each sample path, we extend the applicability of the multi-level method to such cases. We demonstrate the efficiency of our method using a number of examples.« less
Proportional Topology Optimization: A New Non-Sensitivity Method for Solving Stress Constrained and Minimum Compliance Problems and Its Implementation in MATLAB

PubMed Central

Biyikli, Emre; To, Albert C.

2015-01-01

A new topology optimization method called the Proportional Topology Optimization (PTO) is presented. As a non-sensitivity method, PTO is simple to understand, easy to implement, and is also efficient and accurate at the same time. It is implemented into two MATLAB programs to solve the stress constrained and minimum compliance problems. Descriptions of the algorithm and computer programs are provided in detail. The method is applied to solve three numerical examples for both types of problems. The method shows comparable efficiency and accuracy with an existing optimality criteria method which computes sensitivities. Also, the PTO stress constrained algorithm and minimum compliance algorithm are compared by feeding output from one algorithm to the other in an alternative manner, where the former yields lower maximum stress and volume fraction but higher compliance compared to the latter. Advantages and disadvantages of the proposed method and future works are discussed. The computer programs are self-contained and publicly shared in the website www.ptomethod.org. PMID:26678849
An Efficient Biometric-Based Algorithm Using Heart Rate Variability for Securing Body Sensor Networks

PubMed Central

Pirbhulal, Sandeep; Zhang, Heye; Mukhopadhyay, Subhas Chandra; Li, Chunyue; Wang, Yumei; Li, Guanglin; Wu, Wanqing; Zhang, Yuan-Ting

2015-01-01

Body Sensor Network (BSN) is a network of several associated sensor nodes on, inside or around the human body to monitor vital signals, such as, Electroencephalogram (EEG), Photoplethysmography (PPG), Electrocardiogram (ECG), etc. Each sensor node in BSN delivers major information; therefore, it is very significant to provide data confidentiality and security. All existing approaches to secure BSN are based on complex cryptographic key generation procedures, which not only demands high resource utilization and computation time, but also consumes large amount of energy, power and memory during data transmission. However, it is indispensable to put forward energy efficient and computationally less complex authentication technique for BSN. In this paper, a novel biometric-based algorithm is proposed, which utilizes Heart Rate Variability (HRV) for simple key generation process to secure BSN. Our proposed algorithm is compared with three data authentication techniques, namely Physiological Signal based Key Agreement (PSKA), Data Encryption Standard (DES) and Rivest Shamir Adleman (RSA). Simulation is performed in Matlab and results suggest that proposed algorithm is quite efficient in terms of transmission time utilization, average remaining energy and total power consumption. PMID:26131666
An Efficient Biometric-Based Algorithm Using Heart Rate Variability for Securing Body Sensor Networks.

PubMed

Pirbhulal, Sandeep; Zhang, Heye; Mukhopadhyay, Subhas Chandra; Li, Chunyue; Wang, Yumei; Li, Guanglin; Wu, Wanqing; Zhang, Yuan-Ting

2015-06-26

Body Sensor Network (BSN) is a network of several associated sensor nodes on, inside or around the human body to monitor vital signals, such as, Electroencephalogram (EEG), Photoplethysmography (PPG), Electrocardiogram (ECG), etc. Each sensor node in BSN delivers major information; therefore, it is very significant to provide data confidentiality and security. All existing approaches to secure BSN are based on complex cryptographic key generation procedures, which not only demands high resource utilization and computation time, but also consumes large amount of energy, power and memory during data transmission. However, it is indispensable to put forward energy efficient and computationally less complex authentication technique for BSN. In this paper, a novel biometric-based algorithm is proposed, which utilizes Heart Rate Variability (HRV) for simple key generation process to secure BSN. Our proposed algorithm is compared with three data authentication techniques, namely Physiological Signal based Key Agreement (PSKA), Data Encryption Standard (DES) and Rivest Shamir Adleman (RSA). Simulation is performed in Matlab and results suggest that proposed algorithm is quite efficient in terms of transmission time utilization, average remaining energy and total power consumption.
The remote sensing image segmentation mean shift algorithm parallel processing based on MapReduce

NASA Astrophysics Data System (ADS)

Chen, Xi; Zhou, Liqing

2015-12-01

With the development of satellite remote sensing technology and the remote sensing image data, traditional remote sensing image segmentation technology cannot meet the massive remote sensing image processing and storage requirements. This article put cloud computing and parallel computing technology in remote sensing image segmentation process, and build a cheap and efficient computer cluster system that uses parallel processing to achieve MeanShift algorithm of remote sensing image segmentation based on the MapReduce model, not only to ensure the quality of remote sensing image segmentation, improved split speed, and better meet the real-time requirements. The remote sensing image segmentation MeanShift algorithm parallel processing algorithm based on MapReduce shows certain significance and a realization of value.
Computing the Envelope for Stepwise Constant Resource Allocations

NASA Technical Reports Server (NTRS)

Muscettola, Nicola; Clancy, Daniel (Technical Monitor)

2001-01-01

Estimating tight resource level is a fundamental problem in the construction of flexible plans with resource utilization. In this paper we describe an efficient algorithm that builds a resource envelope, the tightest possible such bound. The algorithm is based on transforming the temporal network of resource consuming and producing events into a flow network with noises equal to the events and edges equal to the necessary predecessor links between events. The incremental solution of a staged maximum flow problem on the network is then used to compute the time of occurrence and the height of each step of the resource envelope profile. The staged algorithm has the same computational complexity of solving a maximum flow problem on the entire flow network. This makes this method computationally feasible for use in the inner loop of search-based scheduling algorithms.
Specialized Computer Systems for Environment Visualization

NASA Astrophysics Data System (ADS)

Al-Oraiqat, Anas M.; Bashkov, Evgeniy A.; Zori, Sergii A.

2018-06-01

The need for real time image generation of landscapes arises in various fields as part of tasks solved by virtual and augmented reality systems, as well as geographic information systems. Such systems provide opportunities for collecting, storing, analyzing and graphically visualizing geographic data. Algorithmic and hardware software tools for increasing the realism and efficiency of the environment visualization in 3D visualization systems are proposed. This paper discusses a modified path tracing algorithm with a two-level hierarchy of bounding volumes and finding intersections with Axis-Aligned Bounding Box. The proposed algorithm eliminates the branching and hence makes the algorithm more suitable to be implemented on the multi-threaded CPU and GPU. A modified ROAM algorithm is used to solve the qualitative visualization of reliefs' problems and landscapes. The algorithm is implemented on parallel systems—cluster and Compute Unified Device Architecture-networks. Results show that the implementation on MPI clusters is more efficient than Graphics Processing Unit/Graphics Processing Clusters and allows real-time synthesis. The organization and algorithms of the parallel GPU system for the 3D pseudo stereo image/video synthesis are proposed. With realizing possibility analysis on a parallel GPU-architecture of each stage, 3D pseudo stereo synthesis is performed. An experimental prototype of a specialized hardware-software system 3D pseudo stereo imaging and video was developed on the CPU/GPU. The experimental results show that the proposed adaptation of 3D pseudo stereo imaging to the architecture of GPU-systems is efficient. Also it accelerates the computational procedures of 3D pseudo-stereo synthesis for the anaglyph and anamorphic formats of the 3D stereo frame without performing optimization procedures. The acceleration is on average 11 and 54 times for test GPUs.
An efficient and portable SIMD algorithm for charge/current deposition in Particle-In-Cell codes

NASA Astrophysics Data System (ADS)

Vincenti, H.; Lobet, M.; Lehe, R.; Sasanka, R.; Vay, J.-L.

2017-01-01

In current computer architectures, data movement (from die to network) is by far the most energy consuming part of an algorithm (≈ 20 pJ/word on-die to ≈10,000 pJ/word on the network). To increase memory locality at the hardware level and reduce energy consumption related to data movement, future exascale computers tend to use many-core processors on each compute nodes that will have a reduced clock speed to allow for efficient cooling. To compensate for frequency decrease, machine vendors are making use of long SIMD instruction registers that are able to process multiple data with one arithmetic operator in one clock cycle. SIMD register length is expected to double every four years. As a consequence, Particle-In-Cell (PIC) codes will have to achieve good vectorization to fully take advantage of these upcoming architectures. In this paper, we present a new algorithm that allows for efficient and portable SIMD vectorization of current/charge deposition routines that are, along with the field gathering routines, among the most time consuming parts of the PIC algorithm. Our new algorithm uses a particular data structure that takes into account memory alignment constraints and avoids gather/scatter instructions that can significantly affect vectorization performances on current CPUs. The new algorithm was successfully implemented in the 3D skeleton PIC code PICSAR and tested on Haswell Xeon processors (AVX2-256 bits wide data registers). Results show a factor of × 2 to × 2.5 speed-up in double precision for particle shape factor of orders 1- 3. The new algorithm can be applied as is on future KNL (Knights Landing) architectures that will include AVX-512 instruction sets with 512 bits register lengths (8 doubles/16 singles).
A flexible and accurate digital volume correlation method applicable to high-resolution volumetric images

NASA Astrophysics Data System (ADS)

Pan, Bing; Wang, Bo

2017-10-01

Digital volume correlation (DVC) is a powerful technique for quantifying interior deformation within solid opaque materials and biological tissues. In the last two decades, great efforts have been made to improve the accuracy and efficiency of the DVC algorithm. However, there is still a lack of a flexible, robust and accurate version that can be efficiently implemented in personal computers with limited RAM. This paper proposes an advanced DVC method that can realize accurate full-field internal deformation measurement applicable to high-resolution volume images with up to billions of voxels. Specifically, a novel layer-wise reliability-guided displacement tracking strategy combined with dynamic data management is presented to guide the DVC computation from slice to slice. The displacements at specified calculation points in each layer are computed using the advanced 3D inverse-compositional Gauss-Newton algorithm with the complete initial guess of the deformation vector accurately predicted from the computed calculation points. Since only limited slices of interest in the reference and deformed volume images rather than the whole volume images are required, the DVC calculation can thus be efficiently implemented on personal computers. The flexibility, accuracy and efficiency of the presented DVC approach are demonstrated by analyzing computer-simulated and experimentally obtained high-resolution volume images.
A heuristic re-mapping algorithm reducing inter-level communication in SAMR applications.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Steensland, Johan; Ray, Jaideep

2003-07-01

This paper aims at decreasing execution time for large-scale structured adaptive mesh refinement (SAMR) applications by proposing a new heuristic re-mapping algorithm and experimentally showing its effectiveness in reducing inter-level communication. Tests were done for five different SAMR applications. The overall goal is to engineer a dynamically adaptive meta-partitioner capable of selecting and configuring the most appropriate partitioning strategy at run-time based on current system and application state. Such a metapartitioner can significantly reduce execution times for general SAMR applications. Computer simulations of physical phenomena are becoming increasingly popular as they constitute an important complement to real-life testing. In manymore » cases, such simulations are based on solving partial differential equations by numerical methods. Adaptive methods are crucial to efficiently utilize computer resources such as memory and CPU. But even with adaption, the simulations are computationally demanding and yield huge data sets. Thus parallelization and the efficient partitioning of data become issues of utmost importance. Adaption causes the workload to change dynamically, calling for dynamic (re-) partitioning to maintain efficient resource utilization. The proposed heuristic algorithm reduced inter-level communication substantially. Since the complexity of the proposed algorithm is low, this decrease comes at a relatively low cost. As a consequence, we draw the conclusion that the proposed re-mapping algorithm would be useful to lower overall execution times for many large SAMR applications. Due to its usefulness and its parameterization, the proposed algorithm would constitute a natural and important component of the meta-partitioner.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)

Biyikli, Emre; To, Albert C., E-mail: albertto@pitt.edu

Atomistic/continuum coupling methods combine accurate atomistic methods and efficient continuum methods to simulate the behavior of highly ordered crystalline systems. Coupled methods utilize the advantages of both approaches to simulate systems at a lower computational cost, while retaining the accuracy associated with atomistic methods. Many concurrent atomistic/continuum coupling methods have been proposed in the past; however, their true computational efficiency has not been demonstrated. The present work presents an efficient implementation of a concurrent coupling method called the Multiresolution Molecular Mechanics (MMM) for serial, parallel, and adaptive analysis. First, we present the features of the software implemented along with themore » associated technologies. The scalability of the software implementation is demonstrated, and the competing effects of multiscale modeling and parallelization are discussed. Then, the algorithms contributing to the efficiency of the software are presented. These include algorithms for eliminating latent ghost atoms from calculations and measurement-based dynamic balancing of parallel workload. The efficiency improvements made by these algorithms are demonstrated by benchmark tests. The efficiency of the software is found to be on par with LAMMPS, a state-of-the-art Molecular Dynamics (MD) simulation code, when performing full atomistic simulations. Speed-up of the MMM method is shown to be directly proportional to the reduction of the number of the atoms visited in force computation. Finally, an adaptive MMM analysis on a nanoindentation problem, containing over a million atoms, is performed, yielding an improvement of 6.3–8.5 times in efficiency, over the full atomistic MD method. For the first time, the efficiency of a concurrent atomistic/continuum coupling method is comprehensively investigated and demonstrated.« less

Molecular Isotopic Distribution Analysis (MIDAs) with Adjustable Mass Accuracy

NASA Astrophysics Data System (ADS)

Alves, Gelio; Ogurtsov, Aleksey Y.; Yu, Yi-Kuo

2014-01-01

In this paper, we present Molecular Isotopic Distribution Analysis (MIDAs), a new software tool designed to compute molecular isotopic distributions with adjustable accuracies. MIDAs offers two algorithms, one polynomial-based and one Fourier-transform-based, both of which compute molecular isotopic distributions accurately and efficiently. The polynomial-based algorithm contains few novel aspects, whereas the Fourier-transform-based algorithm consists mainly of improvements to other existing Fourier-transform-based algorithms. We have benchmarked the performance of the two algorithms implemented in MIDAs with that of eight software packages (BRAIN, Emass, Mercury, Mercury5, NeutronCluster, Qmass, JFC, IC) using a consensus set of benchmark molecules. Under the proposed evaluation criteria, MIDAs's algorithms, JFC, and Emass compute with comparable accuracy the coarse-grained (low-resolution) isotopic distributions and are more accurate than the other software packages. For fine-grained isotopic distributions, we compared IC, MIDAs's polynomial algorithm, and MIDAs's Fourier transform algorithm. Among the three, IC and MIDAs's polynomial algorithm compute isotopic distributions that better resemble their corresponding exact fine-grained (high-resolution) isotopic distributions. MIDAs can be accessed freely through a user-friendly web-interface at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/midas/index.html.
Molecular Isotopic Distribution Analysis (MIDAs) with adjustable mass accuracy.

PubMed

Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo

2014-01-01

In this paper, we present Molecular Isotopic Distribution Analysis (MIDAs), a new software tool designed to compute molecular isotopic distributions with adjustable accuracies. MIDAs offers two algorithms, one polynomial-based and one Fourier-transform-based, both of which compute molecular isotopic distributions accurately and efficiently. The polynomial-based algorithm contains few novel aspects, whereas the Fourier-transform-based algorithm consists mainly of improvements to other existing Fourier-transform-based algorithms. We have benchmarked the performance of the two algorithms implemented in MIDAs with that of eight software packages (BRAIN, Emass, Mercury, Mercury5, NeutronCluster, Qmass, JFC, IC) using a consensus set of benchmark molecules. Under the proposed evaluation criteria, MIDAs's algorithms, JFC, and Emass compute with comparable accuracy the coarse-grained (low-resolution) isotopic distributions and are more accurate than the other software packages. For fine-grained isotopic distributions, we compared IC, MIDAs's polynomial algorithm, and MIDAs's Fourier transform algorithm. Among the three, IC and MIDAs's polynomial algorithm compute isotopic distributions that better resemble their corresponding exact fine-grained (high-resolution) isotopic distributions. MIDAs can be accessed freely through a user-friendly web-interface at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/midas/index.html.
Highly Scalable Matching Pursuit Signal Decomposition Algorithm

NASA Technical Reports Server (NTRS)

Christensen, Daniel; Das, Santanu; Srivastava, Ashok N.

2009-01-01

Matching Pursuit Decomposition (MPD) is a powerful iterative algorithm for signal decomposition and feature extraction. MPD decomposes any signal into linear combinations of its dictionary elements or atoms . A best fit atom from an arbitrarily defined dictionary is determined through cross-correlation. The selected atom is subtracted from the signal and this procedure is repeated on the residual in the subsequent iterations until a stopping criterion is met. The reconstructed signal reveals the waveform structure of the original signal. However, a sufficiently large dictionary is required for an accurate reconstruction; this in return increases the computational burden of the algorithm, thus limiting its applicability and level of adoption. The purpose of this research is to improve the scalability and performance of the classical MPD algorithm. Correlation thresholds were defined to prune insignificant atoms from the dictionary. The Coarse-Fine Grids and Multiple Atom Extraction techniques were proposed to decrease the computational burden of the algorithm. The Coarse-Fine Grids method enabled the approximation and refinement of the parameters for the best fit atom. The ability to extract multiple atoms within a single iteration enhanced the effectiveness and efficiency of each iteration. These improvements were implemented to produce an improved Matching Pursuit Decomposition algorithm entitled MPD++. Disparate signal decomposition applications may require a particular emphasis of accuracy or computational efficiency. The prominence of the key signal features required for the proper signal classification dictates the level of accuracy necessary in the decomposition. The MPD++ algorithm may be easily adapted to accommodate the imposed requirements. Certain feature extraction applications may require rapid signal decomposition. The full potential of MPD++ may be utilized to produce incredible performance gains while extracting only slightly less energy than the standard algorithm. When the utmost accuracy must be achieved, the modified algorithm extracts atoms more conservatively but still exhibits computational gains over classical MPD. The MPD++ algorithm was demonstrated using an over-complete dictionary on real life data. Computational times were reduced by factors of 1.9 and 44 for the emphases of accuracy and performance, respectively. The modified algorithm extracted similar amounts of energy compared to classical MPD. The degree of the improvement in computational time depends on the complexity of the data, the initialization parameters, and the breadth of the dictionary. The results of the research confirm that the three modifications successfully improved the scalability and computational efficiency of the MPD algorithm. Correlation Thresholding decreased the time complexity by reducing the dictionary size. Multiple Atom Extraction also reduced the time complexity by decreasing the number of iterations required for a stopping criterion to be reached. The Course-Fine Grids technique enabled complicated atoms with numerous variable parameters to be effectively represented in the dictionary. Due to the nature of the three proposed modifications, they are capable of being stacked and have cumulative effects on the reduction of the time complexity.
Stochastic subset selection for learning with kernel machines.

PubMed

Rhinelander, Jason; Liu, Xiaoping P

2012-06-01

Kernel machines have gained much popularity in applications of machine learning. Support vector machines (SVMs) are a subset of kernel machines and generalize well for classification, regression, and anomaly detection tasks. The training procedure for traditional SVMs involves solving a quadratic programming (QP) problem. The QP problem scales super linearly in computational effort with the number of training samples and is often used for the offline batch processing of data. Kernel machines operate by retaining a subset of observed data during training. The data vectors contained within this subset are referred to as support vectors (SVs). The work presented in this paper introduces a subset selection method for the use of kernel machines in online, changing environments. Our algorithm works by using a stochastic indexing technique when selecting a subset of SVs when computing the kernel expansion. The work described here is novel because it separates the selection of kernel basis functions from the training algorithm used. The subset selection algorithm presented here can be used in conjunction with any online training technique. It is important for online kernel machines to be computationally efficient due to the real-time requirements of online environments. Our algorithm is an important contribution because it scales linearly with the number of training samples and is compatible with current training techniques. Our algorithm outperforms standard techniques in terms of computational efficiency and provides increased recognition accuracy in our experiments. We provide results from experiments using both simulated and real-world data sets to verify our algorithm.
Solving search problems by strongly simulating quantum circuits

PubMed Central

Johnson, T. H.; Biamonte, J. D.; Clark, S. R.; Jaksch, D.

2013-01-01

Simulating quantum circuits using classical computers lets us analyse the inner workings of quantum algorithms. The most complete type of simulation, strong simulation, is believed to be generally inefficient. Nevertheless, several efficient strong simulation techniques are known for restricted families of quantum circuits and we develop an additional technique in this article. Further, we show that strong simulation algorithms perform another fundamental task: solving search problems. Efficient strong simulation techniques allow solutions to a class of search problems to be counted and found efficiently. This enhances the utility of strong simulation methods, known or yet to be discovered, and extends the class of search problems known to be efficiently simulable. Relating strong simulation to search problems also bounds the computational power of efficiently strongly simulable circuits; if they could solve all problems in P this would imply that all problems in NP and #P could be solved in polynomial time. PMID:23390585
Advanced Dynamically Adaptive Algorithms for Stochastic Simulations on Extreme Scales

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xiu, Dongbin

2017-03-03

The focus of the project is the development of mathematical methods and high-performance computational tools for stochastic simulations, with a particular emphasis on computations on extreme scales. The core of the project revolves around the design of highly efficient and scalable numerical algorithms that can adaptively and accurately, in high dimensional spaces, resolve stochastic problems with limited smoothness, even containing discontinuities.
Feature Selection in Classification of Eye Movements Using Electrooculography for Activity Recognition

PubMed Central

Mala, S.; Latha, K.

2014-01-01

Activity recognition is needed in different requisition, for example, reconnaissance system, patient monitoring, and human-computer interfaces. Feature selection plays an important role in activity recognition, data mining, and machine learning. In selecting subset of features, an efficient evolutionary algorithm Differential Evolution (DE), a very efficient optimizer, is used for finding informative features from eye movements using electrooculography (EOG). Many researchers use EOG signals in human-computer interactions with various computational intelligence methods to analyze eye movements. The proposed system involves analysis of EOG signals using clearness based features, minimum redundancy maximum relevance features, and Differential Evolution based features. This work concentrates more on the feature selection algorithm based on DE in order to improve the classification for faultless activity recognition. PMID:25574185
Feature selection in classification of eye movements using electrooculography for activity recognition.

PubMed

Mala, S; Latha, K

2014-01-01

Activity recognition is needed in different requisition, for example, reconnaissance system, patient monitoring, and human-computer interfaces. Feature selection plays an important role in activity recognition, data mining, and machine learning. In selecting subset of features, an efficient evolutionary algorithm Differential Evolution (DE), a very efficient optimizer, is used for finding informative features from eye movements using electrooculography (EOG). Many researchers use EOG signals in human-computer interactions with various computational intelligence methods to analyze eye movements. The proposed system involves analysis of EOG signals using clearness based features, minimum redundancy maximum relevance features, and Differential Evolution based features. This work concentrates more on the feature selection algorithm based on DE in order to improve the classification for faultless activity recognition.
An Efficient Algorithm for TUCKALS3 on Data with Large Numbers of Observation Units.

ERIC Educational Resources Information Center

Kiers, Henk A. L.; And Others

1992-01-01

A modification of the TUCKALS3 algorithm is proposed that handles three-way arrays of order I x J x K for any I. The reduced work space needed for storing data and increased execution speed make the modified algorithm very suitable for use on personal computers. (SLD)
Exact parallel algorithms for some members of the traveling salesman problem family

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pekny, J.F.

1989-01-01

The traveling salesman problem and its many generalizations comprise one of the best known combinatorial optimization problem families. Most members of the family are NP-complete problems so that exact algorithms require an unpredictable and sometimes large computational effort. Parallel computers offer hope for providing the power required to meet these demands. A major barrier to applying parallel computers is the lack of parallel algorithms. The contributions presented in this thesis center around new exact parallel algorithms for the asymmetric traveling salesman problem (ATSP), prize collecting traveling salesman problem (PCTSP), and resource constrained traveling salesman problem (RCTSP). The RCTSP is amore » particularly difficult member of the family since finding a feasible solution is an NP-complete problem. An exact sequential algorithm is also presented for the directed hamiltonian cycle problem (DHCP). The DHCP algorithm is superior to current heuristic approaches and represents the first exact method applicable to large graphs. Computational results presented for each of the algorithms demonstrates the effectiveness of combining efficient algorithms with parallel computing methods. Performance statistics are reported for randomly generated ATSPs with 7,500 cities, PCTSPs with 200 cities, RCTSPs with 200 cities, DHCPs with 3,500 vertices, and assignment problems of size 10,000. Sequential results were collected on a Sun 4/260 engineering workstation, while parallel results were collected using a 14 and 100 processor BBN Butterfly Plus computer. The computational results represent the largest instances ever solved to optimality on any type of computer.« less
A survey on keeler’s theorem and application of symmetric group for swapping game

NASA Astrophysics Data System (ADS)

Pratama, Yohanssen; Prakasa, Yohenry

2017-01-01

An episode of Futurama features two-body mind-switching machine which will not work more than once on the same pair of bodies. The problem is “Can the switching be undone so as to restore all minds to their original bodies?” Ken Keeler found an algorithm that undoes any mind-scrambling permutation, and Lihua Huang found the refinement of it. We look on the process how the puzzle can be modeled in terms group theory and using symmetric group to solve it and find the most efficient way of it. After that we will try to build the algorithm to implement it into the computer program and see the effect of the transposition notion into the algorithm complexity. The number of steps that given by the algorithm will be different and one of algorithms will have the advantage in terms of efficiency. We compare Ken Keeler and Lihua Huang algorithms to see is there any difference if we run it in the computer program, although the complexity could be remain the same.
A generalized Condat's algorithm of 1D total variation regularization

NASA Astrophysics Data System (ADS)

Makovetskii, Artyom; Voronin, Sergei; Kober, Vitaly

2017-09-01

A common way for solving the denosing problem is to utilize the total variation (TV) regularization. Many efficient numerical algorithms have been developed for solving the TV regularization problem. Condat described a fast direct algorithm to compute the processed 1D signal. Also there exists a direct algorithm with a linear time for 1D TV denoising referred to as the taut string algorithm. The Condat's algorithm is based on a dual problem to the 1D TV regularization. In this paper, we propose a variant of the Condat's algorithm based on the direct 1D TV regularization problem. The usage of the Condat's algorithm with the taut string approach leads to a clear geometric description of the extremal function. Computer simulation results are provided to illustrate the performance of the proposed algorithm for restoration of degraded signals.
CREKID: A computer code for transient, gas-phase combustion of kinetics

NASA Technical Reports Server (NTRS)

Pratt, D. T.; Radhakrishnan, K.

1984-01-01

A new algorithm was developed for fast, automatic integration of chemical kinetic rate equations describing homogeneous, gas-phase combustion at constant pressure. Particular attention is paid to the distinguishing physical and computational characteristics of the induction, heat-release and equilibration regimes. The two-part predictor-corrector algorithm, based on an exponentially-fitted trapezoidal rule, includes filtering of ill-posed initial conditions, automatic selection of Newton-Jacobi or Newton iteration for convergence to achieve maximum computational efficiency while observing a prescribed error tolerance. The new algorithm was found to compare favorably with LSODE on two representative test problems drawn from combustion kinetics.
Scientific Discovery through Advanced Computing (SciDAC-3) Partnership Project Annual Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hoffman, Forest M.; Bochev, Pavel B.; Cameron-Smith, Philip J..

The Applying Computationally Efficient Schemes for BioGeochemical Cycles ACES4BGC Project is advancing the predictive capabilities of Earth System Models (ESMs) by reducing two of the largest sources of uncertainty, aerosols and biospheric feedbacks, with a highly efficient computational approach. In particular, this project is implementing and optimizing new computationally efficient tracer advection algorithms for large numbers of tracer species; adding important biogeochemical interactions between the atmosphere, land, and ocean models; and applying uncertainty quanti cation (UQ) techniques to constrain process parameters and evaluate uncertainties in feedbacks between biogeochemical cycles and the climate system.
Multigrid treatment of implicit continuum diffusion

NASA Astrophysics Data System (ADS)

Francisquez, Manaure; Zhu, Ben; Rogers, Barrett

2017-10-01

Implicit treatment of diffusive terms of various differential orders common in continuum mechanics modeling, such as computational fluid dynamics, is investigated with spectral and multigrid algorithms in non-periodic 2D domains. In doubly periodic time dependent problems these terms can be efficiently and implicitly handled by spectral methods, but in non-periodic systems solved with distributed memory parallel computing and 2D domain decomposition, this efficiency is lost for large numbers of processors. We built and present here a multigrid algorithm for these types of problems which outperforms a spectral solution that employs the highly optimized FFTW library. This multigrid algorithm is not only suitable for high performance computing but may also be able to efficiently treat implicit diffusion of arbitrary order by introducing auxiliary equations of lower order. We test these solvers for fourth and sixth order diffusion with idealized harmonic test functions as well as a turbulent 2D magnetohydrodynamic simulation. It is also shown that an anisotropic operator without cross-terms can improve model accuracy and speed, and we examine the impact that the various diffusion operators have on the energy, the enstrophy, and the qualitative aspect of a simulation. This work was supported by DOE-SC-0010508. This research used resources of the National Energy Research Scientific Computing Center (NERSC).
Hamiltonian Monte Carlo acceleration using surrogate functions with random bases.

PubMed

Zhang, Cheng; Shahbaba, Babak; Zhao, Hongkai

2017-11-01

For big data analysis, high computational cost for Bayesian methods often limits their applications in practice. In recent years, there have been many attempts to improve computational efficiency of Bayesian inference. Here we propose an efficient and scalable computational technique for a state-of-the-art Markov chain Monte Carlo methods, namely, Hamiltonian Monte Carlo. The key idea is to explore and exploit the structure and regularity in parameter space for the underlying probabilistic model to construct an effective approximation of its geometric properties. To this end, we build a surrogate function to approximate the target distribution using properly chosen random bases and an efficient optimization process. The resulting method provides a flexible, scalable, and efficient sampling algorithm, which converges to the correct target distribution. We show that by choosing the basis functions and optimization process differently, our method can be related to other approaches for the construction of surrogate functions such as generalized additive models or Gaussian process models. Experiments based on simulated and real data show that our approach leads to substantially more efficient sampling algorithms compared to existing state-of-the-art methods.
Hardware architecture design of image restoration based on time-frequency domain computation

NASA Astrophysics Data System (ADS)

Wen, Bo; Zhang, Jing; Jiao, Zipeng

2013-10-01

The image restoration algorithms based on time-frequency domain computation is high maturity and applied widely in engineering. To solve the high-speed implementation of these algorithms, the TFDC hardware architecture is proposed. Firstly, the main module is designed, by analyzing the common processing and numerical calculation. Then, to improve the commonality, the iteration control module is planed for iterative algorithms. In addition, to reduce the computational cost and memory requirements, the necessary optimizations are suggested for the time-consuming module, which include two-dimensional FFT/IFFT and the plural calculation. Eventually, the TFDC hardware architecture is adopted for hardware design of real-time image restoration system. The result proves that, the TFDC hardware architecture and its optimizations can be applied to image restoration algorithms based on TFDC, with good algorithm commonality, hardware realizability and high efficiency.
Spatio-temporal colour correction of strongly degraded movies

NASA Astrophysics Data System (ADS)

Islam, A. B. M. Tariqul; Farup, Ivar

2011-01-01

The archives of motion pictures represent an important part of precious cultural heritage. Unfortunately, these cinematography collections are vulnerable to different distortions such as colour fading which is beyond the capability of photochemical restoration process. Spatial colour algorithms-Retinex and ACE provide helpful tool in restoring strongly degraded colour films but, there are some challenges associated with these algorithms. We present an automatic colour correction technique for digital colour restoration of strongly degraded movie material. The method is based upon the existing STRESS algorithm. In order to cope with the problem of highly correlated colour channels, we implemented a preprocessing step in which saturation enhancement is performed in a PCA space. Spatial colour algorithms tend to emphasize all details in the images, including dust and scratches. Surprisingly, we found that the presence of these defects does not affect the behaviour of the colour correction algorithm. Although the STRESS algorithm is already in itself more efficient than traditional spatial colour algorithms, it is still computationally expensive. To speed it up further, we went beyond the spatial domain of the frames and extended the algorithm to the temporal domain. This way, we were able to achieve an 80 percent reduction of the computational time compared to processing every single frame individually. We performed two user experiments and found that the visual quality of the resulting frames was significantly better than with existing methods. Thus, our method outperforms the existing ones in terms of both visual quality and computational efficiency.
Multi-fidelity stochastic collocation method for computation of statistical moments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhu, Xueyu, E-mail: xueyu-zhu@uiowa.edu; Linebarger, Erin M., E-mail: aerinline@sci.utah.edu; Xiu, Dongbin, E-mail: xiu.16@osu.edu

We present an efficient numerical algorithm to approximate the statistical moments of stochastic problems, in the presence of models with different fidelities. The method extends the multi-fidelity approximation method developed in . By combining the efficiency of low-fidelity models and the accuracy of high-fidelity models, our method exhibits fast convergence with a limited number of high-fidelity simulations. We establish an error bound of the method and present several numerical examples to demonstrate the efficiency and applicability of the multi-fidelity algorithm.
Data intensive computing at Sandia.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wilson, Andrew T.

2010-09-01

Data-Intensive Computing is parallel computing where you design your algorithms and your software around efficient access and traversal of a data set; where hardware requirements are dictated by data size as much as by desired run times usually distilling compact results from massive data.

Protein alignment algorithms with an efficient backtracking routine on multiple GPUs.

PubMed

Blazewicz, Jacek; Frohmberg, Wojciech; Kierzynka, Michal; Pesch, Erwin; Wojciechowski, Pawel

2011-05-20

Pairwise sequence alignment methods are widely used in biological research. The increasing number of sequences is perceived as one of the upcoming challenges for sequence alignment methods in the nearest future. To overcome this challenge several GPU (Graphics Processing Unit) computing approaches have been proposed lately. These solutions show a great potential of a GPU platform but in most cases address the problem of sequence database scanning and computing only the alignment score whereas the alignment itself is omitted. Thus, the need arose to implement the global and semiglobal Needleman-Wunsch, and Smith-Waterman algorithms with a backtracking procedure which is needed to construct the alignment. In this paper we present the solution that performs the alignment of every given sequence pair, which is a required step for progressive multiple sequence alignment methods, as well as for DNA recognition at the DNA assembly stage. Performed tests show that the implementation, with performance up to 6.3 GCUPS on a single GPU for affine gap penalties, is very efficient in comparison to other CPU and GPU-based solutions. Moreover, multiple GPUs support with load balancing makes the application very scalable. The article shows that the backtracking procedure of the sequence alignment algorithms may be designed to fit in with the GPU architecture. Therefore, our algorithm, apart from scores, is able to compute pairwise alignments. This opens a wide range of new possibilities, allowing other methods from the area of molecular biology to take advantage of the new computational architecture. Performed tests show that the efficiency of the implementation is excellent. Moreover, the speed of our GPU-based algorithms can be almost linearly increased when using more than one graphics card.
CALCLENS: Weak lensing simulations for large-area sky surveys and second-order effects in cosmic shear power spectra

NASA Astrophysics Data System (ADS)

Becker, Matthew Rand

I present a new algorithm, CALCLENS, for efficiently computing weak gravitational lensing shear signals from large N-body light cone simulations over a curved sky. This new algorithm properly accounts for the sky curvature and boundary conditions, is able to produce redshift- dependent shear signals including corrections to the Born approximation by using multiple- plane ray tracing, and properly computes the lensed images of source galaxies in the light cone. The key feature of this algorithm is a new, computationally efficient Poisson solver for the sphere that combines spherical harmonic transform and multigrid methods. As a result, large areas of sky (~10,000 square degrees) can be ray traced efficiently at high-resolution using only a few hundred cores. Using this new algorithm and curved-sky calculations that only use a slower but more accurate spherical harmonic transform Poisson solver, I study the convergence, shear E-mode, shear B-mode and rotation mode power spectra. Employing full-sky E/B-mode decompositions, I confirm that the numerically computed shear B-mode and rotation mode power spectra are equal at high accuracy ( ≲ 1%) as expected from perturbation theory up to second order. Coupled with realistic galaxy populations placed in large N-body light cone simulations, this new algorithm is ideally suited for the construction of synthetic weak lensing shear catalogs to be used to test for systematic effects in data analysis procedures for upcoming large-area sky surveys. The implementation presented in this work, written in C and employing widely available software libraries to maintain portability, is publicly available at http://code.google.com/p/calclens.
CALCLENS: weak lensing simulations for large-area sky surveys and second-order effects in cosmic shear power spectra

NASA Astrophysics Data System (ADS)

Becker, Matthew R.

2013-10-01

I present a new algorithm, Curved-sky grAvitational Lensing for Cosmological Light conE simulatioNS (CALCLENS), for efficiently computing weak gravitational lensing shear signals from large N-body light cone simulations over a curved sky. This new algorithm properly accounts for the sky curvature and boundary conditions, is able to produce redshift-dependent shear signals including corrections to the Born approximation by using multiple-plane ray tracing and properly computes the lensed images of source galaxies in the light cone. The key feature of this algorithm is a new, computationally efficient Poisson solver for the sphere that combines spherical harmonic transform and multigrid methods. As a result, large areas of sky (˜10 000 square degrees) can be ray traced efficiently at high resolution using only a few hundred cores. Using this new algorithm and curved-sky calculations that only use a slower but more accurate spherical harmonic transform Poisson solver, I study the convergence, shear E-mode, shear B-mode and rotation mode power spectra. Employing full-sky E/B-mode decompositions, I confirm that the numerically computed shear B-mode and rotation mode power spectra are equal at high accuracy (≲1 per cent) as expected from perturbation theory up to second order. Coupled with realistic galaxy populations placed in large N-body light cone simulations, this new algorithm is ideally suited for the construction of synthetic weak lensing shear catalogues to be used to test for systematic effects in data analysis procedures for upcoming large-area sky surveys. The implementation presented in this work, written in C and employing widely available software libraries to maintain portability, is publicly available at http://code.google.com/p/calclens.
An algorithmic framework for multiobjective optimization.

PubMed

Ganesan, T; Elamvazuthi, I; Shaari, Ku Zilati Ku; Vasant, P

2013-01-01

Multiobjective (MO) optimization is an emerging field which is increasingly being encountered in many fields globally. Various metaheuristic techniques such as differential evolution (DE), genetic algorithm (GA), gravitational search algorithm (GSA), and particle swarm optimization (PSO) have been used in conjunction with scalarization techniques such as weighted sum approach and the normal-boundary intersection (NBI) method to solve MO problems. Nevertheless, many challenges still arise especially when dealing with problems with multiple objectives (especially in cases more than two). In addition, problems with extensive computational overhead emerge when dealing with hybrid algorithms. This paper discusses these issues by proposing an alternative framework that utilizes algorithmic concepts related to the problem structure for generating efficient and effective algorithms. This paper proposes a framework to generate new high-performance algorithms with minimal computational overhead for MO optimization.
An Algorithmic Framework for Multiobjective Optimization

PubMed Central

Ganesan, T.; Elamvazuthi, I.; Shaari, Ku Zilati Ku; Vasant, P.

2013-01-01

Multiobjective (MO) optimization is an emerging field which is increasingly being encountered in many fields globally. Various metaheuristic techniques such as differential evolution (DE), genetic algorithm (GA), gravitational search algorithm (GSA), and particle swarm optimization (PSO) have been used in conjunction with scalarization techniques such as weighted sum approach and the normal-boundary intersection (NBI) method to solve MO problems. Nevertheless, many challenges still arise especially when dealing with problems with multiple objectives (especially in cases more than two). In addition, problems with extensive computational overhead emerge when dealing with hybrid algorithms. This paper discusses these issues by proposing an alternative framework that utilizes algorithmic concepts related to the problem structure for generating efficient and effective algorithms. This paper proposes a framework to generate new high-performance algorithms with minimal computational overhead for MO optimization. PMID:24470795
Simulation of 3-D Nonequilibrium Seeded Air Flow in the NASA-Ames MHD Channel

NASA Technical Reports Server (NTRS)

Gupta, Sumeet; Tannehill, John C.; Mehta, Unmeel B.

2004-01-01

The 3-D nonequilibrium seeded air flow in the NASA-Ames experimental MHD channel has been numerically simulated. The channel contains a nozzle section, a center section, and an accelerator section where magnetic and electric fields can be imposed on the flow. In recent tests, velocity increases of up to 40% have been achieved in the accelerator section. The flow in the channel is numerically computed us ing a 3-D parabolized Navier-Stokes (PNS) algorithm that has been developed to efficiently compute MHD flows in the low magnetic Reynolds number regime: The MHD effects are modeled by introducing source terms into the PNS equations which can then be solved in a very efficient manner. The algorithm has been extended in the present study to account for nonequilibrium seeded air flows. The electrical conductivity of the flow is determined using the program of Park. The new algorithm has been used to compute two test cases that match the experimental conditions. In both cases, magnetic and electric fields are applied to the seeded flow. The computed results are in good agreement with the experimental data.
Evaluation of Emerging Energy-Efficient Heterogeneous Computing Platforms for Biomolecular and Cellular Simulation Workloads.

PubMed

Stone, John E; Hallock, Michael J; Phillips, James C; Peterson, Joseph R; Luthey-Schulten, Zaida; Schulten, Klaus

2016-05-01

Many of the continuing scientific advances achieved through computational biology are predicated on the availability of ongoing increases in computational power required for detailed simulation and analysis of cellular processes on biologically-relevant timescales. A critical challenge facing the development of future exascale supercomputer systems is the development of new computing hardware and associated scientific applications that dramatically improve upon the energy efficiency of existing solutions, while providing increased simulation, analysis, and visualization performance. Mobile computing platforms have recently become powerful enough to support interactive molecular visualization tasks that were previously only possible on laptops and workstations, creating future opportunities for their convenient use for meetings, remote collaboration, and as head mounted displays for immersive stereoscopic viewing. We describe early experiences adapting several biomolecular simulation and analysis applications for emerging heterogeneous computing platforms that combine power-efficient system-on-chip multi-core CPUs with high-performance massively parallel GPUs. We present low-cost power monitoring instrumentation that provides sufficient temporal resolution to evaluate the power consumption of individual CPU algorithms and GPU kernels. We compare the performance and energy efficiency of scientific applications running on emerging platforms with results obtained on traditional platforms, identify hardware and algorithmic performance bottlenecks that affect the usability of these platforms, and describe avenues for improving both the hardware and applications in pursuit of the needs of molecular modeling tasks on mobile devices and future exascale computers.
Developing the fuzzy c-means clustering algorithm based on maximum entropy for multitarget tracking in a cluttered environment

NASA Astrophysics Data System (ADS)

Chen, Xiao; Li, Yaan; Yu, Jing; Li, Yuxing

2018-01-01

For fast and more effective implementation of tracking multiple targets in a cluttered environment, we propose a multiple targets tracking (MTT) algorithm called maximum entropy fuzzy c-means clustering joint probabilistic data association that combines fuzzy c-means clustering and the joint probabilistic data association (PDA) algorithm. The algorithm uses the membership value to express the probability of the target originating from measurement. The membership value is obtained through fuzzy c-means clustering objective function optimized by the maximum entropy principle. When considering the effect of the public measurement, we use a correction factor to adjust the association probability matrix to estimate the state of the target. As this algorithm avoids confirmation matrix splitting, it can solve the high computational load problem of the joint PDA algorithm. The results of simulations and analysis conducted for tracking neighbor parallel targets and cross targets in a different density cluttered environment show that the proposed algorithm can realize MTT quickly and efficiently in a cluttered environment. Further, the performance of the proposed algorithm remains constant with increasing process noise variance. The proposed algorithm has the advantages of efficiency and low computational load, which can ensure optimum performance when tracking multiple targets in a dense cluttered environment.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Trędak, Przemysław, E-mail: przemyslaw.tredak@fuw.edu.pl; Rudnicki, Witold R.; Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, ul. Pawińskiego 5a, 02-106 Warsaw

The second generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range hydrocarbon materials. It is also extensible to other atom types and interactions. REBO potential assumes complex multi-body interaction model, that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of molecular dynamics algorithm using REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPUmore » to solve difficult synchronizations issues that arise in computations of multi-body potential. Techniques developed for this problem may be also used to achieve efficient solutions of different problems. The performance of proposed algorithm is assessed using a range of model systems. It is compared to highly optimized CPU implementation (both single core and OpenMP) available in LAMMPS package. These experiments show up to 6x improvement in forces computation time using single processor of the NVIDIA Tesla K80 compared to high end 16-core Intel Xeon processor.« less
A parallel algorithm for the two-dimensional time fractional diffusion equation with implicit difference method.

PubMed

Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie

2014-01-01

It is very time consuming to solve fractional differential equations. The computational complexity of two-dimensional fractional differential equation (2D-TFDE) with iterative implicit finite difference method is O(M(x)M(y)N(2)). In this paper, we present a parallel algorithm for 2D-TFDE and give an in-depth discussion about this algorithm. A task distribution model and data layout with virtual boundary are designed for this parallel algorithm. The experimental results show that the parallel algorithm compares well with the exact solution. The parallel algorithm on single Intel Xeon X5540 CPU runs 3.16-4.17 times faster than the serial algorithm on single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed memory cluster system. We do think that the parallel computing technology will become a very basic method for the computational intensive fractional applications in the near future.
Seismic waveform tomography with shot-encoding using a restarted L-BFGS algorithm.

PubMed

Rao, Ying; Wang, Yanghua

2017-08-17

In seismic waveform tomography, or full-waveform inversion (FWI), one effective strategy used to reduce the computational cost is shot-encoding, which encodes all shots randomly and sums them into one super shot to significantly reduce the number of wavefield simulations in the inversion. However, this process will induce instability in the iterative inversion regardless of whether it uses a robust limited-memory BFGS (L-BFGS) algorithm. The restarted L-BFGS algorithm proposed here is both stable and efficient. This breakthrough ensures, for the first time, the applicability of advanced FWI methods to three-dimensional seismic field data. In a standard L-BFGS algorithm, if the shot-encoding remains unchanged, it will generate a crosstalk effect between different shots. This crosstalk effect can only be suppressed by employing sufficient randomness in the shot-encoding. Therefore, the implementation of the L-BFGS algorithm is restarted at every segment. Each segment consists of a number of iterations; the first few iterations use an invariant encoding, while the remainder use random re-coding. This restarted L-BFGS algorithm balances the computational efficiency of shot-encoding, the convergence stability of the L-BFGS algorithm, and the inversion quality characteristic of random encoding in FWI.
Elastic Cloud Computing Architecture and System for Heterogeneous Spatiotemporal Computing

NASA Astrophysics Data System (ADS)

Shi, X.

2017-10-01

Spatiotemporal computation implements a variety of different algorithms. When big data are involved, desktop computer or standalone application may not be able to complete the computation task due to limited memory and computing power. Now that a variety of hardware accelerators and computing platforms are available to improve the performance of geocomputation, different algorithms may have different behavior on different computing infrastructure and platforms. Some are perfect for implementation on a cluster of graphics processing units (GPUs), while GPUs may not be useful on certain kind of spatiotemporal computation. This is the same situation in utilizing a cluster of Intel's many-integrated-core (MIC) or Xeon Phi, as well as Hadoop or Spark platforms, to handle big spatiotemporal data. Furthermore, considering the energy efficiency requirement in general computation, Field Programmable Gate Array (FPGA) may be a better solution for better energy efficiency when the performance of computation could be similar or better than GPUs and MICs. It is expected that an elastic cloud computing architecture and system that integrates all of GPUs, MICs, and FPGAs could be developed and deployed to support spatiotemporal computing over heterogeneous data types and computational problems.
A-VCI: A flexible method to efficiently compute vibrational spectra

NASA Astrophysics Data System (ADS)

Odunlami, Marc; Le Bris, Vincent; Bégué, Didier; Baraille, Isabelle; Coulaud, Olivier

2017-06-01

The adaptive vibrational configuration interaction algorithm has been introduced as a new method to efficiently reduce the dimension of the set of basis functions used in a vibrational configuration interaction process. It is based on the construction of nested bases for the discretization of the Hamiltonian operator according to a theoretical criterion that ensures the convergence of the method. In the present work, the Hamiltonian is written as a sum of products of operators. The purpose of this paper is to study the properties and outline the performance details of the main steps of the algorithm. New parameters have been incorporated to increase flexibility, and their influence has been thoroughly investigated. The robustness and reliability of the method are demonstrated for the computation of the vibrational spectrum up to 3000 cm-1 of a widely studied 6-atom molecule (acetonitrile). Our results are compared to the most accurate up to date computation; we also give a new reference calculation for future work on this system. The algorithm has also been applied to a more challenging 7-atom molecule (ethylene oxide). The computed spectrum up to 3200 cm-1 is the most accurate computation that exists today on such systems.
A-VCI: A flexible method to efficiently compute vibrational spectra.

PubMed

Odunlami, Marc; Le Bris, Vincent; Bégué, Didier; Baraille, Isabelle; Coulaud, Olivier

2017-06-07

The adaptive vibrational configuration interaction algorithm has been introduced as a new method to efficiently reduce the dimension of the set of basis functions used in a vibrational configuration interaction process. It is based on the construction of nested bases for the discretization of the Hamiltonian operator according to a theoretical criterion that ensures the convergence of the method. In the present work, the Hamiltonian is written as a sum of products of operators. The purpose of this paper is to study the properties and outline the performance details of the main steps of the algorithm. New parameters have been incorporated to increase flexibility, and their influence has been thoroughly investigated. The robustness and reliability of the method are demonstrated for the computation of the vibrational spectrum up to 3000 cm -1 of a widely studied 6-atom molecule (acetonitrile). Our results are compared to the most accurate up to date computation; we also give a new reference calculation for future work on this system. The algorithm has also been applied to a more challenging 7-atom molecule (ethylene oxide). The computed spectrum up to 3200 cm -1 is the most accurate computation that exists today on such systems.
Customizing FP-growth algorithm to parallel mining with Charm++ library

NASA Astrophysics Data System (ADS)

Puścian, Marek

2017-08-01

This paper presents a frequent item mining algorithm that was customized to handle growing data repositories. The proposed solution applies Master Slave scheme to frequent pattern growth technique. Efficient utilization of available computation units is achieved by dynamic reallocation of tasks. Conditional frequent trees are assigned to parallel workers basing on their workload. Proposed enhancements have been successfully implemented using Charm++ library. This paper discusses results of the performance of parallelized FP-growth algorithm against different datasets. The approach has been illustrated with many experiments and measurements performed using multiprocessor and multithreaded computer.
Architecture and data processing alternatives for the tse computer. Volume 4: Image rotation using tse operations

NASA Technical Reports Server (NTRS)

Kao, M. H.; Bodenheimer, R. E.

1976-01-01

The tse computer's capability of achieving image congruence between temporal and multiple images with misregistration due to rotational differences is reported. The coordinate transformations are obtained and a general algorithms is devised to perform image rotation using tse operations very efficiently. The details of this algorithm as well as its theoretical implications are presented. Step by step procedures of image registration are described in detail. Numerous examples are also employed to demonstrate the correctness and the effectiveness of the algorithms and conclusions and recommendations are made.
Symmetric log-domain diffeomorphic Registration: a demons-based approach.

PubMed

Vercauteren, Tom; Pennec, Xavier; Perchant, Aymeric; Ayache, Nicholas

2008-01-01

Modern morphometric studies use non-linear image registration to compare anatomies and perform group analysis. Recently, log-Euclidean approaches have contributed to promote the use of such computational anatomy tools by permitting simple computations of statistics on a rather large class of invertible spatial transformations. In this work, we propose a non-linear registration algorithm perfectly fit for log-Euclidean statistics on diffeomorphisms. Our algorithm works completely in the log-domain, i.e. it uses a stationary velocity field. This implies that we guarantee the invertibility of the deformation and have access to the true inverse transformation. This also means that our output can be directly used for log-Euclidean statistics without relying on the heavy computation of the log of the spatial transformation. As it is often desirable, our algorithm is symmetric with respect to the order of the input images. Furthermore, we use an alternate optimization approach related to Thirion's demons algorithm to provide a fast non-linear registration algorithm. First results show that our algorithm outperforms both the demons algorithm and the recently proposed diffeomorphic demons algorithm in terms of accuracy of the transformation while remaining computationally efficient.
Structural alignment of protein descriptors - a combinatorial model.

PubMed

Antczak, Maciej; Kasprzak, Marta; Lukasiak, Piotr; Blazewicz, Jacek

2016-09-17

Structural alignment of proteins is one of the most challenging problems in molecular biology. The tertiary structure of a protein strictly correlates with its function and computationally predicted structures are nowadays a main premise for understanding the latter. However, computationally derived 3D models often exhibit deviations from the native structure. A way to confirm a model is a comparison with other structures. The structural alignment of a pair of proteins can be defined with the use of a concept of protein descriptors. The protein descriptors are local substructures of protein molecules, which allow us to divide the original problem into a set of subproblems and, consequently, to propose a more efficient algorithmic solution. In the literature, one can find many applications of the descriptors concept that prove its usefulness for insight into protein 3D structures, but the proposed approaches are presented rather from the biological perspective than from the computational or algorithmic point of view. Efficient algorithms for identification and structural comparison of descriptors can become crucial components of methods for structural quality assessment as well as tertiary structure prediction. In this paper, we propose a new combinatorial model and new polynomial-time algorithms for the structural alignment of descriptors. The model is based on the maximum-size assignment problem, which we define here and prove that it can be solved in polynomial time. We demonstrate suitability of this approach by comparison with an exact backtracking algorithm. Besides a simplification coming from the combinatorial modeling, both on the conceptual and complexity level, we gain with this approach high quality of obtained results, in terms of 3D alignment accuracy and processing efficiency. All the proposed algorithms were developed and integrated in a computationally efficient tool descs-standalone, which allows the user to identify and structurally compare descriptors of biological molecules, such as proteins and RNAs. Both PDB (Protein Data Bank) and mmCIF (macromolecular Crystallographic Information File) formats are supported. The proposed tool is available as an open source project stored on GitHub ( https://github.com/mantczak/descs-standalone ).
Quantum computation for solving linear systems

NASA Astrophysics Data System (ADS)

Cao, Yudong

Quantum computation is a subject born out of the combination between physics and computer science. It studies how the laws of quantum mechanics can be exploited to perform computations much more efficiently than current computers (termed classical computers as oppose to quantum computers). The thesis starts by introducing ideas from quantum physics and theoretical computer science and based on these ideas, introducing the basic concepts in quantum computing. These introductory discussions are intended for non-specialists to obtain the essential knowledge needed for understanding the new results presented in the subsequent chapters. After introducing the basics of quantum computing, we focus on the recently proposed quantum algorithm for linear systems. The new results include i) special instances of quantum circuits that can be implemented using current experimental resources; ii) detailed quantum algorithms that are suitable for a broader class of linear systems. We show that for some particular problems the quantum algorithm is able to achieve exponential speedup over their classical counterparts.
Vectorized algorithms for spiking neural network simulation.

PubMed

Brette, Romain; Goodman, Dan F M

2011-06-01

High-level languages (Matlab, Python) are popular in neuroscience because they are flexible and accelerate development. However, for simulating spiking neural networks, the cost of interpretation is a bottleneck. We describe a set of algorithms to simulate large spiking neural networks efficiently with high-level languages using vector-based operations. These algorithms constitute the core of Brian, a spiking neural network simulator written in the Python language. Vectorized simulation makes it possible to combine the flexibility of high-level languages with the computational efficiency usually associated with compiled languages.

Efficient implementation of the many-body Reactive Bond Order (REBO) potential on GPU

NASA Astrophysics Data System (ADS)

Trędak, Przemysław; Rudnicki, Witold R.; Majewski, Jacek A.

2016-09-01

The second generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range hydrocarbon materials. It is also extensible to other atom types and interactions. REBO potential assumes complex multi-body interaction model, that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of molecular dynamics algorithm using REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronizations issues that arise in computations of multi-body potential. Techniques developed for this problem may be also used to achieve efficient solutions of different problems. The performance of proposed algorithm is assessed using a range of model systems. It is compared to highly optimized CPU implementation (both single core and OpenMP) available in LAMMPS package. These experiments show up to 6x improvement in forces computation time using single processor of the NVIDIA Tesla K80 compared to high end 16-core Intel Xeon processor.
A Semi-Vectorization Algorithm to Synthesis of Gravitational Anomaly Quantities on the Earth

NASA Astrophysics Data System (ADS)

Abdollahzadeh, M.; Eshagh, M.; Najafi Alamdari, M.

2009-04-01

The Earth's gravitational potential can be expressed by the well-known spherical harmonic expansion. The computational time of summing up this expansion is an important practical issue which can be reduced by an efficient numerical algorithm. This paper proposes such a method for block-wise synthesizing the anomaly quantities on the Earth surface using vectorization. Fully-vectorization means transformation of the summations to the simple matrix and vector products. It is not a practical for the matrices with large dimensions. Here a semi-vectorization algorithm is proposed to avoid working with large vectors and matrices. It speeds up the computations by using one loop for the summation either on degrees or on orders. The former is a good option to synthesize the anomaly quantities on the Earth surface considering a digital elevation model (DEM). This approach is more efficient than the two-step method which computes the quantities on the reference ellipsoid and continues them upward to the Earth surface. The algorithm has been coded in MATLAB which synthesizes a global grid of 5â²Ã- 5â² (corresponding 9 million points) of gravity anomaly or geoid height using a geopotential model to degree 360 in 10000 seconds by an ordinary computer with 2G RAM.
Automated Software Acceleration in Programmable Logic for an Efficient NFFT Algorithm Implementation: A Case Study.

PubMed

Rodríguez, Manuel; Magdaleno, Eduardo; Pérez, Fernando; García, Cristhian

2017-03-28

Non-equispaced Fast Fourier transform (NFFT) is a very important algorithm in several technological and scientific areas such as synthetic aperture radar, computational photography, medical imaging, telecommunications, seismic analysis and so on. However, its computation complexity is high. In this paper, we describe an efficient NFFT implementation with a hardware coprocessor using an All-Programmable System-on-Chip (APSoC). This is a hybrid device that employs an Advanced RISC Machine (ARM) as Processing System with Programmable Logic for high-performance digital signal processing through parallelism and pipeline techniques. The algorithm has been coded in C language with pragma directives to optimize the architecture of the system. We have used the very novel Software Develop System-on-Chip (SDSoC) evelopment tool that simplifies the interface and partitioning between hardware and software. This provides shorter development cycles and iterative improvements by exploring several architectures of the global system. The computational results shows that hardware acceleration significantly outperformed the software based implementation.
Automated Software Acceleration in Programmable Logic for an Efficient NFFT Algorithm Implementation: A Case Study

PubMed Central

Rodríguez, Manuel; Magdaleno, Eduardo; Pérez, Fernando; García, Cristhian

2017-01-01

Non-equispaced Fast Fourier transform (NFFT) is a very important algorithm in several technological and scientific areas such as synthetic aperture radar, computational photography, medical imaging, telecommunications, seismic analysis and so on. However, its computation complexity is high. In this paper, we describe an efficient NFFT implementation with a hardware coprocessor using an All-Programmable System-on-Chip (APSoC). This is a hybrid device that employs an Advanced RISC Machine (ARM) as Processing System with Programmable Logic for high-performance digital signal processing through parallelism and pipeline techniques. The algorithm has been coded in C language with pragma directives to optimize the architecture of the system. We have used the very novel Software Develop System-on-Chip (SDSoC) evelopment tool that simplifies the interface and partitioning between hardware and software. This provides shorter development cycles and iterative improvements by exploring several architectures of the global system. The computational results shows that hardware acceleration significantly outperformed the software based implementation. PMID:28350358
An algorithm for continuum modeling of rocks with multiple embedded nonlinearly-compliant joints [Continuum modeling of elasto-plastic media with multiple embedded nonlinearly-compliant joints

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hurley, R. C.; Vorobiev, O. Y.; Ezzedine, S. M.

Here, we present a numerical method for modeling the mechanical effects of nonlinearly-compliant joints in elasto-plastic media. The method uses a series of strain-rate and stress update algorithms to determine joint closure, slip, and solid stress within computational cells containing multiple “embedded” joints. This work facilitates efficient modeling of nonlinear wave propagation in large spatial domains containing a large number of joints that affect bulk mechanical properties. We implement the method within the massively parallel Lagrangian code GEODYN-L and provide verification and examples. We highlight the ability of our algorithms to capture joint interactions and multiple weakness planes within individualmore » computational cells, as well as its computational efficiency. We also discuss the motivation for developing the proposed technique: to simulate large-scale wave propagation during the Source Physics Experiments (SPE), a series of underground explosions conducted at the Nevada National Security Site (NNSS).« less
An algorithm for continuum modeling of rocks with multiple embedded nonlinearly-compliant joints [Continuum modeling of elasto-plastic media with multiple embedded nonlinearly-compliant joints

DOE PAGES

Hurley, R. C.; Vorobiev, O. Y.; Ezzedine, S. M.

2017-04-06

Here, we present a numerical method for modeling the mechanical effects of nonlinearly-compliant joints in elasto-plastic media. The method uses a series of strain-rate and stress update algorithms to determine joint closure, slip, and solid stress within computational cells containing multiple “embedded” joints. This work facilitates efficient modeling of nonlinear wave propagation in large spatial domains containing a large number of joints that affect bulk mechanical properties. We implement the method within the massively parallel Lagrangian code GEODYN-L and provide verification and examples. We highlight the ability of our algorithms to capture joint interactions and multiple weakness planes within individualmore » computational cells, as well as its computational efficiency. We also discuss the motivation for developing the proposed technique: to simulate large-scale wave propagation during the Source Physics Experiments (SPE), a series of underground explosions conducted at the Nevada National Security Site (NNSS).« less
Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks

NASA Astrophysics Data System (ADS)

Esfandiar, Pooya; Bonchi, Francesco; Gleich, David F.; Greif, Chen; Lakshmanan, Laks V. S.; On, Byung-Won

Motivated by social network data mining problems such as link prediction and collaborative filtering, significant research effort has been devoted to computing topological measures including the Katz score and the commute time. Existing approaches typically approximate all pairwise relationships simultaneously. In this paper, we are interested in computing: the score for a single pair of nodes, and the top-k nodes with the best scores from a given source node. For the pairwise problem, we apply an iterative algorithm that computes upper and lower bounds for the measures we seek. This algorithm exploits a relationship between the Lanczos process and a quadrature rule. For the top-k problem, we propose an algorithm that only accesses a small portion of the graph and is related to techniques used in personalized PageRank computing. To test the scalability and accuracy of our algorithms we experiment with three real-world networks and find that these algorithms run in milliseconds to seconds without any preprocessing.
Fast katz and commuters : efficient estimation of social relatedness in large networks.

DOE Office of Scientific and Technical Information (OSTI.GOV)

On, Byung-Won; Lakshmanan, Laks V. S.; Greif, Chen

Motivated by social network data mining problems such as link prediction and collaborative filtering, significant research effort has been devoted to computing topological measures including the Katz score and the commute time. Existing approaches typically approximate all pairwise relationships simultaneously. In this paper, we are interested in computing: the score for a single pair of nodes, and the top-k nodes with the best scores from a given source node. For the pairwise problem, we apply an iterative algorithm that computes upper and lower bounds for the measures we seek. This algorithm exploits a relationship between the Lanczos process and amore » quadrature rule. For the top-k problem, we propose an algorithm that only accesses a small portion of the graph and is related to techniques used in personalized PageRank computing. To test the scalability and accuracy of our algorithms we experiment with three real-world networks and find that these algorithms run in milliseconds to seconds without any preprocessing.« less
Architecture and data processing alternatives for the TSE computer. Volume 3: Execution of a parallel counting algorithm using array logic (Tse) devices

NASA Technical Reports Server (NTRS)

Metcalfe, A. G.; Bodenheimer, R. E.

1976-01-01

A parallel algorithm for counting the number of logic-l elements in a binary array or image developed during preliminary investigation of the Tse concept is described. The counting algorithm is implemented using a basic combinational structure. Modifications which improve the efficiency of the basic structure are also presented. A programmable Tse computer structure is proposed, along with a hardware control unit, Tse instruction set, and software program for execution of the counting algorithm. Finally, a comparison is made between the different structures in terms of their more important characteristics.
Trajectory Segmentation Map-Matching Approach for Large-Scale, High-Resolution GPS Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhu, Lei; Holden, Jacob R.; Gonder, Jeffrey D.

With the development of smartphones and portable GPS devices, large-scale, high-resolution GPS data can be collected. Map matching is a critical step in studying vehicle driving activity and recognizing network traffic conditions from the data. A new trajectory segmentation map-matching algorithm is proposed to deal accurately and efficiently with large-scale, high-resolution GPS trajectory data. The new algorithm separated the GPS trajectory into segments. It found the shortest path for each segment in a scientific manner and ultimately generated a best-matched path for the entire trajectory. The similarity of a trajectory segment and its matched path is described by a similaritymore » score system based on the longest common subsequence. The numerical experiment indicated that the proposed map-matching algorithm was very promising in relation to accuracy and computational efficiency. Large-scale data set applications verified that the proposed method is robust and capable of dealing with real-world, large-scale GPS data in a computationally efficient and accurate manner.« less
Trajectory Segmentation Map-Matching Approach for Large-Scale, High-Resolution GPS Data

DOE PAGES

Zhu, Lei; Holden, Jacob R.; Gonder, Jeffrey D.

2017-01-01

With the development of smartphones and portable GPS devices, large-scale, high-resolution GPS data can be collected. Map matching is a critical step in studying vehicle driving activity and recognizing network traffic conditions from the data. A new trajectory segmentation map-matching algorithm is proposed to deal accurately and efficiently with large-scale, high-resolution GPS trajectory data. The new algorithm separated the GPS trajectory into segments. It found the shortest path for each segment in a scientific manner and ultimately generated a best-matched path for the entire trajectory. The similarity of a trajectory segment and its matched path is described by a similaritymore » score system based on the longest common subsequence. The numerical experiment indicated that the proposed map-matching algorithm was very promising in relation to accuracy and computational efficiency. Large-scale data set applications verified that the proposed method is robust and capable of dealing with real-world, large-scale GPS data in a computationally efficient and accurate manner.« less
Implementation of the diagonalization-free algorithm in the self-consistent field procedure within the four-component relativistic scheme.

PubMed

Hrdá, Marcela; Kulich, Tomáš; Repiský, Michal; Noga, Jozef; Malkina, Olga L; Malkin, Vladimir G

2014-09-05

A recently developed Thouless-expansion-based diagonalization-free approach for improving the efficiency of self-consistent field (SCF) methods (Noga and Šimunek, J. Chem. Theory Comput. 2010, 6, 2706) has been adapted to the four-component relativistic scheme and implemented within the program package ReSpect. In addition to the implementation, the method has been thoroughly analyzed, particularly with respect to cases for which it is difficult or computationally expensive to find a good initial guess. Based on this analysis, several modifications of the original algorithm, refining its stability and efficiency, are proposed. To demonstrate the robustness and efficiency of the improved algorithm, we present the results of four-component diagonalization-free SCF calculations on several heavy-metal complexes, the largest of which contains more than 80 atoms (about 6000 4-spinor basis functions). The diagonalization-free procedure is about twice as fast as the corresponding diagonalization. Copyright © 2014 Wiley Periodicals, Inc.
Approximate Computing Techniques for Iterative Graph Algorithms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Panyala, Ajay R.; Subasi, Omer; Halappanavar, Mahantesh

Approximate computing enables processing of large-scale graphs by trading off quality for performance. Approximate computing techniques have become critical not only due to the emergence of parallel architectures but also the availability of large scale datasets enabling data-driven discovery. Using two prototypical graph algorithms, PageRank and community detection, we present several approximate computing heuristics to scale the performance with minimal loss of accuracy. We present several heuristics including loop perforation, data caching, incomplete graph coloring and synchronization, and evaluate their efficiency. We demonstrate performance improvements of up to 83% for PageRank and up to 450x for community detection, with lowmore » impact of accuracy for both the algorithms. We expect the proposed approximate techniques will enable scalable graph analytics on data of importance to several applications in science and their subsequent adoption to scale similar graph algorithms.« less
Efficient Sum of Outer Products Dictionary Learning (SOUP-DIL) and Its Application to Inverse Problems.

PubMed

Ravishankar, Saiprasad; Nadakuditi, Raj Rao; Fessler, Jeffrey A

2017-12-01

The sparsity of signals in a transform domain or dictionary has been exploited in applications such as compression, denoising and inverse problems. More recently, data-driven adaptation of synthesis dictionaries has shown promise compared to analytical dictionary models. However, dictionary learning problems are typically non-convex and NP-hard, and the usual alternating minimization approaches for these problems are often computationally expensive, with the computations dominated by the NP-hard synthesis sparse coding step. This paper exploits the ideas that drive algorithms such as K-SVD, and investigates in detail efficient methods for aggregate sparsity penalized dictionary learning by first approximating the data with a sum of sparse rank-one matrices (outer products) and then using a block coordinate descent approach to estimate the unknowns. The resulting block coordinate descent algorithms involve efficient closed-form solutions. Furthermore, we consider the problem of dictionary-blind image reconstruction, and propose novel and efficient algorithms for adaptive image reconstruction using block coordinate descent and sum of outer products methodologies. We provide a convergence study of the algorithms for dictionary learning and dictionary-blind image reconstruction. Our numerical experiments show the promising performance and speedups provided by the proposed methods over previous schemes in sparse data representation and compressed sensing-based image reconstruction.
Efficient Sum of Outer Products Dictionary Learning (SOUP-DIL) and Its Application to Inverse Problems

PubMed Central

Ravishankar, Saiprasad; Nadakuditi, Raj Rao; Fessler, Jeffrey A.

2017-01-01

The sparsity of signals in a transform domain or dictionary has been exploited in applications such as compression, denoising and inverse problems. More recently, data-driven adaptation of synthesis dictionaries has shown promise compared to analytical dictionary models. However, dictionary learning problems are typically non-convex and NP-hard, and the usual alternating minimization approaches for these problems are often computationally expensive, with the computations dominated by the NP-hard synthesis sparse coding step. This paper exploits the ideas that drive algorithms such as K-SVD, and investigates in detail efficient methods for aggregate sparsity penalized dictionary learning by first approximating the data with a sum of sparse rank-one matrices (outer products) and then using a block coordinate descent approach to estimate the unknowns. The resulting block coordinate descent algorithms involve efficient closed-form solutions. Furthermore, we consider the problem of dictionary-blind image reconstruction, and propose novel and efficient algorithms for adaptive image reconstruction using block coordinate descent and sum of outer products methodologies. We provide a convergence study of the algorithms for dictionary learning and dictionary-blind image reconstruction. Our numerical experiments show the promising performance and speedups provided by the proposed methods over previous schemes in sparse data representation and compressed sensing-based image reconstruction. PMID:29376111
Parameterized Algorithmics for Finding Exact Solutions of NP-Hard Biological Problems.

PubMed

Hüffner, Falk; Komusiewicz, Christian; Niedermeier, Rolf; Wernicke, Sebastian

2017-01-01

Fixed-parameter algorithms are designed to efficiently find optimal solutions to some computationally hard (NP-hard) problems by identifying and exploiting "small" problem-specific parameters. We survey practical techniques to develop such algorithms. Each technique is introduced and supported by case studies of applications to biological problems, with additional pointers to experimental results.
Algorithm Diversity for Resilent Systems

DTIC Science & Technology

2016-06-27

data structures. 15. SUBJECT TERMS computer security, software diversity, program transformation 16. SECURITY CLASSIFICATION OF: 17. LIMITATION OF 18...systematic method for transforming Datalog rules with general universal and existential quantification into efficient algorithms with precise complexity...worst case in the size of the ground rules. There are numerous choices during the transformation that lead to diverse algorithms and different
Diffeomorphic demons: efficient non-parametric image registration.

PubMed

Vercauteren, Tom; Pennec, Xavier; Perchant, Aymeric; Ayache, Nicholas

2009-03-01

We propose an efficient non-parametric diffeomorphic image registration algorithm based on Thirion's demons algorithm. In the first part of this paper, we show that Thirion's demons algorithm can be seen as an optimization procedure on the entire space of displacement fields. We provide strong theoretical roots to the different variants of Thirion's demons algorithm. This analysis predicts a theoretical advantage for the symmetric forces variant of the demons algorithm. We show on controlled experiments that this advantage is confirmed in practice and yields a faster convergence. In the second part of this paper, we adapt the optimization procedure underlying the demons algorithm to a space of diffeomorphic transformations. In contrast to many diffeomorphic registration algorithms, our solution is computationally efficient since in practice it only replaces an addition of displacement fields by a few compositions. Our experiments show that in addition to being diffeomorphic, our algorithm provides results that are similar to the ones from the demons algorithm but with transformations that are much smoother and closer to the gold standard, available in controlled experiments, in terms of Jacobians.
GPU accelerated fuzzy connected image segmentation by using CUDA.

PubMed

Zhuge, Ying; Cao, Yong; Miller, Robert W

2009-01-01

Image segmentation techniques using fuzzy connectedness principles have shown their effectiveness in segmenting a variety of objects in several large applications in recent years. However, one problem of these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays commodity graphics hardware provides high parallel computing power. In this paper, we present a parallel fuzzy connected image segmentation algorithm on Nvidia's Compute Unified Device Architecture (CUDA) platform for segmenting large medical image data sets. Our experiments based on three data sets with small, medium, and large data size demonstrate the efficiency of the parallel algorithm, which achieves a speed-up factor of 7.2x, 7.3x, and 14.4x, correspondingly, for the three data sets over the sequential implementation of fuzzy connected image segmentation algorithm on CPU.
Singular value decomposition for collaborative filtering on a GPU

NASA Astrophysics Data System (ADS)

Kato, Kimikazu; Hosino, Tikara

2010-06-01

A collaborative filtering predicts customers' unknown preferences from known preferences. In a computation of the collaborative filtering, a singular value decomposition (SVD) is needed to reduce the size of a large scale matrix so that the burden for the next phase computation will be decreased. In this application, SVD means a roughly approximated factorization of a given matrix into smaller sized matrices. Webb (a.k.a. Simon Funk) showed an effective algorithm to compute SVD toward a solution of an open competition called "Netflix Prize". The algorithm utilizes an iterative method so that the error of approximation improves in each step of the iteration. We give a GPU version of Webb's algorithm. Our algorithm is implemented in the CUDA and it is shown to be efficient by an experiment.

Optimal Golomb Ruler Sequences Generation for Optical WDM Systems: A Novel Parallel Hybrid Multi-objective Bat Algorithm

NASA Astrophysics Data System (ADS)

Bansal, Shonak; Singh, Arun Kumar; Gupta, Neena

2017-02-01

In real-life, multi-objective engineering design problems are very tough and time consuming optimization problems due to their high degree of nonlinearities, complexities and inhomogeneity. Nature-inspired based multi-objective optimization algorithms are now becoming popular for solving multi-objective engineering design problems. This paper proposes original multi-objective Bat algorithm (MOBA) and its extended form, namely, novel parallel hybrid multi-objective Bat algorithm (PHMOBA) to generate shortest length Golomb ruler called optimal Golomb ruler (OGR) sequences at a reasonable computation time. The OGRs found their application in optical wavelength division multiplexing (WDM) systems as channel-allocation algorithm to reduce the four-wave mixing (FWM) crosstalk. The performances of both the proposed algorithms to generate OGRs as optical WDM channel-allocation is compared with other existing classical computing and nature-inspired algorithms, including extended quadratic congruence (EQC), search algorithm (SA), genetic algorithms (GAs), biogeography based optimization (BBO) and big bang-big crunch (BB-BC) optimization algorithms. Simulations conclude that the proposed parallel hybrid multi-objective Bat algorithm works efficiently as compared to original multi-objective Bat algorithm and other existing algorithms to generate OGRs for optical WDM systems. The algorithm PHMOBA to generate OGRs, has higher convergence and success rate than original MOBA. The efficiency improvement of proposed PHMOBA to generate OGRs up to 20-marks, in terms of ruler length and total optical channel bandwidth (TBW) is 100 %, whereas for original MOBA is 85 %. Finally the implications for further research are also discussed.
A critical analysis of computational protein design with sparse residue interaction graphs

PubMed Central

Georgiev, Ivelin S.

2017-01-01

Protein design algorithms enumerate a combinatorial number of candidate structures to compute the Global Minimum Energy Conformation (GMEC). To efficiently find the GMEC, protein design algorithms must methodically reduce the conformational search space. By applying distance and energy cutoffs, the protein system to be designed can thus be represented using a sparse residue interaction graph, where the number of interacting residue pairs is less than all pairs of mutable residues, and the corresponding GMEC is called the sparse GMEC. However, ignoring some pairwise residue interactions can lead to a change in the energy, conformation, or sequence of the sparse GMEC vs. the original or the full GMEC. Despite the widespread use of sparse residue interaction graphs in protein design, the above mentioned effects of their use have not been previously analyzed. To analyze the costs and benefits of designing with sparse residue interaction graphs, we computed the GMECs for 136 different protein design problems both with and without distance and energy cutoffs, and compared their energies, conformations, and sequences. Our analysis shows that the differences between the GMECs depend critically on whether or not the design includes core, boundary, or surface residues. Moreover, neglecting long-range interactions can alter local interactions and introduce large sequence differences, both of which can result in significant structural and functional changes. Designs on proteins with experimentally measured thermostability show it is beneficial to compute both the full and the sparse GMEC accurately and efficiently. To this end, we show that a provable, ensemble-based algorithm can efficiently compute both GMECs by enumerating a small number of conformations, usually fewer than 1000. This provides a novel way to combine sparse residue interaction graphs with provable, ensemble-based algorithms to reap the benefits of sparse residue interaction graphs while avoiding their potential inaccuracies. PMID:28358804
On the efficient and reliable numerical solution of rate-and-state friction problems

NASA Astrophysics Data System (ADS)

Pipping, Elias; Kornhuber, Ralf; Rosenau, Matthias; Oncken, Onno

2016-03-01

We present a mathematically consistent numerical algorithm for the simulation of earthquake rupture with rate-and-state friction. Its main features are adaptive time stepping, a novel algebraic solution algorithm involving nonlinear multigrid and a fixed point iteration for the rate-and-state decoupling. The algorithm is applied to a laboratory scale subduction zone which allows us to compare our simulations with experimental results. Using physical parameters from the experiment, we find a good fit of recurrence time of slip events as well as their rupture width and peak slip. Computations in 3-D confirm efficiency and robustness of our algorithm.
Computationally efficient real-time interpolation algorithm for non-uniform sampled biosignals

PubMed Central

Eftekhar, Amir; Kindt, Wilko; Constandinou, Timothy G.

2016-01-01

This Letter presents a novel, computationally efficient interpolation method that has been optimised for use in electrocardiogram baseline drift removal. In the authors’ previous Letter three isoelectric baseline points per heartbeat are detected, and here utilised as interpolation points. As an extension from linear interpolation, their algorithm segments the interpolation interval and utilises different piecewise linear equations. Thus, the algorithm produces a linear curvature that is computationally efficient while interpolating non-uniform samples. The proposed algorithm is tested using sinusoids with different fundamental frequencies from 0.05 to 0.7 Hz and also validated with real baseline wander data acquired from the Massachusetts Institute of Technology University and Boston's Beth Israel Hospital (MIT-BIH) Noise Stress Database. The synthetic data results show an root mean square (RMS) error of 0.9 μV (mean), 0.63 μV (median) and 0.6 μV (standard deviation) per heartbeat on a 1 mVp–p 0.1 Hz sinusoid. On real data, they obtain an RMS error of 10.9 μV (mean), 8.5 μV (median) and 9.0 μV (standard deviation) per heartbeat. Cubic spline interpolation and linear interpolation on the other hand shows 10.7 μV, 11.6 μV (mean), 7.8 μV, 8.9 μV (median) and 9.8 μV, 9.3 μV (standard deviation) per heartbeat. PMID:27382478
Computationally efficient real-time interpolation algorithm for non-uniform sampled biosignals.

PubMed

Guven, Onur; Eftekhar, Amir; Kindt, Wilko; Constandinou, Timothy G

2016-06-01

This Letter presents a novel, computationally efficient interpolation method that has been optimised for use in electrocardiogram baseline drift removal. In the authors' previous Letter three isoelectric baseline points per heartbeat are detected, and here utilised as interpolation points. As an extension from linear interpolation, their algorithm segments the interpolation interval and utilises different piecewise linear equations. Thus, the algorithm produces a linear curvature that is computationally efficient while interpolating non-uniform samples. The proposed algorithm is tested using sinusoids with different fundamental frequencies from 0.05 to 0.7 Hz and also validated with real baseline wander data acquired from the Massachusetts Institute of Technology University and Boston's Beth Israel Hospital (MIT-BIH) Noise Stress Database. The synthetic data results show an root mean square (RMS) error of 0.9 μV (mean), 0.63 μV (median) and 0.6 μV (standard deviation) per heartbeat on a 1 mVp-p 0.1 Hz sinusoid. On real data, they obtain an RMS error of 10.9 μV (mean), 8.5 μV (median) and 9.0 μV (standard deviation) per heartbeat. Cubic spline interpolation and linear interpolation on the other hand shows 10.7 μV, 11.6 μV (mean), 7.8 μV, 8.9 μV (median) and 9.8 μV, 9.3 μV (standard deviation) per heartbeat.
A deconvolution extraction method for 2D multi-object fibre spectroscopy based on the regularized least-squares QR-factorization algorithm

NASA Astrophysics Data System (ADS)

Yu, Jian; Yin, Qian; Guo, Ping; Luo, A.-li

2014-09-01

This paper presents an efficient method for the extraction of astronomical spectra from two-dimensional (2D) multifibre spectrographs based on the regularized least-squares QR-factorization (LSQR) algorithm. We address two issues: we propose a modified Gaussian point spread function (PSF) for modelling the 2D PSF from multi-emission-line gas-discharge lamp images (arc images), and we develop an efficient deconvolution method to extract spectra in real circumstances. The proposed modified 2D Gaussian PSF model can fit various types of 2D PSFs, including different radial distortion angles and ellipticities. We adopt the regularized LSQR algorithm to solve the sparse linear equations constructed from the sparse convolution matrix, which we designate the deconvolution spectrum extraction method. Furthermore, we implement a parallelized LSQR algorithm based on graphics processing unit programming in the Compute Unified Device Architecture to accelerate the computational processing. Experimental results illustrate that the proposed extraction method can greatly reduce the computational cost and memory use of the deconvolution method and, consequently, increase its efficiency and practicability. In addition, the proposed extraction method has a stronger noise tolerance than other methods, such as the boxcar (aperture) extraction and profile extraction methods. Finally, we present an analysis of the sensitivity of the extraction results to the radius and full width at half-maximum of the 2D PSF.
Stochastic Evolutionary Algorithms for Planning Robot Paths

NASA Technical Reports Server (NTRS)

Fink, Wolfgang; Aghazarian, Hrand; Huntsberger, Terrance; Terrile, Richard

2006-01-01

A computer program implements stochastic evolutionary algorithms for planning and optimizing collision-free paths for robots and their jointed limbs. Stochastic evolutionary algorithms can be made to produce acceptably close approximations to exact, optimal solutions for path-planning problems while often demanding much less computation than do exhaustive-search and deterministic inverse-kinematics algorithms that have been used previously for this purpose. Hence, the present software is better suited for application aboard robots having limited computing capabilities (see figure). The stochastic aspect lies in the use of simulated annealing to (1) prevent trapping of an optimization algorithm in local minima of an energy-like error measure by which the fitness of a trial solution is evaluated while (2) ensuring that the entire multidimensional configuration and parameter space of the path-planning problem is sampled efficiently with respect to both robot joint angles and computation time. Simulated annealing is an established technique for avoiding local minima in multidimensional optimization problems, but has not, until now, been applied to planning collision-free robot paths by use of low-power computers.
Vectorization of transport and diffusion computations on the CDC Cyber 205

DOE Office of Scientific and Technical Information (OSTI.GOV)

Abu-Shumays, I.K.

1986-01-01

The development and testing of alternative numerical methods and computational algorithms specifically designed for the vectorization of transport and diffusion computations on a Control Data Corporation (CDC) Cyber 205 vector computer are described. Two solution methods for the discrete ordinates approximation to the transport equation are summarized and compared. Factors of 4 to 7 reduction in run times for certain large transport problems were achieved on a Cyber 205 as compared with run times on a CDC-7600. The solution of tridiagonal systems of linear equations, central to several efficient numerical methods for multidimensional diffusion computations and essential for fluid flowmore » and other physics and engineering problems, is also dealt with. Among the methods tested, a combined odd-even cyclic reduction and modified Cholesky factorization algorithm for solving linear symmetric positive definite tridiagonal systems is found to be the most effective for these systems on a Cyber 205. For large tridiagonal systems, computation with this algorithm is an order of magnitude faster on a Cyber 205 than computation with the best algorithm for tridiagonal systems on a CDC-7600.« less
The efficiency of geophysical adjoint codes generated by automatic differentiation tools

NASA Astrophysics Data System (ADS)

Vlasenko, A. V.; Köhl, A.; Stammer, D.

2016-02-01

The accuracy of numerical models that describe complex physical or chemical processes depends on the choice of model parameters. Estimating an optimal set of parameters by optimization algorithms requires knowledge of the sensitivity of the process of interest to model parameters. Typically the sensitivity computation involves differentiation of the model, which can be performed by applying algorithmic differentiation (AD) tools to the underlying numerical code. However, existing AD tools differ substantially in design, legibility and computational efficiency. In this study we show that, for geophysical data assimilation problems of varying complexity, the performance of adjoint codes generated by the existing AD tools (i) Open_AD, (ii) Tapenade, (iii) NAGWare and (iv) Transformation of Algorithms in Fortran (TAF) can be vastly different. Based on simple test problems, we evaluate the efficiency of each AD tool with respect to computational speed, accuracy of the adjoint, the efficiency of memory usage, and the capability of each AD tool to handle modern FORTRAN 90-95 elements such as structures and pointers, which are new elements that either combine groups of variables or provide aliases to memory addresses, respectively. We show that, while operator overloading tools are the only ones suitable for modern codes written in object-oriented programming languages, their computational efficiency lags behind source transformation by orders of magnitude, rendering the application of these modern tools to practical assimilation problems prohibitive. In contrast, the application of source transformation tools appears to be the most efficient choice, allowing handling even large geophysical data assimilation problems. However, they can only be applied to numerical models written in earlier generations of programming languages. Our study indicates that applying existing AD tools to realistic geophysical problems faces limitations that urgently need to be solved to allow the continuous use of AD tools for solving geophysical problems on modern computer architectures.
The Application of Multiobjective Evolutionary Algorithms to an Educational Computational Model of Science Information Processing: A Computational Experiment in Science Education

ERIC Educational Resources Information Center

Lamb, Richard L.; Firestone, Jonah B.

2017-01-01

Conflicting explanations and unrelated information in science classrooms increase cognitive load and decrease efficiency in learning. This reduced efficiency ultimately limits one's ability to solve reasoning problems in the science. In reasoning, it is the ability of students to sift through and identify critical pieces of information that is of…
Characterization of robotics parallel algorithms and mapping onto a reconfigurable SIMD machine

NASA Technical Reports Server (NTRS)

Lee, C. S. G.; Lin, C. T.

1989-01-01

The kinematics, dynamics, Jacobian, and their corresponding inverse computations are six essential problems in the control of robot manipulators. Efficient parallel algorithms for these computations are discussed and analyzed. Their characteristics are identified and a scheme on the mapping of these algorithms to a reconfigurable parallel architecture is presented. Based on the characteristics including type of parallelism, degree of parallelism, uniformity of the operations, fundamental operations, data dependencies, and communication requirement, it is shown that most of the algorithms for robotic computations possess highly regular properties and some common structures, especially the linear recursive structure. Moreover, they are well-suited to be implemented on a single-instruction-stream multiple-data-stream (SIMD) computer with reconfigurable interconnection network. The model of a reconfigurable dual network SIMD machine with internal direct feedback is introduced. A systematic procedure internal direct feedback is introduced. A systematic procedure to map these computations to the proposed machine is presented. A new scheduling problem for SIMD machines is investigated and a heuristic algorithm, called neighborhood scheduling, that reorders the processing sequence of subtasks to reduce the communication time is described. Mapping results of a benchmark algorithm are illustrated and discussed.
Consistent and efficient processing of ADCP streamflow measurements

USGS Publications Warehouse

Mueller, David S.; Constantinescu, George; Garcia, Marcelo H.; Hanes, Dan

2016-01-01

The use of Acoustic Doppler Current Profilers (ADCPs) from a moving boat is a commonly used method for measuring streamflow. Currently, the algorithms used to compute the average depth, compute edge discharge, identify invalid data, and estimate velocity and discharge for invalid data vary among manufacturers. These differences could result in different discharges being computed from identical data. Consistent computational algorithm, automated filtering, and quality assessment of ADCP streamflow measurements that are independent of the ADCP manufacturer are being developed in a software program that can process ADCP moving-boat discharge measurements independent of the ADCP used to collect the data.
Quantum machine learning.

PubMed

Biamonte, Jacob; Wittek, Peter; Pancotti, Nicola; Rebentrost, Patrick; Wiebe, Nathan; Lloyd, Seth

2017-09-13

Fuelled by increasing computer power and algorithmic advances, machine learning techniques have become powerful tools for finding patterns in data. Quantum systems produce atypical patterns that classical systems are thought not to produce efficiently, so it is reasonable to postulate that quantum computers may outperform classical computers on machine learning tasks. The field of quantum machine learning explores how to devise and implement quantum software that could enable machine learning that is faster than that of classical computers. Recent work has produced quantum algorithms that could act as the building blocks of machine learning programs, but the hardware and software challenges are still considerable.
An algorithm for automatic reduction of complex signal flow graphs

NASA Technical Reports Server (NTRS)

Young, K. R.; Hoberock, L. L.; Thompson, J. G.

1976-01-01

A computer algorithm is developed that provides efficient means to compute transmittances directly from a signal flow graph or a block diagram. Signal flow graphs are cast as directed graphs described by adjacency matrices. Nonsearch computation, designed for compilers without symbolic capability, is used to identify all arcs that are members of simple cycles for use with Mason's gain formula. The routine does not require the visual acumen of an interpreter to reduce the topology of the graph, and it is particularly useful for analyzing control systems described for computer analyses by means of interactive graphics.
Quantum machine learning

NASA Astrophysics Data System (ADS)

Biamonte, Jacob; Wittek, Peter; Pancotti, Nicola; Rebentrost, Patrick; Wiebe, Nathan; Lloyd, Seth

2017-09-01

Fuelled by increasing computer power and algorithmic advances, machine learning techniques have become powerful tools for finding patterns in data. Quantum systems produce atypical patterns that classical systems are thought not to produce efficiently, so it is reasonable to postulate that quantum computers may outperform classical computers on machine learning tasks. The field of quantum machine learning explores how to devise and implement quantum software that could enable machine learning that is faster than that of classical computers. Recent work has produced quantum algorithms that could act as the building blocks of machine learning programs, but the hardware and software challenges are still considerable.
Image preprocessing for improving computational efficiency in implementation of restoration and superresolution algorithms.

PubMed

Sundareshan, Malur K; Bhattacharjee, Supratik; Inampudi, Radhika; Pang, Ho-Yuen

2002-12-10

Computational complexity is a major impediment to the real-time implementation of image restoration and superresolution algorithms in many applications. Although powerful restoration algorithms have been developed within the past few years utilizing sophisticated mathematical machinery (based on statistical optimization and convex set theory), these algorithms are typically iterative in nature and require a sufficient number of iterations to be executed to achieve the desired resolution improvement that may be needed to meaningfully perform postprocessing image exploitation tasks in practice. Additionally, recent technological breakthroughs have facilitated novel sensor designs (focal plane arrays, for instance) that make it possible to capture megapixel imagery data at video frame rates. A major challenge in the processing of these large-format images is to complete the execution of the image processing steps within the frame capture times and to keep up with the output rate of the sensor so that all data captured by the sensor can be efficiently utilized. Consequently, development of novel methods that facilitate real-time implementation of image restoration and superresolution algorithms is of significant practical interest and is the primary focus of this study. The key to designing computationally efficient processing schemes lies in strategically introducing appropriate preprocessing steps together with the superresolution iterations to tailor optimized overall processing sequences for imagery data of specific formats. For substantiating this assertion, three distinct methods for tailoring a preprocessing filter and integrating it with the superresolution processing steps are outlined. These methods consist of a region-of-interest extraction scheme, a background-detail separation procedure, and a scene-derived information extraction step for implementing a set-theoretic restoration of the image that is less demanding in computation compared with the superresolution iterations. A quantitative evaluation of the performance of these algorithms for restoring and superresolving various imagery data captured by diffraction-limited sensing operations are also presented.
Improving Design Efficiency for Large-Scale Heterogeneous Circuits

NASA Astrophysics Data System (ADS)

Gregerson, Anthony

Despite increases in logic density, many Big Data applications must still be partitioned across multiple computing devices in order to meet their strict performance requirements. Among the most demanding of these applications is high-energy physics (HEP), which uses complex computing systems consisting of thousands of FPGAs and ASICs to process the sensor data created by experiments at particles accelerators such as the Large Hadron Collider (LHC). Designing such computing systems is challenging due to the scale of the systems, the exceptionally high-throughput and low-latency performance constraints that necessitate application-specific hardware implementations, the requirement that algorithms are efficiently partitioned across many devices, and the possible need to update the implemented algorithms during the lifetime of the system. In this work, we describe our research to develop flexible architectures for implementing such large-scale circuits on FPGAs. In particular, this work is motivated by (but not limited in scope to) high-energy physics algorithms for the Compact Muon Solenoid (CMS) experiment at the LHC. To make efficient use of logic resources in multi-FPGA systems, we introduce Multi-Personality Partitioning, a novel form of the graph partitioning problem, and present partitioning algorithms that can significantly improve resource utilization on heterogeneous devices while also reducing inter-chip connections. To reduce the high communication costs of Big Data applications, we also introduce Information-Aware Partitioning, a partitioning method that analyzes the data content of application-specific circuits, characterizes their entropy, and selects circuit partitions that enable efficient compression of data between chips. We employ our information-aware partitioning method to improve the performance of the hardware validation platform for evaluating new algorithms for the CMS experiment. Together, these research efforts help to improve the efficiency and decrease the cost of the developing large-scale, heterogeneous circuits needed to enable large-scale application in high-energy physics and other important areas.
Efficient Algorithms for Estimating the Absorption Spectrum within Linear Response TDDFT

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brabec, Jiri; Lin, Lin; Shao, Meiyue

We present two iterative algorithms for approximating the absorption spectrum of molecules within linear response of time-dependent density functional theory (TDDFT) framework. These methods do not attempt to compute eigenvalues or eigenvectors of the linear response matrix. They are designed to approximate the absorption spectrum as a function directly. They take advantage of the special structure of the linear response matrix. Neither method requires the linear response matrix to be constructed explicitly. They only require a procedure that performs the multiplication of the linear response matrix with a vector. These methods can also be easily modified to efficiently estimate themore » density of states (DOS) of the linear response matrix without computing the eigenvalues of this matrix. We show by computational experiments that the methods proposed in this paper can be much more efficient than methods that are based on the exact diagonalization of the linear response matrix. We show that they can also be more efficient than real-time TDDFT simulations. We compare the pros and cons of these methods in terms of their accuracy as well as their computational and storage cost.« less
A novel clinical decision support algorithm for constructing complete medication histories.

PubMed

Long, Ju; Yuan, Michael Juntao

2017-07-01

A patient's complete medication history is a crucial element for physicians to develop a full understanding of the patient's medical conditions and treatment options. However, due to the fragmented nature of medical data, this process can be very time-consuming and often impossible for physicians to construct a complete medication history for complex patients. In this paper, we describe an accurate, computationally efficient and scalable algorithm to construct a medication history timeline. The algorithm is developed and validated based on 1 million random prescription records from a large national prescription data aggregator. Our evaluation shows that the algorithm can be scaled horizontally on-demand, making it suitable for future delivery in a cloud-computing environment. We also propose that this cloud-based medication history computation algorithm could be integrated into Electronic Medical Records, enabling informed clinical decision-making at the point of care. Copyright © 2017 Elsevier B.V. All rights reserved.
Scalable software-defined optical networking with high-performance routing and wavelength assignment algorithms.

PubMed

Lee, Chankyun; Cao, Xiaoyuan; Yoshikane, Noboru; Tsuritani, Takehiro; Rhee, June-Koo Kevin

2015-10-19

The feasibility of software-defined optical networking (SDON) for a practical application critically depends on scalability of centralized control performance. The paper, highly scalable routing and wavelength assignment (RWA) algorithms are investigated on an OpenFlow-based SDON testbed for proof-of-concept demonstration. Efficient RWA algorithms are proposed to achieve high performance in achieving network capacity with reduced computation cost, which is a significant attribute in a scalable centralized-control SDON. The proposed heuristic RWA algorithms differ in the orders of request processes and in the procedures of routing table updates. Combined in a shortest-path-based routing algorithm, a hottest-request-first processing policy that considers demand intensity and end-to-end distance information offers both the highest throughput of networks and acceptable computation scalability. We further investigate trade-off relationship between network throughput and computation complexity in routing table update procedure by a simulation study.

SAGE: The Self-Adaptive Grid Code. 3

NASA Technical Reports Server (NTRS)

Davies, Carol B.; Venkatapathy, Ethiraj

1999-01-01

The multi-dimensional self-adaptive grid code, SAGE, is an important tool in the field of computational fluid dynamics (CFD). It provides an efficient method to improve the accuracy of flow solutions while simultaneously reducing computer processing time. Briefly, SAGE enhances an initial computational grid by redistributing the mesh points into more appropriate locations. The movement of these points is driven by an equal-error-distribution algorithm that utilizes the relationship between high flow gradients and excessive solution errors. The method also provides a balance between clustering points in the high gradient regions and maintaining the smoothness and continuity of the adapted grid, The latest version, Version 3, includes the ability to change the boundaries of a given grid to more efficiently enclose flow structures and provides alternative redistribution algorithms.
Symmetry-conserving purification of quantum states within the density matrix renormalization group

DOE PAGES

Nocera, Alberto; Alvarez, Gonzalo

2016-01-28

The density matrix renormalization group (DMRG) algorithm was originally designed to efficiently compute the zero-temperature or ground-state properties of one-dimensional strongly correlated quantum systems. The development of the algorithm at finite temperature has been a topic of much interest, because of the usefulness of thermodynamics quantities in understanding the physics of condensed matter systems, and because of the increased complexity associated with efficiently computing temperature-dependent properties. The ancilla method is a DMRG technique that enables the computation of these thermodynamic quantities. In this paper, we review the ancilla method, and improve its performance by working on reduced Hilbert spaces andmore » using canonical approaches. Furthermore we explore its applicability beyond spins systems to t-J and Hubbard models.« less
A computational approach for hypersonic nonequilibrium radiation utilizing space partition algorithm and Gauss quadrature

NASA Astrophysics Data System (ADS)

Shang, J. S.; Andrienko, D. A.; Huang, P. G.; Surzhikov, S. T.

2014-06-01

An efficient computational capability for nonequilibrium radiation simulation via the ray tracing technique has been accomplished. The radiative rate equation is iteratively coupled with the aerodynamic conservation laws including nonequilibrium chemical and chemical-physical kinetic models. The spectral properties along tracing rays are determined by a space partition algorithm of the nearest neighbor search process, and the numerical accuracy is further enhanced by a local resolution refinement using the Gauss-Lobatto polynomial. The interdisciplinary governing equations are solved by an implicit delta formulation through the diminishing residual approach. The axisymmetric radiating flow fields over the reentry RAM-CII probe have been simulated and verified with flight data and previous solutions by traditional methods. A computational efficiency gain nearly forty times is realized over that of the existing simulation procedures.
The explicit computation of integration algorithms and first integrals for ordinary differential equations with polynomials coefficients using trees

NASA Technical Reports Server (NTRS)

Crouch, P. E.; Grossman, Robert

1992-01-01

This note is concerned with the explicit symbolic computation of expressions involving differential operators and their actions on functions. The derivation of specialized numerical algorithms, the explicit symbolic computation of integrals of motion, and the explicit computation of normal forms for nonlinear systems all require such computations. More precisely, if R = k(x(sub 1),...,x(sub N)), where k = R or C, F denotes a differential operator with coefficients from R, and g member of R, we describe data structures and algorithms for efficiently computing g. The basic idea is to impose a multiplicative structure on the vector space with basis the set of finite rooted trees and whose nodes are labeled with the coefficients of the differential operators. Cancellations of two trees with r + 1 nodes translates into cancellation of O(N(exp r)) expressions involving the coefficient functions and their derivatives.
Computing the Envelope for Stepwise-Constant Resource Allocations

NASA Technical Reports Server (NTRS)

Muscettola, Nicola; Clancy, Daniel (Technical Monitor)

2002-01-01

Computing tight resource-level bounds is a fundamental problem in the construction of flexible plans with resource utilization. In this paper we describe an efficient algorithm that builds a resource envelope, the tightest possible such bound. The algorithm is based on transforming the temporal network of resource consuming and producing events into a flow network with nodes equal to the events and edges equal to the necessary predecessor links between events. A staged maximum flow problem on the network is then used to compute the time of occurrence and the height of each step of the resource envelope profile. Each stage has the same computational complexity of solving a maximum flow problem on the entire flow network. This makes this method computationally feasible and promising for use in the inner loop of flexible-time scheduling algorithms.
A multiresolution approach to iterative reconstruction algorithms in X-ray computed tomography.

PubMed

De Witte, Yoni; Vlassenbroeck, Jelle; Van Hoorebeke, Luc

2010-09-01

In computed tomography, the application of iterative reconstruction methods in practical situations is impeded by their high computational demands. Especially in high resolution X-ray computed tomography, where reconstruction volumes contain a high number of volume elements (several giga voxels), this computational burden prevents their actual breakthrough. Besides the large amount of calculations, iterative algorithms require the entire volume to be kept in memory during reconstruction, which quickly becomes cumbersome for large data sets. To overcome this obstacle, we present a novel multiresolution reconstruction, which greatly reduces the required amount of memory without significantly affecting the reconstructed image quality. It is shown that, combined with an efficient implementation on a graphical processing unit, the multiresolution approach enables the application of iterative algorithms in the reconstruction of large volumes at an acceptable speed using only limited resources.
Monte Carlo simulation of Ising models by multispin coding on a vector computer

NASA Astrophysics Data System (ADS)

Wansleben, Stephan; Zabolitzky, John G.; Kalle, Claus

1984-11-01

Rebbi's efficient multispin coding algorithm for Ising models is combined with the use of the vector computer CDC Cyber 205. A speed of 21.2 million updates per second is reached. This is comparable to that obtained by special- purpose computers.
New Information Dispersal Techniques for Trustworthy Computing

ERIC Educational Resources Information Center

Parakh, Abhishek

2011-01-01

Information dispersal algorithms (IDA) are used for distributed data storage because they simultaneously provide security, reliability and space efficiency, constituting a trustworthy computing framework for many critical applications, such as cloud computing, in the information society. In the most general sense, this is achieved by dividing data…
Parallel deterministic transport sweeps of structured and unstructured meshes with overloaded mesh decompositions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pautz, Shawn D.; Bailey, Teresa S.

Here, the efficiency of discrete ordinates transport sweeps depends on the scheduling algorithm, the domain decomposition, the problem to be solved, and the computational platform. Sweep scheduling algorithms may be categorized by their approach to several issues. In this paper we examine the strategy of domain overloading for mesh partitioning as one of the components of such algorithms. In particular, we extend the domain overloading strategy, previously defined and analyzed for structured meshes, to the general case of unstructured meshes. We also present computational results for both the structured and unstructured domain overloading cases. We find that an appropriate amountmore » of domain overloading can greatly improve the efficiency of parallel sweeps for both structured and unstructured partitionings of the test problems examined on up to 10 5 processor cores.« less
Parallel deterministic transport sweeps of structured and unstructured meshes with overloaded mesh decompositions

DOE PAGES

Pautz, Shawn D.; Bailey, Teresa S.

2016-11-29

Here, the efficiency of discrete ordinates transport sweeps depends on the scheduling algorithm, the domain decomposition, the problem to be solved, and the computational platform. Sweep scheduling algorithms may be categorized by their approach to several issues. In this paper we examine the strategy of domain overloading for mesh partitioning as one of the components of such algorithms. In particular, we extend the domain overloading strategy, previously defined and analyzed for structured meshes, to the general case of unstructured meshes. We also present computational results for both the structured and unstructured domain overloading cases. We find that an appropriate amountmore » of domain overloading can greatly improve the efficiency of parallel sweeps for both structured and unstructured partitionings of the test problems examined on up to 10 5 processor cores.« less
Fast computation algorithms for speckle pattern simulation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nascov, Victor; Samoilă, Cornel; Ursuţiu, Doru

2013-11-13

We present our development of a series of efficient computation algorithms, generally usable to calculate light diffraction and particularly for speckle pattern simulation. We use mainly the scalar diffraction theory in the form of Rayleigh-Sommerfeld diffraction formula and its Fresnel approximation. Our algorithms are based on a special form of the convolution theorem and the Fast Fourier Transform. They are able to evaluate the diffraction formula much faster than by direct computation and we have circumvented the restrictions regarding the relative sizes of the input and output domains, met on commonly used procedures. Moreover, the input and output planes canmore » be tilted each to other and the output domain can be off-axis shifted.« less
Hybrid-dual-fourier tomographic algorithm for a fast three-dimensionial optical image reconstruction in turbid media

NASA Technical Reports Server (NTRS)

Alfano, Robert R. (Inventor); Cai, Wei (Inventor)

2007-01-01

A reconstruction technique for reducing computation burden in the 3D image processes, wherein the reconstruction procedure comprises an inverse and a forward model. The inverse model uses a hybrid dual Fourier algorithm that combines a 2D Fourier inversion with a 1D matrix inversion to thereby provide high-speed inverse computations. The inverse algorithm uses a hybrid transfer to provide fast Fourier inversion for data of multiple sources and multiple detectors. The forward model is based on an analytical cumulant solution of a radiative transfer equation. The accurate analytical form of the solution to the radiative transfer equation provides an efficient formalism for fast computation of the forward model.
Exact consideration of data redundancies for spiral cone-beam CT

NASA Astrophysics Data System (ADS)

Lauritsch, Guenter; Katsevich, Alexander; Hirsch, Michael

2004-05-01

In multi-slice spiral computed tomography (CT) there is an obvious trend in adding more and more detector rows. The goals are numerous: volume coverage, isotropic spatial resolution, and speed. Consequently, there will be a variety of scan protocols optimizing clinical applications. Flexibility in table feed requires consideration of data redundancies to ensure efficient detector usage. Until recently this was achieved by approximate reconstruction algorithms only. However, due to the increasing cone angles there is a need of exact treatment of the cone beam geometry. A new, exact and efficient 3-PI algorithm for considering three-fold data redundancies was derived from a general, theoretical framework based on 3D Radon inversion using Grangeat's formula. The 3-PI algorithm possesses a simple and efficient structure as the 1-PI method for non-redundant data previously proposed. Filtering is one-dimensional, performed along lines with variable tilt on the detector. This talk deals with a thorough evaluation of the performance of the 3-PI algorithm in comparison to the 1-PI method. Image quality of the 3-PI algorithm is superior. The prominent spiral artifacts and other discretization artifacts are significantly reduced due to averaging effects when taking into account redundant data. Certainly signal-to-noise ratio is increased. The computational expense is comparable even to that of approximate algorithms. The 3-PI algorithm proves its practicability for applications in medical imaging. Other exact n-PI methods for n-fold data redundancies (n odd) can be deduced from the general, theoretical framework.
Seeding the initial population with feasible solutions in metaheuristic optimization of steel trusses

NASA Astrophysics Data System (ADS)

Kazemzadeh Azad, Saeid

2018-01-01

In spite of considerable research work on the development of efficient algorithms for discrete sizing optimization of steel truss structures, only a few studies have addressed non-algorithmic issues affecting the general performance of algorithms. For instance, an important question is whether starting the design optimization from a feasible solution is fruitful or not. This study is an attempt to investigate the effect of seeding the initial population with feasible solutions on the general performance of metaheuristic techniques. To this end, the sensitivity of recently proposed metaheuristic algorithms to the feasibility of initial candidate designs is evaluated through practical discrete sizing of real-size steel truss structures. The numerical experiments indicate that seeding the initial population with feasible solutions can improve the computational efficiency of metaheuristic structural optimization algorithms, especially in the early stages of the optimization. This paves the way for efficient metaheuristic optimization of large-scale structural systems.
Efficient Boundary Extraction of BSP Solids Based on Clipping Operations.

PubMed

Wang, Charlie C L; Manocha, Dinesh

2013-01-01

We present an efficient algorithm to extract the manifold surface that approximates the boundary of a solid represented by a Binary Space Partition (BSP) tree. Our polygonization algorithm repeatedly performs clipping operations on volumetric cells that correspond to a spatial convex partition and computes the boundary by traversing the connected cells. We use point-based representations along with finite-precision arithmetic to improve the efficiency and generate the B-rep approximation of a BSP solid. The core of our polygonization method is a novel clipping algorithm that uses a set of logical operations to make it resistant to degeneracies resulting from limited precision of floating-point arithmetic. The overall BSP to B-rep conversion algorithm can accurately generate boundaries with sharp and small features, and is faster than prior methods. At the end of this paper, we use this algorithm for a few geometric processing applications including Boolean operations, model repair, and mesh reconstruction.
A comparison of select image-compression algorithms for an electronic still camera

NASA Technical Reports Server (NTRS)

Nerheim, Rosalee

1989-01-01

This effort is a study of image-compression algorithms for an electronic still camera. An electronic still camera can record and transmit high-quality images without the use of film, because images are stored digitally in computer memory. However, high-resolution images contain an enormous amount of information, and will strain the camera's data-storage system. Image compression will allow more images to be stored in the camera's memory. For the electronic still camera, a compression algorithm that produces a reconstructed image of high fidelity is most important. Efficiency of the algorithm is the second priority. High fidelity and efficiency are more important than a high compression ratio. Several algorithms were chosen for this study and judged on fidelity, efficiency and compression ratio. The transform method appears to be the best choice. At present, the method is compressing images to a ratio of 5.3:1 and producing high-fidelity reconstructed images.
A fast and accurate frequency estimation algorithm for sinusoidal signal with harmonic components

NASA Astrophysics Data System (ADS)

Hu, Jinghua; Pan, Mengchun; Zeng, Zhidun; Hu, Jiafei; Chen, Dixiang; Tian, Wugang; Zhao, Jianqiang; Du, Qingfa

2016-10-01

Frequency estimation is a fundamental problem in many applications, such as traditional vibration measurement, power system supervision, and microelectromechanical system sensors control. In this paper, a fast and accurate frequency estimation algorithm is proposed to deal with low efficiency problem in traditional methods. The proposed algorithm consists of coarse and fine frequency estimation steps, and we demonstrate that it is more efficient than conventional searching methods to achieve coarse frequency estimation (location peak of FFT amplitude) by applying modified zero-crossing technique. Thus, the proposed estimation algorithm requires less hardware and software sources and can achieve even higher efficiency when the experimental data increase. Experimental results with modulated magnetic signal show that the root mean square error of frequency estimation is below 0.032 Hz with the proposed algorithm, which has lower computational complexity and better global performance than conventional frequency estimation methods.
Subspace algorithms for identifying separable-in-denominator 2D systems with deterministic-stochastic inputs

NASA Astrophysics Data System (ADS)

Ramos, José A.; Mercère, Guillaume

2016-12-01

In this paper, we present an algorithm for identifying two-dimensional (2D) causal, recursive and separable-in-denominator (CRSD) state-space models in the Roesser form with deterministic-stochastic inputs. The algorithm implements the N4SID, PO-MOESP and CCA methods, which are well known in the literature on 1D system identification, but here we do so for the 2D CRSD Roesser model. The algorithm solves the 2D system identification problem by maintaining the constraint structure imposed by the problem (i.e. Toeplitz and Hankel) and computes the horizontal and vertical system orders, system parameter matrices and covariance matrices of a 2D CRSD Roesser model. From a computational point of view, the algorithm has been presented in a unified framework, where the user can select which of the three methods to use. Furthermore, the identification task is divided into three main parts: (1) computing the deterministic horizontal model parameters, (2) computing the deterministic vertical model parameters and (3) computing the stochastic components. Specific attention has been paid to the computation of a stabilised Kalman gain matrix and a positive real solution when required. The efficiency and robustness of the unified algorithm have been demonstrated via a thorough simulation example.
Novel techniques for data decomposition and load balancing for parallel processing of vision systems: Implementation and evaluation using a motion estimation system

NASA Technical Reports Server (NTRS)

Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.

1989-01-01

Computer vision systems employ a sequence of vision algorithms in which the output of an algorithm is the input of the next algorithm in the sequence. Algorithms that constitute such systems exhibit vastly different computational characteristics, and therefore, require different data decomposition techniques and efficient load balancing techniques for parallel implementation. However, since the input data for a task is produced as the output data of the previous task, this information can be exploited to perform knowledge based data decomposition and load balancing. Presented here are algorithms for a motion estimation system. The motion estimation is based on the point correspondence between the involved images which are a sequence of stereo image pairs. Researchers propose algorithms to obtain point correspondences by matching feature points among stereo image pairs at any two consecutive time instants. Furthermore, the proposed algorithms employ non-iterative procedures, which results in saving considerable amounts of computation time. The system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from consecutive time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters.
TU-AB-303-08: GPU-Based Software Platform for Efficient Image-Guided Adaptive Radiation Therapy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Park, S; Robinson, A; McNutt, T

2015-06-15

Purpose: In this study, we develop an integrated software platform for adaptive radiation therapy (ART) that combines fast and accurate image registration, segmentation, and dose computation/accumulation methods. Methods: The proposed system consists of three key components; 1) deformable image registration (DIR), 2) automatic segmentation, and 3) dose computation/accumulation. The computationally intensive modules including DIR and dose computation have been implemented on a graphics processing unit (GPU). All required patient-specific data including the planning CT (pCT) with contours, daily cone-beam CTs, and treatment plan are automatically queried and retrieved from their own databases. To improve the accuracy of DIR between pCTmore » and CBCTs, we use the double force demons DIR algorithm in combination with iterative CBCT intensity correction by local intensity histogram matching. Segmentation of daily CBCT is then obtained by propagating contours from the pCT. Daily dose delivered to the patient is computed on the registered pCT by a GPU-accelerated superposition/convolution algorithm. Finally, computed daily doses are accumulated to show the total delivered dose to date. Results: Since the accuracy of DIR critically affects the quality of the other processes, we first evaluated our DIR method on eight head-and-neck cancer cases and compared its performance. Normalized mutual-information (NMI) and normalized cross-correlation (NCC) computed as similarity measures, and our method produced overall NMI of 0.663 and NCC of 0.987, outperforming conventional methods by 3.8% and 1.9%, respectively. Experimental results show that our registration method is more consistent and roust than existing algorithms, and also computationally efficient. Computation time at each fraction took around one minute (30–50 seconds for registration and 15–25 seconds for dose computation). Conclusion: We developed an integrated GPU-accelerated software platform that enables accurate and efficient DIR, auto-segmentation, and dose computation, thus supporting an efficient ART workflow. This work was supported by NIH/NCI under grant R42CA137886.« less

Efficient Pricing Technique for Resource Allocation Problem in Downlink OFDM Cognitive Radio Networks

NASA Astrophysics Data System (ADS)

Abdulghafoor, O. B.; Shaat, M. M. R.; Ismail, M.; Nordin, R.; Yuwono, T.; Alwahedy, O. N. A.

2017-05-01

In this paper, the problem of resource allocation in OFDM-based downlink cognitive radio (CR) networks has been proposed. The purpose of this research is to decrease the computational complexity of the resource allocation algorithm for downlink CR network while concerning the interference constraint of primary network. The objective has been secured by adopting pricing scheme to develop power allocation algorithm with the following concerns: (i) reducing the complexity of the proposed algorithm and (ii) providing firm power control to the interference introduced to primary users (PUs). The performance of the proposed algorithm is tested for OFDM- CRNs. The simulation results show that the performance of the proposed algorithm approached the performance of the optimal algorithm at a lower computational complexity, i.e., O(NlogN), which makes the proposed algorithm suitable for more practical applications.
Optimal Energy Efficiency Fairness of Nodes in Wireless Powered Communication Networks.

PubMed

Zhang, Jing; Zhou, Qingjie; Ng, Derrick Wing Kwan; Jo, Minho

2017-09-15

In wireless powered communication networks (WPCNs), it is essential to research energy efficiency fairness in order to evaluate the balance of nodes for receiving information and harvesting energy. In this paper, we propose an efficient iterative algorithm for optimal energy efficiency proportional fairness in WPCN. The main idea is to use stochastic geometry to derive the mean proportionally fairness utility function with respect to user association probability and receive threshold. Subsequently, we prove that the relaxed proportionally fairness utility function is a concave function for user association probability and receive threshold, respectively. At the same time, a sub-optimal algorithm by exploiting alternating optimization approach is proposed. Through numerical simulations, we demonstrate that our sub-optimal algorithm can obtain a result close to optimal energy efficiency proportional fairness with significant reduction of computational complexity.
Optimal Energy Efficiency Fairness of Nodes in Wireless Powered Communication Networks

PubMed Central

Zhou, Qingjie; Ng, Derrick Wing Kwan; Jo, Minho

2017-01-01

In wireless powered communication networks (WPCNs), it is essential to research energy efficiency fairness in order to evaluate the balance of nodes for receiving information and harvesting energy. In this paper, we propose an efficient iterative algorithm for optimal energy efficiency proportional fairness in WPCN. The main idea is to use stochastic geometry to derive the mean proportionally fairness utility function with respect to user association probability and receive threshold. Subsequently, we prove that the relaxed proportionally fairness utility function is a concave function for user association probability and receive threshold, respectively. At the same time, a sub-optimal algorithm by exploiting alternating optimization approach is proposed. Through numerical simulations, we demonstrate that our sub-optimal algorithm can obtain a result close to optimal energy efficiency proportional fairness with significant reduction of computational complexity. PMID:28914818
On some methods for improving time of reachability sets computation for the dynamic system control problem

NASA Astrophysics Data System (ADS)

Zimovets, Artem; Matviychuk, Alexander; Ushakov, Vladimir

2016-12-01

The paper presents two different approaches to reduce the time of computer calculation of reachability sets. First of these two approaches use different data structures for storing the reachability sets in the computer memory for calculation in single-threaded mode. Second approach is based on using parallel algorithms with reference to the data structures from the first approach. Within the framework of this paper parallel algorithm of approximate reachability set calculation on computer with SMP-architecture is proposed. The results of numerical modelling are presented in the form of tables which demonstrate high efficiency of parallel computing technology and also show how computing time depends on the used data structure.
MDTri: robust and efficient global mixed integer search of spaces of multiple ternary alloys: A DIRECT-inspired optimization algorithm for experimentally accessible computational material design

DOE PAGES

Graf, Peter A.; Billups, Stephen

2017-07-24

Computational materials design has suffered from a lack of algorithms formulated in terms of experimentally accessible variables. Here we formulate the problem of (ternary) alloy optimization at the level of choice of atoms and their composition that is normal for synthesists. Mathematically, this is a mixed integer problem where a candidate solution consists of a choice of three elements, and how much of each of them to use. This space has the natural structure of a set of equilateral triangles. We solve this problem by introducing a novel version of the DIRECT algorithm that (1) operates on equilateral triangles insteadmore » of rectangles and (2) works across multiple triangles. We demonstrate on a test case that the algorithm is both robust and efficient. Lastly, we offer an explanation of the efficacy of DIRECT -- specifically, its balance of global and local search -- by showing that 'potentially optimal rectangles' of the original algorithm are akin to the Pareto front of the 'multi-component optimization' of global and local search.« less
An improved shuffled frog leaping algorithm based evolutionary framework for currency exchange rate prediction

NASA Astrophysics Data System (ADS)

Dash, Rajashree

2017-11-01

Forecasting purchasing power of one currency with respect to another currency is always an interesting topic in the field of financial time series prediction. Despite the existence of several traditional and computational models for currency exchange rate forecasting, there is always a need for developing simpler and more efficient model, which will produce better prediction capability. In this paper, an evolutionary framework is proposed by using an improved shuffled frog leaping (ISFL) algorithm with a computationally efficient functional link artificial neural network (CEFLANN) for prediction of currency exchange rate. The model is validated by observing the monthly prediction measures obtained for three currency exchange data sets such as USD/CAD, USD/CHF, and USD/JPY accumulated within same period of time. The model performance is also compared with two other evolutionary learning techniques such as Shuffled frog leaping algorithm and Particle Swarm optimization algorithm. Practical analysis of results suggest that, the proposed model developed using the ISFL algorithm with CEFLANN network is a promising predictor model for currency exchange rate prediction compared to other models included in the study.
Hybrid Artificial Root Foraging Optimizer Based Multilevel Threshold for Image Segmentation

PubMed Central

Liu, Yang; Liu, Junfei

2016-01-01

This paper proposes a new plant-inspired optimization algorithm for multilevel threshold image segmentation, namely, hybrid artificial root foraging optimizer (HARFO), which essentially mimics the iterative root foraging behaviors. In this algorithm the new growth operators of branching, regrowing, and shrinkage are initially designed to optimize continuous space search by combining root-to-root communication and coevolution mechanism. With the auxin-regulated scheme, various root growth operators are guided systematically. With root-to-root communication, individuals exchange information in different efficient topologies, which essentially improve the exploration ability. With coevolution mechanism, the hierarchical spatial population driven by evolutionary pressure of multiple subpopulations is structured, which ensure that the diversity of root population is well maintained. The comparative results on a suit of benchmarks show the superiority of the proposed algorithm. Finally, the proposed HARFO algorithm is applied to handle the complex image segmentation problem based on multilevel threshold. Computational results of this approach on a set of tested images show the outperformance of the proposed algorithm in terms of optimization accuracy computation efficiency. PMID:27725826
Hybrid Artificial Root Foraging Optimizer Based Multilevel Threshold for Image Segmentation.

PubMed

Liu, Yang; Liu, Junfei; Tian, Liwei; Ma, Lianbo

2016-01-01

This paper proposes a new plant-inspired optimization algorithm for multilevel threshold image segmentation, namely, hybrid artificial root foraging optimizer (HARFO), which essentially mimics the iterative root foraging behaviors. In this algorithm the new growth operators of branching, regrowing, and shrinkage are initially designed to optimize continuous space search by combining root-to-root communication and coevolution mechanism. With the auxin-regulated scheme, various root growth operators are guided systematically. With root-to-root communication, individuals exchange information in different efficient topologies, which essentially improve the exploration ability. With coevolution mechanism, the hierarchical spatial population driven by evolutionary pressure of multiple subpopulations is structured, which ensure that the diversity of root population is well maintained. The comparative results on a suit of benchmarks show the superiority of the proposed algorithm. Finally, the proposed HARFO algorithm is applied to handle the complex image segmentation problem based on multilevel threshold. Computational results of this approach on a set of tested images show the outperformance of the proposed algorithm in terms of optimization accuracy computation efficiency.
MDTri: robust and efficient global mixed integer search of spaces of multiple ternary alloys: A DIRECT-inspired optimization algorithm for experimentally accessible computational material design

DOE Office of Scientific and Technical Information (OSTI.GOV)

Graf, Peter A.; Billups, Stephen

Computational materials design has suffered from a lack of algorithms formulated in terms of experimentally accessible variables. Here we formulate the problem of (ternary) alloy optimization at the level of choice of atoms and their composition that is normal for synthesists. Mathematically, this is a mixed integer problem where a candidate solution consists of a choice of three elements, and how much of each of them to use. This space has the natural structure of a set of equilateral triangles. We solve this problem by introducing a novel version of the DIRECT algorithm that (1) operates on equilateral triangles insteadmore » of rectangles and (2) works across multiple triangles. We demonstrate on a test case that the algorithm is both robust and efficient. Lastly, we offer an explanation of the efficacy of DIRECT -- specifically, its balance of global and local search -- by showing that 'potentially optimal rectangles' of the original algorithm are akin to the Pareto front of the 'multi-component optimization' of global and local search.« less
A Sparse Self-Consistent Field Algorithm and Its Parallel Implementation: Application to Density-Functional-Based Tight Binding.

PubMed

Scemama, Anthony; Renon, Nicolas; Rapacioli, Mathias

2014-06-10

We present an algorithm and its parallel implementation for solving a self-consistent problem as encountered in Hartree-Fock or density functional theory. The algorithm takes advantage of the sparsity of matrices through the use of local molecular orbitals. The implementation allows one to exploit efficiently modern symmetric multiprocessing (SMP) computer architectures. As a first application, the algorithm is used within the density-functional-based tight binding method, for which most of the computational time is spent in the linear algebra routines (diagonalization of the Fock/Kohn-Sham matrix). We show that with this algorithm (i) single point calculations on very large systems (millions of atoms) can be performed on large SMP machines, (ii) calculations involving intermediate size systems (1000-100 000 atoms) are also strongly accelerated and can run efficiently on standard servers, and (iii) the error on the total energy due to the use of a cutoff in the molecular orbital coefficients can be controlled such that it remains smaller than the SCF convergence criterion.
An efficient tensor transpose algorithm for multicore CPU, Intel Xeon Phi, and NVidia Tesla GPU

NASA Astrophysics Data System (ADS)

Lyakh, Dmitry I.

2015-04-01

An efficient parallel tensor transpose algorithm is suggested for shared-memory computing units, namely, multicore CPU, Intel Xeon Phi, and NVidia GPU. The algorithm operates on dense tensors (multidimensional arrays) and is based on the optimization of cache utilization on x86 CPU and the use of shared memory on NVidia GPU. From the applied side, the ultimate goal is to minimize the overhead encountered in the transformation of tensor contractions into matrix multiplications in computer implementations of advanced methods of quantum many-body theory (e.g., in electronic structure theory and nuclear physics). A particular accent is made on higher-dimensional tensors that typically appear in the so-called multireference correlated methods of electronic structure theory. Depending on tensor dimensionality, the presented optimized algorithms can achieve an order of magnitude speedup on x86 CPUs and 2-3 times speedup on NVidia Tesla K20X GPU with respect to the naïve scattering algorithm (no memory access optimization). The tensor transpose routines developed in this work have been incorporated into a general-purpose tensor algebra library (TAL-SH).
A Programmable Five Qubit Quantum Computer Using Trapped Atomic Ions

NASA Astrophysics Data System (ADS)

Debnath, Shantanu

Quantum computers can solve certain problems more efficiently compared to conventional classical methods. In the endeavor to build a quantum computer, several competing platforms have emerged that can implement certain quantum algorithms using a few qubits. However, the demonstrations so far have been done usually by tailoring the hardware to meet the requirements of a particular algorithm implemented for a limited number of instances. Although such proof of principal implementations are important to verify the working of algorithms on a physical system, they further need to have the potential to serve as a general purpose quantum computer allowing the flexibility required for running multiple algorithms and be scaled up to host more qubits. Here we demonstrate a small programmable quantum computer based on five trapped atomic ions each of which serves as a qubit. By optically resolving each ion we can individually address them in order to perform a complete set of single-qubit and fully connected two-qubit quantum gates and alsoperform efficient individual qubit measurements. We implement a computation architecture that accepts an algorithm from a user interface in the form of a standard logic gate sequence and decomposes it into fundamental quantum operations that are native to the hardware using a set of compilation instructions that are defined within the software. These operations are then effected through a pattern of laser pulses that perform coherent rotations on targeted qubits in the chain. The architecture implemented in the experiment therefore gives us unprecedented flexibility in the programming of any quantum algorithm while staying blind to the underlying hardware. As a demonstration we implement the Deutsch-Jozsa and Bernstein-Vazirani algorithms on the five-qubit processor and achieve average success rates of 95 and 90 percent, respectively. We also implement a five-qubit coherent quantum Fourier transform and examine its performance in the period finding and phase estimation protocol. We find fidelities of 84 and 62 percent, respectively. While maintaining the same computation architecture the system can be scaled to more ions using resources that scale favorably (O(N. 2)) with the numberof qubits N.
An improved harmony search algorithm for emergency inspection scheduling

NASA Astrophysics Data System (ADS)

Kallioras, Nikos A.; Lagaros, Nikos D.; Karlaftis, Matthew G.

2014-11-01

The ability of nature-inspired search algorithms to efficiently handle combinatorial problems, and their successful implementation in many fields of engineering and applied sciences, have led to the development of new, improved algorithms. In this work, an improved harmony search (IHS) algorithm is presented, while a holistic approach for solving the problem of post-disaster infrastructure management is also proposed. The efficiency of IHS is compared with that of the algorithms of particle swarm optimization, differential evolution, basic harmony search and the pure random search procedure, when solving the districting problem that is the first part of post-disaster infrastructure management. The ant colony optimization algorithm is employed for solving the associated routing problem that constitutes the second part. The comparison is based on the quality of the results obtained, the computational demands and the sensitivity on the algorithmic parameters.
Near real-time, on-the-move software PED using VPEF

NASA Astrophysics Data System (ADS)

Green, Kevin; Geyer, Chris; Burnette, Chris; Agarwal, Sanjeev; Swett, Bruce; Phan, Chung; Deterline, Diane

2015-05-01

The scope of the Micro-Cloud for Operational, Vehicle-Based EO-IR Reconnaissance System (MOVERS) development effort, managed by the Night Vision and Electronic Sensors Directorate (NVESD), is to develop, integrate, and demonstrate new sensor technologies and algorithms that improve improvised device/mine detection using efficient and effective exploitation and fusion of sensor data and target cues from existing and future Route Clearance Package (RCP) sensor systems. Unfortunately, the majority of forward looking Full Motion Video (FMV) and computer vision processing, exploitation, and dissemination (PED) algorithms are often developed using proprietary, incompatible software. This makes the insertion of new algorithms difficult due to the lack of standardized processing chains. In order to overcome these limitations, EOIR developed the Government off-the-shelf (GOTS) Video Processing and Exploitation Framework (VPEF) to be able to provide standardized interfaces (e.g., input/output video formats, sensor metadata, and detected objects) for exploitation software and to rapidly integrate and test computer vision algorithms. EOIR developed a vehicle-based computing framework within the MOVERS and integrated it with VPEF. VPEF was further enhanced for automated processing, detection, and publishing of detections in near real-time, thus improving the efficiency and effectiveness of RCP sensor systems.
Quantum computation and analysis of Wigner and Husimi functions: toward a quantum image treatment.

PubMed

Terraneo, M; Georgeot, B; Shepelyansky, D L

2005-06-01

We study the efficiency of quantum algorithms which aim at obtaining phase-space distribution functions of quantum systems. Wigner and Husimi functions are considered. Different quantum algorithms are envisioned to build these functions, and compared with the classical computation. Different procedures to extract more efficiently information from the final wave function of these algorithms are studied, including coarse-grained measurements, amplitude amplification, and measure of wavelet-transformed wave function. The algorithms are analyzed and numerically tested on a complex quantum system showing different behavior depending on parameters: namely, the kicked rotator. The results for the Wigner function show in particular that the use of the quantum wavelet transform gives a polynomial gain over classical computation. For the Husimi distribution, the gain is much larger than for the Wigner function and is larger with the help of amplitude amplification and wavelet transforms. We discuss the generalization of these results to the simulation of other quantum systems. We also apply the same set of techniques to the analysis of real images. The results show that the use of the quantum wavelet transform allows one to lower dramatically the number of measurements needed, but at the cost of a large loss of information.
A 3D inversion for all-space magnetotelluric data with static shift correction

NASA Astrophysics Data System (ADS)

Zhang, Kun

2017-04-01

Base on the previous studies on the static shift correction and 3D inversion algorithms, we improve the NLCG 3D inversion method and propose a new static shift correction method which work in the inversion. The static shift correction method is based on the 3D theory and real data. The static shift can be detected by the quantitative analysis of apparent parameters (apparent resistivity and impedance phase) of MT in high frequency range, and completed correction with inversion. The method is an automatic processing technology of computer with 0 cost, and avoids the additional field work and indoor processing with good results. The 3D inversion algorithm is improved (Zhang et al., 2013) base on the NLCG method of Newman & Alumbaugh (2000) and Rodi & Mackie (2001). For the algorithm, we added the parallel structure, improved the computational efficiency, reduced the memory of computer and added the topographic and marine factors. So the 3D inversion could work in general PC with high efficiency and accuracy. And all the MT data of surface stations, seabed stations and underground stations can be used in the inversion algorithm.
Parallelization and implementation of approximate root isolation for nonlinear system by Monte Carlo

NASA Astrophysics Data System (ADS)

Khosravi, Ebrahim

1998-12-01

This dissertation solves a fundamental problem of isolating the real roots of nonlinear systems of equations by Monte-Carlo that were published by Bush Jones. This algorithm requires only function values and can be applied readily to complicated systems of transcendental functions. The implementation of this sequential algorithm provides scientists with the means to utilize function analysis in mathematics or other fields of science. The algorithm, however, is so computationally intensive that the system is limited to a very small set of variables, and this will make it unfeasible for large systems of equations. Also a computational technique was needed for investigating a metrology of preventing the algorithm structure from converging to the same root along different paths of computation. The research provides techniques for improving the efficiency and correctness of the algorithm. The sequential algorithm for this technique was corrected and a parallel algorithm is presented. This parallel method has been formally analyzed and is compared with other known methods of root isolation. The effectiveness, efficiency, enhanced overall performance of the parallel processing of the program in comparison to sequential processing is discussed. The message passing model was used for this parallel processing, and it is presented and implemented on Intel/860 MIMD architecture. The parallel processing proposed in this research has been implemented in an ongoing high energy physics experiment: this algorithm has been used to track neutrinoes in a super K detector. This experiment is located in Japan, and data can be processed on-line or off-line locally or remotely.
Improved transition path sampling methods for simulation of rare events

NASA Astrophysics Data System (ADS)

Chopra, Manan; Malshe, Rohit; Reddy, Allam S.; de Pablo, J. J.

2008-04-01

The free energy surfaces of a wide variety of systems encountered in physics, chemistry, and biology are characterized by the existence of deep minima separated by numerous barriers. One of the central aims of recent research in computational chemistry and physics has been to determine how transitions occur between deep local minima on rugged free energy landscapes, and transition path sampling (TPS) Monte-Carlo methods have emerged as an effective means for numerical investigation of such transitions. Many of the shortcomings of TPS-like approaches generally stem from their high computational demands. Two new algorithms are presented in this work that improve the efficiency of TPS simulations. The first algorithm uses biased shooting moves to render the sampling of reactive trajectories more efficient. The second algorithm is shown to substantially improve the accuracy of the transition state ensemble by introducing a subset of local transition path simulations in the transition state. The system considered in this work consists of a two-dimensional rough energy surface that is representative of numerous systems encountered in applications. When taken together, these algorithms provide gains in efficiency of over two orders of magnitude when compared to traditional TPS simulations.
A discrete Fourier transform for virtual memory machines

NASA Technical Reports Server (NTRS)

Galant, David C.

1992-01-01

An algebraic theory of the Discrete Fourier Transform is developed in great detail. Examination of the details of the theory leads to a computationally efficient fast Fourier transform for the use on computers with virtual memory. Such an algorithm is of great use on modern desktop machines. A FORTRAN coded version of the algorithm is given for the case when the sequence of numbers to be transformed is a power of two.
Resolution Study of a Hyperspectral Sensor using Computed Tomography in the Presence of Noise

DTIC Science & Technology

2012-06-14

diffraction efficiency is dependent on wavelength. Compared to techniques developed by later work, simple algebraic reconstruction techniques were used...spectral di- mension, using computed tomography (CT) techniques with only a finite number of diverse images. CTHIS require a reconstruction algorithm in...many frames are needed to reconstruct the spectral cube of a simple object using a theoretical lower bound. In this research a new algorithm is derived

A different Deutsch-Jozsa

NASA Astrophysics Data System (ADS)

Bera, Debajyoti

2015-06-01

One of the early achievements of quantum computing was demonstrated by Deutsch and Jozsa (Proc R Soc Lond A Math Phys Sci 439(1907):553, 1992) regarding classification of a particular type of Boolean functions. Their solution demonstrated an exponential speedup compared to classical approaches to the same problem; however, their solution was the only known quantum algorithm for that specific problem so far. This paper demonstrates another quantum algorithm for the same problem, with the same exponential advantage compared to classical algorithms. The novelty of this algorithm is the use of quantum amplitude amplification, a technique that is the key component of another celebrated quantum algorithm developed by Grover (Proceedings of the twenty-eighth annual ACM symposium on theory of computing, ACM Press, New York, 1996). A lower bound for randomized (classical) algorithms is also presented which establishes a sound gap between the effectiveness of our quantum algorithm and that of any randomized algorithm with similar efficiency.
Multiple feature fusion via covariance matrix for visual tracking

NASA Astrophysics Data System (ADS)

Jin, Zefenfen; Hou, Zhiqiang; Yu, Wangsheng; Wang, Xin; Sun, Hui

2018-04-01

Aiming at the problem of complicated dynamic scenes in visual target tracking, a multi-feature fusion tracking algorithm based on covariance matrix is proposed to improve the robustness of the tracking algorithm. In the frame-work of quantum genetic algorithm, this paper uses the region covariance descriptor to fuse the color, edge and texture features. It also uses a fast covariance intersection algorithm to update the model. The low dimension of region covariance descriptor, the fast convergence speed and strong global optimization ability of quantum genetic algorithm, and the fast computation of fast covariance intersection algorithm are used to improve the computational efficiency of fusion, matching, and updating process, so that the algorithm achieves a fast and effective multi-feature fusion tracking. The experiments prove that the proposed algorithm can not only achieve fast and robust tracking but also effectively handle interference of occlusion, rotation, deformation, motion blur and so on.
A proximity algorithm accelerated by Gauss-Seidel iterations for L1/TV denoising models

NASA Astrophysics Data System (ADS)

Li, Qia; Micchelli, Charles A.; Shen, Lixin; Xu, Yuesheng

2012-09-01

Our goal in this paper is to improve the computational performance of the proximity algorithms for the L1/TV denoising model. This leads us to a new characterization of all solutions to the L1/TV model via fixed-point equations expressed in terms of the proximity operators. Based upon this observation we develop an algorithm for solving the model and establish its convergence. Furthermore, we demonstrate that the proposed algorithm can be accelerated through the use of the componentwise Gauss-Seidel iteration so that the CPU time consumed is significantly reduced. Numerical experiments using the proposed algorithm for impulsive noise removal are included, with a comparison to three recently developed algorithms. The numerical results show that while the proposed algorithm enjoys a high quality of the restored images, as the other three known algorithms do, it performs significantly better in terms of computational efficiency measured in the CPU time consumed.
Scheduled Relaxation Jacobi method: Improvements and applications

NASA Astrophysics Data System (ADS)

Adsuara, J. E.; Cordero-Carrión, I.; Cerdá-Durán, P.; Aloy, M. A.

2016-09-01

Elliptic partial differential equations (ePDEs) appear in a wide variety of areas of mathematics, physics and engineering. Typically, ePDEs must be solved numerically, which sets an ever growing demand for efficient and highly parallel algorithms to tackle their computational solution. The Scheduled Relaxation Jacobi (SRJ) is a promising class of methods, atypical for combining simplicity and efficiency, that has been recently introduced for solving linear Poisson-like ePDEs. The SRJ methodology relies on computing the appropriate parameters of a multilevel approach with the goal of minimizing the number of iterations needed to cut down the residuals below specified tolerances. The efficiency in the reduction of the residual increases with the number of levels employed in the algorithm. Applying the original methodology to compute the algorithm parameters with more than 5 levels notably hinders obtaining optimal SRJ schemes, as the mixed (non-linear) algebraic-differential system of equations from which they result becomes notably stiff. Here we present a new methodology for obtaining the parameters of SRJ schemes that overcomes the limitations of the original algorithm and provide parameters for SRJ schemes with up to 15 levels and resolutions of up to 215 points per dimension, allowing for acceleration factors larger than several hundreds with respect to the Jacobi method for typical resolutions and, in some high resolution cases, close to 1000. Most of the success in finding SRJ optimal schemes with more than 10 levels is based on an analytic reduction of the complexity of the previously mentioned system of equations. Furthermore, we extend the original algorithm to apply it to certain systems of non-linear ePDEs.
A Metascalable Computing Framework for Large Spatiotemporal-Scale Atomistic Simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nomura, K; Seymour, R; Wang, W

2009-02-17

A metascalable (or 'design once, scale on new architectures') parallel computing framework has been developed for large spatiotemporal-scale atomistic simulations of materials based on spatiotemporal data locality principles, which is expected to scale on emerging multipetaflops architectures. The framework consists of: (1) an embedded divide-and-conquer (EDC) algorithmic framework based on spatial locality to design linear-scaling algorithms for high complexity problems; (2) a space-time-ensemble parallel (STEP) approach based on temporal locality to predict long-time dynamics, while introducing multiple parallelization axes; and (3) a tunable hierarchical cellular decomposition (HCD) parallelization framework to map these O(N) algorithms onto a multicore cluster based onmore » hybrid implementation combining message passing and critical section-free multithreading. The EDC-STEP-HCD framework exposes maximal concurrency and data locality, thereby achieving: (1) inter-node parallel efficiency well over 0.95 for 218 billion-atom molecular-dynamics and 1.68 trillion electronic-degrees-of-freedom quantum-mechanical simulations on 212,992 IBM BlueGene/L processors (superscalability); (2) high intra-node, multithreading parallel efficiency (nanoscalability); and (3) nearly perfect time/ensemble parallel efficiency (eon-scalability). The spatiotemporal scale covered by MD simulation on a sustained petaflops computer per day (i.e. petaflops {center_dot} day of computing) is estimated as NT = 2.14 (e.g. N = 2.14 million atoms for T = 1 microseconds).« less
Wavelet compression of multichannel ECG data by enhanced set partitioning in hierarchical trees algorithm.

PubMed

Sharifahmadian, Ershad

2006-01-01

The set partitioning in hierarchical trees (SPIHT) algorithm is very effective and computationally simple technique for image and signal compression. Here the author modified the algorithm which provides even better performance than the SPIHT algorithm. The enhanced set partitioning in hierarchical trees (ESPIHT) algorithm has performance faster than the SPIHT algorithm. In addition, the proposed algorithm reduces the number of bits in a bit stream which is stored or transmitted. I applied it to compression of multichannel ECG data. Also, I presented a specific procedure based on the modified algorithm for more efficient compression of multichannel ECG data. This method employed on selected records from the MIT-BIH arrhythmia database. According to experiments, the proposed method attained the significant results regarding compression of multichannel ECG data. Furthermore, in order to compress one signal which is stored for a long time, the proposed multichannel compression method can be utilized efficiently.
A real-time implementation of an advanced sensor failure detection, isolation, and accommodation algorithm

NASA Technical Reports Server (NTRS)

Delaat, J. C.; Merrill, W. C.

1983-01-01

A sensor failure detection, isolation, and accommodation algorithm was developed which incorporates analytic sensor redundancy through software. This algorithm was implemented in a high level language on a microprocessor based controls computer. Parallel processing and state-of-the-art 16-bit microprocessors are used along with efficient programming practices to achieve real-time operation.
Simulation of synthetic discriminant function optical implementation

NASA Astrophysics Data System (ADS)

Riggins, J.; Butler, S.

1984-12-01

The optical implementation of geometrical shape and synthetic discriminant function matched filters is computer modeled. The filter implementation utilizes the Allebach-Keegan computer-generated hologram algorithm. Signal-to-noise and efficiency measurements were made on the resultant correlation planes.
ClimateSpark: An in-memory distributed computing framework for big climate data analytics

NASA Astrophysics Data System (ADS)

Hu, Fei; Yang, Chaowei; Schnase, John L.; Duffy, Daniel Q.; Xu, Mengchao; Bowen, Michael K.; Lee, Tsengdar; Song, Weiwei

2018-06-01

The unprecedented growth of climate data creates new opportunities for climate studies, and yet big climate data pose a grand challenge to climatologists to efficiently manage and analyze big data. The complexity of climate data content and analytical algorithms increases the difficulty of implementing algorithms on high performance computing systems. This paper proposes an in-memory, distributed computing framework, ClimateSpark, to facilitate complex big data analytics and time-consuming computational tasks. Chunking data structure improves parallel I/O efficiency, while a spatiotemporal index is built for the chunks to avoid unnecessary data reading and preprocessing. An integrated, multi-dimensional, array-based data model (ClimateRDD) and ETL operations are developed to address big climate data variety by integrating the processing components of the climate data lifecycle. ClimateSpark utilizes Spark SQL and Apache Zeppelin to develop a web portal to facilitate the interaction among climatologists, climate data, analytic operations and computing resources (e.g., using SQL query and Scala/Python notebook). Experimental results show that ClimateSpark conducts different spatiotemporal data queries/analytics with high efficiency and data locality. ClimateSpark is easily adaptable to other big multiple-dimensional, array-based datasets in various geoscience domains.
Matching pursuit parallel decomposition of seismic data

NASA Astrophysics Data System (ADS)

Li, Chuanhui; Zhang, Fanchang

2017-07-01

In order to improve the computation speed of matching pursuit decomposition of seismic data, a matching pursuit parallel algorithm is designed in this paper. We pick a fixed number of envelope peaks from the current signal in every iteration according to the number of compute nodes and assign them to the compute nodes on average to search the optimal Morlet wavelets in parallel. With the help of parallel computer systems and Message Passing Interface, the parallel algorithm gives full play to the advantages of parallel computing to significantly improve the computation speed of the matching pursuit decomposition and also has good expandability. Besides, searching only one optimal Morlet wavelet by every compute node in every iteration is the most efficient implementation.
Column generation algorithms for virtual network embedding in flexi-grid optical networks.

PubMed

Lin, Rongping; Luo, Shan; Zhou, Jingwei; Wang, Sheng; Chen, Bin; Zhang, Xiaoning; Cai, Anliang; Zhong, Wen-De; Zukerman, Moshe

2018-04-16

Network virtualization provides means for efficient management of network resources by embedding multiple virtual networks (VNs) to share efficiently the same substrate network. Such virtual network embedding (VNE) gives rise to a challenging problem of how to optimize resource allocation to VNs and to guarantee their performance requirements. In this paper, we provide VNE algorithms for efficient management of flexi-grid optical networks. We provide an exact algorithm aiming to minimize the total embedding cost in terms of spectrum cost and computation cost for a single VN request. Then, to achieve scalability, we also develop a heuristic algorithm for the same problem. We apply these two algorithms for a dynamic traffic scenario where many VN requests arrive one-by-one. We first demonstrate by simulations for the case of a six-node network that the heuristic algorithm obtains very close blocking probabilities to exact algorithm (about 0.2% higher). Then, for a network of realistic size (namely, USnet) we demonstrate that the blocking probability of our new heuristic algorithm is about one magnitude lower than a simpler heuristic algorithm, which was a component of an earlier published algorithm.
Power and Efficiency Optimized in Traveling-Wave Tubes Over a Broad Frequency Bandwidth

NASA Technical Reports Server (NTRS)

Wilson, Jeffrey D.

2001-01-01

A traveling-wave tube (TWT) is an electron beam device that is used to amplify electromagnetic communication waves at radio and microwave frequencies. TWT's are critical components in deep space probes, communication satellites, and high-power radar systems. Power conversion efficiency is of paramount importance for TWT's employed in deep space probes and communication satellites. A previous effort was very successful in increasing efficiency and power at a single frequency (ref. 1). Such an algorithm is sufficient for narrow bandwidth designs, but for optimal designs in applications that require high radiofrequency power over a wide bandwidth, such as high-density communications or high-resolution radar, the variation of the circuit response with respect to frequency must be considered. This work at the NASA Glenn Research Center is the first to develop techniques for optimizing TWT efficiency and output power over a broad frequency bandwidth (ref. 2). The techniques are based on simulated annealing, which has the advantage over conventional optimization techniques in that it enables the best possible solution to be obtained (ref. 3). Two new broadband simulated annealing algorithms were developed that optimize (1) minimum saturated power efficiency over a frequency bandwidth and (2) simultaneous bandwidth and minimum power efficiency over the frequency band with constant input power. The algorithms were incorporated into the NASA coupled-cavity TWT computer model (ref. 4) and used to design optimal phase velocity tapers using the 59- to 64-GHz Hughes 961HA coupled-cavity TWT as a baseline model. In comparison to the baseline design, the computational results of the first broad-band design algorithm show an improvement of 73.9 percent in minimum saturated efficiency (see the top graph). The second broadband design algorithm (see the bottom graph) improves minimum radiofrequency efficiency with constant input power drive by a factor of 2.7 at the high band edge (64 GHz) and increases simultaneous bandwidth by 500 MHz.
Evaluation of Emerging Energy-Efficient Heterogeneous Computing Platforms for Biomolecular and Cellular Simulation Workloads

PubMed Central

Stone, John E.; Hallock, Michael J.; Phillips, James C.; Peterson, Joseph R.; Luthey-Schulten, Zaida; Schulten, Klaus

2016-01-01

Many of the continuing scientific advances achieved through computational biology are predicated on the availability of ongoing increases in computational power required for detailed simulation and analysis of cellular processes on biologically-relevant timescales. A critical challenge facing the development of future exascale supercomputer systems is the development of new computing hardware and associated scientific applications that dramatically improve upon the energy efficiency of existing solutions, while providing increased simulation, analysis, and visualization performance. Mobile computing platforms have recently become powerful enough to support interactive molecular visualization tasks that were previously only possible on laptops and workstations, creating future opportunities for their convenient use for meetings, remote collaboration, and as head mounted displays for immersive stereoscopic viewing. We describe early experiences adapting several biomolecular simulation and analysis applications for emerging heterogeneous computing platforms that combine power-efficient system-on-chip multi-core CPUs with high-performance massively parallel GPUs. We present low-cost power monitoring instrumentation that provides sufficient temporal resolution to evaluate the power consumption of individual CPU algorithms and GPU kernels. We compare the performance and energy efficiency of scientific applications running on emerging platforms with results obtained on traditional platforms, identify hardware and algorithmic performance bottlenecks that affect the usability of these platforms, and describe avenues for improving both the hardware and applications in pursuit of the needs of molecular modeling tasks on mobile devices and future exascale computers. PMID:27516922
Decomposed multidimensional control grid interpolation for common consumer electronic image processing applications

NASA Astrophysics Data System (ADS)

Zwart, Christine M.; Venkatesan, Ragav; Frakes, David H.

2012-10-01

Interpolation is an essential and broadly employed function of signal processing. Accordingly, considerable development has focused on advancing interpolation algorithms toward optimal accuracy. Such development has motivated a clear shift in the state-of-the art from classical interpolation to more intelligent and resourceful approaches, registration-based interpolation for example. As a natural result, many of the most accurate current algorithms are highly complex, specific, and computationally demanding. However, the diverse hardware destinations for interpolation algorithms present unique constraints that often preclude use of the most accurate available options. For example, while computationally demanding interpolators may be suitable for highly equipped image processing platforms (e.g., computer workstations and clusters), only more efficient interpolators may be practical for less well equipped platforms (e.g., smartphones and tablet computers). The latter examples of consumer electronics present a design tradeoff in this regard: high accuracy interpolation benefits the consumer experience but computing capabilities are limited. It follows that interpolators with favorable combinations of accuracy and efficiency are of great practical value to the consumer electronics industry. We address multidimensional interpolation-based image processing problems that are common to consumer electronic devices through a decomposition approach. The multidimensional problems are first broken down into multiple, independent, one-dimensional (1-D) interpolation steps that are then executed with a newly modified registration-based one-dimensional control grid interpolator. The proposed approach, decomposed multidimensional control grid interpolation (DMCGI), combines the accuracy of registration-based interpolation with the simplicity, flexibility, and computational efficiency of a 1-D interpolation framework. Results demonstrate that DMCGI provides improved interpolation accuracy (and other benefits) in image resizing, color sample demosaicing, and video deinterlacing applications, at a computational cost that is manageable or reduced in comparison to popular alternatives.
Sizing of complex structure by the integration of several different optimal design algorithms

NASA Technical Reports Server (NTRS)

Sobieszczanski, J.

1974-01-01

Practical design of large-scale structures can be accomplished with the aid of the digital computer by bringing together in one computer program algorithms of nonlinear mathematical programing and optimality criteria with weight-strength and other so-called engineering methods. Applications of this approach to aviation structures are discussed with a detailed description of how the total problem of structural sizing can be broken down into subproblems for best utilization of each algorithm and for efficient organization of the program into iterative loops. Typical results are examined for a number of examples.
Computing Interactions Of Free-Space Radiation With Matter

NASA Technical Reports Server (NTRS)

Wilson, J. W.; Cucinotta, F. A.; Shinn, J. L.; Townsend, L. W.; Badavi, F. F.; Tripathi, R. K.; Silberberg, R.; Tsao, C. H.; Badwar, G. D.

1995-01-01

High Charge and Energy Transport (HZETRN) computer program computationally efficient, user-friendly package of software adressing problem of transport of, and shielding against, radiation in free space. Designed as "black box" for design engineers not concerned with physics of underlying atomic and nuclear radiation processes in free-space environment, but rather primarily interested in obtaining fast and accurate dosimetric information for design and construction of modules and devices for use in free space. Computational efficiency achieved by unique algorithm based on deterministic approach to solution of Boltzmann equation rather than computationally intensive statistical Monte Carlo method. Written in FORTRAN.
Cloud Computing and Its Applications in GIS

NASA Astrophysics Data System (ADS)

Kang, Cao

2011-12-01

Cloud computing is a novel computing paradigm that offers highly scalable and highly available distributed computing services. The objectives of this research are to: 1. analyze and understand cloud computing and its potential for GIS; 2. discover the feasibilities of migrating truly spatial GIS algorithms to distributed computing infrastructures; 3. explore a solution to host and serve large volumes of raster GIS data efficiently and speedily. These objectives thus form the basis for three professional articles. The first article is entitled "Cloud Computing and Its Applications in GIS". This paper introduces the concept, structure, and features of cloud computing. Features of cloud computing such as scalability, parallelization, and high availability make it a very capable computing paradigm. Unlike High Performance Computing (HPC), cloud computing uses inexpensive commodity computers. The uniform administration systems in cloud computing make it easier to use than GRID computing. Potential advantages of cloud-based GIS systems such as lower barrier to entry are consequently presented. Three cloud-based GIS system architectures are proposed: public cloud- based GIS systems, private cloud-based GIS systems and hybrid cloud-based GIS systems. Public cloud-based GIS systems provide the lowest entry barriers for users among these three architectures, but their advantages are offset by data security and privacy related issues. Private cloud-based GIS systems provide the best data protection, though they have the highest entry barriers. Hybrid cloud-based GIS systems provide a compromise between these extremes. The second article is entitled "A cloud computing algorithm for the calculation of Euclidian distance for raster GIS". Euclidean distance is a truly spatial GIS algorithm. Classical algorithms such as the pushbroom and growth ring techniques require computational propagation through the entire raster image, which makes it incompatible with the distributed nature of cloud computing. This paper presents a parallel Euclidean distance algorithm that works seamlessly with the distributed nature of cloud computing infrastructures. The mechanism of this algorithm is to subdivide a raster image into sub-images and wrap them with a one pixel deep edge layer of individually computed distance information. Each sub-image is then processed by a separate node, after which the resulting sub-images are reassembled into the final output. It is shown that while any rectangular sub-image shape can be used, those approximating squares are computationally optimal. This study also serves as a demonstration of this subdivide and layer-wrap strategy, which would enable the migration of many truly spatial GIS algorithms to cloud computing infrastructures. However, this research also indicates that certain spatial GIS algorithms such as cost distance cannot be migrated by adopting this mechanism, which presents significant challenges for the development of cloud-based GIS systems. The third article is entitled "A Distributed Storage Schema for Cloud Computing based Raster GIS Systems". This paper proposes a NoSQL Database Management System (NDDBMS) based raster GIS data storage schema. NDDBMS has good scalability and is able to use distributed commodity computers, which make it superior to Relational Database Management Systems (RDBMS) in a cloud computing environment. In order to provide optimized data service performance, the proposed storage schema analyzes the nature of commonly used raster GIS data sets. It discriminates two categories of commonly used data sets, and then designs corresponding data storage models for both categories. As a result, the proposed storage schema is capable of hosting and serving enormous volumes of raster GIS data speedily and efficiently on cloud computing infrastructures. In addition, the scheme also takes advantage of the data compression characteristics of Quadtrees, thus promoting efficient data storage. Through this assessment of cloud computing technology, the exploration of the challenges and solutions to the migration of GIS algorithms to cloud computing infrastructures, and the examination of strategies for serving large amounts of GIS data in a cloud computing infrastructure, this dissertation lends support to the feasibility of building a cloud-based GIS system. However, there are still challenges that need to be addressed before a full-scale functional cloud-based GIS system can be successfully implemented. (Abstract shortened by UMI.)
Portfolios of quantum algorithms.

PubMed

Maurer, S M; Hogg, T; Huberman, B A

2001-12-17

Quantum computation holds promise for the solution of many intractable problems. However, since many quantum algorithms are stochastic in nature they can find the solution of hard problems only probabilistically. Thus the efficiency of the algorithms has to be characterized by both the expected time to completion and the associated variance. In order to minimize both the running time and its uncertainty, we show that portfolios of quantum algorithms analogous to those of finance can outperform single algorithms when applied to the NP-complete problems such as 3-satisfiability.
A memetic optimization algorithm for multi-constrained multicast routing in ad hoc networks

PubMed Central

Hammad, Karim; El Bakly, Ahmed M.

2018-01-01

A mobile ad hoc network is a conventional self-configuring network where the routing optimization problem—subject to various Quality-of-Service (QoS) constraints—represents a major challenge. Unlike previously proposed solutions, in this paper, we propose a memetic algorithm (MA) employing an adaptive mutation parameter, to solve the multicast routing problem with higher search ability and computational efficiency. The proposed algorithm utilizes an updated scheme, based on statistical analysis, to estimate the best values for all MA parameters and enhance MA performance. The numerical results show that the proposed MA improved the delay and jitter of the network, while reducing computational complexity as compared to existing algorithms. PMID:29509760
A memetic optimization algorithm for multi-constrained multicast routing in ad hoc networks.

PubMed

Ramadan, Rahab M; Gasser, Safa M; El-Mahallawy, Mohamed S; Hammad, Karim; El Bakly, Ahmed M

2018-01-01

A mobile ad hoc network is a conventional self-configuring network where the routing optimization problem-subject to various Quality-of-Service (QoS) constraints-represents a major challenge. Unlike previously proposed solutions, in this paper, we propose a memetic algorithm (MA) employing an adaptive mutation parameter, to solve the multicast routing problem with higher search ability and computational efficiency. The proposed algorithm utilizes an updated scheme, based on statistical analysis, to estimate the best values for all MA parameters and enhance MA performance. The numerical results show that the proposed MA improved the delay and jitter of the network, while reducing computational complexity as compared to existing algorithms.

Real-time robot deliberation by compilation and monitoring of anytime algorithms

NASA Technical Reports Server (NTRS)

Zilberstein, Shlomo

1994-01-01

Anytime algorithms are algorithms whose quality of results improves gradually as computation time increases. Certainty, accuracy, and specificity are metrics useful in anytime algorighm construction. It is widely accepted that a successful robotic system must trade off between decision quality and the computational resources used to produce it. Anytime algorithms were designed to offer such a trade off. A model of compilation and monitoring mechanisms needed to build robots that can efficiently control their deliberation time is presented. This approach simplifies the design and implementation of complex intelligent robots, mechanizes the composition and monitoring processes, and provides independent real time robotic systems that automatically adjust resource allocation to yield optimum performance.
Efficient bulk-loading of gridfiles

NASA Technical Reports Server (NTRS)

Leutenegger, Scott T.; Nicol, David M.

1994-01-01

This paper considers the problem of bulk-loading large data sets for the gridfile multiattribute indexing technique. We propose a rectilinear partitioning algorithm that heuristically seeks to minimize the size of the gridfile needed to ensure no bucket overflows. Empirical studies on both synthetic data sets and on data sets drawn from computational fluid dynamics applications demonstrate that our algorithm is very efficient, and is able to handle large data sets. In addition, we present an algorithm for bulk-loading data sets too large to fit in main memory. Utilizing a sort of the entire data set it creates a gridfile without incurring any overflows.
New Parallel Algorithms for Structural Analysis and Design of Aerospace Structures

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.

1998-01-01

Subspace and Lanczos iterations have been developed, well documented, and widely accepted as efficient methods for obtaining p-lowest eigen-pair solutions of large-scale, practical engineering problems. The focus of this paper is to incorporate recent developments in vectorized sparse technologies in conjunction with Subspace and Lanczos iterative algorithms for computational enhancements. Numerical performance, in terms of accuracy and efficiency of the proposed sparse strategies for Subspace and Lanczos algorithm, is demonstrated by solving for the lowest frequencies and mode shapes of structural problems on the IBM-R6000/590 and SunSparc 20 workstations.
Efficient and optimized identification of generalized Maxwell viscoelastic relaxation spectra

PubMed Central

Babaei, Behzad; Davarian, Ali; Pryse, Kenneth M.; Elson, Elliot L.; Genin, Guy M.

2017-01-01

Viscoelastic relaxation spectra are essential for predicting and interpreting the mechanical responses of materials and structures. For biological tissues, these spectra must usually be estimated from viscoelastic relaxation tests. Interpreting viscoelastic relaxation tests is challenging because the inverse problem is expensive computationally. We present here an efficient algorithm that enables rapid identification of viscoelastic relaxation spectra. The algorithm was tested against trial data to characterize its robustness and identify its limitations and strengths. The algorithm was then applied to identify the viscoelastic response of reconstituted collagen, revealing an extensive distribution of viscoelastic time constants. PMID:26523785
Efficient parallel and out of core algorithms for constructing large bi-directed de Bruijn graphs.

PubMed

Kundeti, Vamsi K; Rajasekaran, Sanguthevar; Dinh, Hieu; Vaughn, Matthew; Thapar, Vishal

2010-11-15

Assembling genomic sequences from a set of overlapping reads is one of the most fundamental problems in computational biology. Algorithms addressing the assembly problem fall into two broad categories - based on the data structures which they employ. The first class uses an overlap/string graph and the second type uses a de Bruijn graph. However with the recent advances in short read sequencing technology, de Bruijn graph based algorithms seem to play a vital role in practice. Efficient algorithms for building these massive de Bruijn graphs are very essential in large sequencing projects based on short reads. In an earlier work, an O(n/p) time parallel algorithm has been given for this problem. Here n is the size of the input and p is the number of processors. This algorithm enumerates all possible bi-directed edges which can overlap with a node and ends up generating Θ(nΣ) messages (Σ being the size of the alphabet). In this paper we present a Θ(n/p) time parallel algorithm with a communication complexity that is equal to that of parallel sorting and is not sensitive to Σ. The generality of our algorithm makes it very easy to extend it even to the out-of-core model and in this case it has an optimal I/O complexity of Θ(nlog(n/B)Blog(M/B)) (M being the main memory size and B being the size of the disk block). We demonstrate the scalability of our parallel algorithm on a SGI/Altix computer. A comparison of our algorithm with the previous approaches reveals that our algorithm is faster--both asymptotically and practically. We demonstrate the scalability of our sequential out-of-core algorithm by comparing it with the algorithm used by VELVET to build the bi-directed de Bruijn graph. Our experiments reveal that our algorithm can build the graph with a constant amount of memory, which clearly outperforms VELVET. We also provide efficient algorithms for the bi-directed chain compaction problem. The bi-directed de Bruijn graph is a fundamental data structure for any sequence assembly program based on Eulerian approach. Our algorithms for constructing Bi-directed de Bruijn graphs are efficient in parallel and out of core settings. These algorithms can be used in building large scale bi-directed de Bruijn graphs. Furthermore, our algorithms do not employ any all-to-all communications in a parallel setting and perform better than the prior algorithms. Finally our out-of-core algorithm is extremely memory efficient and can replace the existing graph construction algorithm in VELVET.
HRSSA - Efficient hybrid stochastic simulation for spatially homogeneous biochemical reaction networks

NASA Astrophysics Data System (ADS)

Marchetti, Luca; Priami, Corrado; Thanh, Vo Hong

2016-07-01

This paper introduces HRSSA (Hybrid Rejection-based Stochastic Simulation Algorithm), a new efficient hybrid stochastic simulation algorithm for spatially homogeneous biochemical reaction networks. HRSSA is built on top of RSSA, an exact stochastic simulation algorithm which relies on propensity bounds to select next reaction firings and to reduce the average number of reaction propensity updates needed during the simulation. HRSSA exploits the computational advantage of propensity bounds to manage time-varying transition propensities and to apply dynamic partitioning of reactions, which constitute the two most significant bottlenecks of hybrid simulation. A comprehensive set of simulation benchmarks is provided for evaluating performance and accuracy of HRSSA against other state of the art algorithms.
Algorithms for solving large sparse systems of simultaneous linear equations on vector processors

NASA Technical Reports Server (NTRS)

David, R. E.

1984-01-01

Very efficient algorithms for solving large sparse systems of simultaneous linear equations have been developed for serial processing computers. These involve a reordering of matrix rows and columns in order to obtain a near triangular pattern of nonzero elements. Then an LU factorization is developed to represent the matrix inverse in terms of a sequence of elementary Gaussian eliminations, or pivots. In this paper it is shown how these algorithms are adapted for efficient implementation on vector processors. Results obtained on the CYBER 200 Model 205 are presented for a series of large test problems which show the comparative advantages of the triangularization and vector processing algorithms.
Real-time optical flow estimation on a GPU for a skied-steered mobile robot

NASA Astrophysics Data System (ADS)

Kniaz, V. V.

2016-04-01

Accurate egomotion estimation is required for mobile robot navigation. Often the egomotion is estimated using optical flow algorithms. For an accurate estimation of optical flow most of modern algorithms require high memory resources and processor speed. However simple single-board computers that control the motion of the robot usually do not provide such resources. On the other hand, most of modern single-board computers are equipped with an embedded GPU that could be used in parallel with a CPU to improve the performance of the optical flow estimation algorithm. This paper presents a new Z-flow algorithm for efficient computation of an optical flow using an embedded GPU. The algorithm is based on the phase correlation optical flow estimation and provide a real-time performance on a low cost embedded GPU. The layered optical flow model is used. Layer segmentation is performed using graph-cut algorithm with a time derivative based energy function. Such approach makes the algorithm both fast and robust in low light and low texture conditions. The algorithm implementation for a Raspberry Pi Model B computer is discussed. For evaluation of the algorithm the computer was mounted on a Hercules mobile skied-steered robot equipped with a monocular camera. The evaluation was performed using a hardware-in-the-loop simulation and experiments with Hercules mobile robot. Also the algorithm was evaluated using KITTY Optical Flow 2015 dataset. The resulting endpoint error of the optical flow calculated with the developed algorithm was low enough for navigation of the robot along the desired trajectory.
Efficient volumetric estimation from plenoptic data

NASA Astrophysics Data System (ADS)

Anglin, Paul; Reeves, Stanley J.; Thurow, Brian S.

2013-03-01

The commercial release of the Lytro camera, and greater availability of plenoptic imaging systems in general, have given the image processing community cost-effective tools for light-field imaging. While this data is most commonly used to generate planar images at arbitrary focal depths, reconstruction of volumetric fields is also possible. Similarly, deconvolution is a technique that is conventionally used in planar image reconstruction, or deblurring, algorithms. However, when leveraged with the ability of a light-field camera to quickly reproduce multiple focal planes within an imaged volume, deconvolution offers a computationally efficient method of volumetric reconstruction. Related research has shown than light-field imaging systems in conjunction with tomographic reconstruction techniques are also capable of estimating the imaged volume and have been successfully applied to particle image velocimetry (PIV). However, while tomographic volumetric estimation through algorithms such as multiplicative algebraic reconstruction techniques (MART) have proven to be highly accurate, they are computationally intensive. In this paper, the reconstruction problem is shown to be solvable by deconvolution. Deconvolution offers significant improvement in computational efficiency through the use of fast Fourier transforms (FFTs) when compared to other tomographic methods. This work describes a deconvolution algorithm designed to reconstruct a 3-D particle field from simulated plenoptic data. A 3-D extension of existing 2-D FFT-based refocusing techniques is presented to further improve efficiency when computing object focal stacks and system point spread functions (PSF). Reconstruction artifacts are identified; their underlying source and methods of mitigation are explored where possible, and reconstructions of simulated particle fields are provided.
Extension of a streamwise upwind algorithm to a moving grid system

NASA Technical Reports Server (NTRS)

Obayashi, Shigeru; Goorjian, Peter M.; Guruswamy, Guru P.

1990-01-01

A new streamwise upwind algorithm was derived to compute unsteady flow fields with the use of a moving-grid system. The temporally nonconservative LU-ADI (lower-upper-factored, alternating-direction-implicit) method was applied for time marching computations. A comparison of the temporally nonconservative method with a time-conservative implicit upwind method indicates that the solutions are insensitive to the conservative properties of the implicit solvers when practical time steps are used. Using this new method, computations were made for an oscillating wing at a transonic Mach number. The computed results confirm that the present upwind scheme captures the shock motion better than the central-difference scheme based on the beam-warming algorithm. The new upwind option of the code allows larger time-steps and thus is more efficient, even though it requires slightly more computational time per time step than the central-difference option.
Continuous-variable quantum Gaussian process regression and quantum singular value decomposition of nonsparse low-rank matrices

NASA Astrophysics Data System (ADS)

Das, Siddhartha; Siopsis, George; Weedbrook, Christian

2018-02-01

With the significant advancement in quantum computation during the past couple of decades, the exploration of machine-learning subroutines using quantum strategies has become increasingly popular. Gaussian process regression is a widely used technique in supervised classical machine learning. Here we introduce an algorithm for Gaussian process regression using continuous-variable quantum systems that can be realized with technology based on photonic quantum computers under certain assumptions regarding distribution of data and availability of efficient quantum access. Our algorithm shows that by using a continuous-variable quantum computer a dramatic speedup in computing Gaussian process regression can be achieved, i.e., the possibility of exponentially reducing the time to compute. Furthermore, our results also include a continuous-variable quantum-assisted singular value decomposition method of nonsparse low rank matrices and forms an important subroutine in our Gaussian process regression algorithm.
Developing Subdomain Allocation Algorithms Based on Spatial and Communicational Constraints to Accelerate Dust Storm Simulation

PubMed Central

Gui, Zhipeng; Yu, Manzhu; Yang, Chaowei; Jiang, Yunfeng; Chen, Songqing; Xia, Jizhe; Huang, Qunying; Liu, Kai; Li, Zhenlong; Hassan, Mohammed Anowarul; Jin, Baoxuan

2016-01-01

Dust storm has serious disastrous impacts on environment, human health, and assets. The developments and applications of dust storm models have contributed significantly to better understand and predict the distribution, intensity and structure of dust storms. However, dust storm simulation is a data and computing intensive process. To improve the computing performance, high performance computing has been widely adopted by dividing the entire study area into multiple subdomains and allocating each subdomain on different computing nodes in a parallel fashion. Inappropriate allocation may introduce imbalanced task loads and unnecessary communications among computing nodes. Therefore, allocation is a key factor that may impact the efficiency of parallel process. An allocation algorithm is expected to consider the computing cost and communication cost for each computing node to minimize total execution time and reduce overall communication cost for the entire simulation. This research introduces three algorithms to optimize the allocation by considering the spatial and communicational constraints: 1) an Integer Linear Programming (ILP) based algorithm from combinational optimization perspective; 2) a K-Means and Kernighan-Lin combined heuristic algorithm (K&K) integrating geometric and coordinate-free methods by merging local and global partitioning; 3) an automatic seeded region growing based geometric and local partitioning algorithm (ASRG). The performance and effectiveness of the three algorithms are compared based on different factors. Further, we adopt the K&K algorithm as the demonstrated algorithm for the experiment of dust model simulation with the non-hydrostatic mesoscale model (NMM-dust) and compared the performance with the MPI default sequential allocation. The results demonstrate that K&K method significantly improves the simulation performance with better subdomain allocation. This method can also be adopted for other relevant atmospheric and numerical modeling. PMID:27044039
Computation of multi-dimensional viscous supersonic jet flow

NASA Technical Reports Server (NTRS)

Kim, Y. N.; Buggeln, R. C.; Mcdonald, H.

1986-01-01

A new method has been developed for two- and three-dimensional computations of viscous supersonic flows with embedded subsonic regions adjacent to solid boundaries. The approach employs a reduced form of the Navier-Stokes equations which allows solution as an initial-boundary value problem in space, using an efficient noniterative forward marching algorithm. Numerical instability associated with forward marching algorithms for flows with embedded subsonic regions is avoided by approximation of the reduced form of the Navier-Stokes equations in the subsonic regions of the boundary layers. Supersonic and subsonic portions of the flow field are simultaneously calculated by a consistently split linearized block implicit computational algorithm. The results of computations for a series of test cases relevant to internal supersonic flow is presented and compared with data. Comparison between data and computation are in general excellent thus indicating that the computational technique has great promise as a tool for calculating supersonic flow with embedded subsonic regions. Finally, a User's Manual is presented for the computer code used to perform the calculations.
Energy scaling advantages of resistive memory crossbar based computation and its application to sparse coding

DOE Office of Scientific and Technical Information (OSTI.GOV)

Agarwal, Sapan; Quach, Tu -Thach; Parekh, Ojas

In this study, the exponential increase in data over the last decade presents a significant challenge to analytics efforts that seek to process and interpret such data for various applications. Neural-inspired computing approaches are being developed in order to leverage the computational properties of the analog, low-power data processing observed in biological systems. Analog resistive memory crossbars can perform a parallel read or a vector-matrix multiplication as well as a parallel write or a rank-1 update with high computational efficiency. For an N × N crossbar, these two kernels can be O(N) more energy efficient than a conventional digital memory-basedmore » architecture. If the read operation is noise limited, the energy to read a column can be independent of the crossbar size (O(1)). These two kernels form the basis of many neuromorphic algorithms such as image, text, and speech recognition. For instance, these kernels can be applied to a neural sparse coding algorithm to give an O(N) reduction in energy for the entire algorithm when run with finite precision. Sparse coding is a rich problem with a host of applications including computer vision, object tracking, and more generally unsupervised learning.« less
Energy scaling advantages of resistive memory crossbar based computation and its application to sparse coding

DOE PAGES

Agarwal, Sapan; Quach, Tu -Thach; Parekh, Ojas; ...

2016-01-06

In this study, the exponential increase in data over the last decade presents a significant challenge to analytics efforts that seek to process and interpret such data for various applications. Neural-inspired computing approaches are being developed in order to leverage the computational properties of the analog, low-power data processing observed in biological systems. Analog resistive memory crossbars can perform a parallel read or a vector-matrix multiplication as well as a parallel write or a rank-1 update with high computational efficiency. For an N × N crossbar, these two kernels can be O(N) more energy efficient than a conventional digital memory-basedmore » architecture. If the read operation is noise limited, the energy to read a column can be independent of the crossbar size (O(1)). These two kernels form the basis of many neuromorphic algorithms such as image, text, and speech recognition. For instance, these kernels can be applied to a neural sparse coding algorithm to give an O(N) reduction in energy for the entire algorithm when run with finite precision. Sparse coding is a rich problem with a host of applications including computer vision, object tracking, and more generally unsupervised learning.« less
A GENERAL ALGORITHM FOR THE CONSTRUCTION OF CONTOUR PLOTS

NASA Technical Reports Server (NTRS)

Johnson, W.

1994-01-01

The graphical presentation of experimentally or theoretically generated data sets frequently involves the construction of contour plots. A general computer algorithm has been developed for the construction of contour plots. The algorithm provides for efficient and accurate contouring with a modular approach which allows flexibility in modifying the algorithm for special applications. The algorithm accepts as input data values at a set of points irregularly distributed over a plane. The algorithm is based on an interpolation scheme in which the points in the plane are connected by straight line segments to form a set of triangles. In general, the data is smoothed using a least-squares-error fit of the data to a bivariate polynomial. To construct the contours, interpolation along the edges of the triangles is performed, using the bivariable polynomial if data smoothing was performed. Once the contour points have been located, the contour may be drawn. This program is written in FORTRAN IV for batch execution and has been implemented on an IBM 360 series computer with a central memory requirement of approximately 100K of 8-bit bytes. This computer algorithm was developed in 1981.
New Parallel Algorithms for Landscape Evolution Model

NASA Astrophysics Data System (ADS)

Jin, Y.; Zhang, H.; Shi, Y.

2017-12-01

Most landscape evolution models (LEM) developed in the last two decades solve the diffusion equation to simulate the transportation of surface sediments. This numerical approach is difficult to parallelize due to the computation of drainage area for each node, which needs huge amount of communication if run in parallel. In order to overcome this difficulty, we developed two parallel algorithms for LEM with a stream net. One algorithm handles the partition of grid with traditional methods and applies an efficient global reduction algorithm to do the computation of drainage areas and transport rates for the stream net; the other algorithm is based on a new partition algorithm, which partitions the nodes in catchments between processes first, and then partitions the cells according to the partition of nodes. Both methods focus on decreasing communication between processes and take the advantage of massive computing techniques, and numerical experiments show that they are both adequate to handle large scale problems with millions of cells. We implemented the two algorithms in our program based on the widely used finite element library deal.II, so that it can be easily coupled with ASPECT.
Renewable energy in electric utility capacity planning: a decomposition approach with application to a Mexican utility

DOE Office of Scientific and Technical Information (OSTI.GOV)

Staschus, K.

1985-01-01

In this dissertation, efficient algorithms for electric-utility capacity expansion planning with renewable energy are developed. The algorithms include a deterministic phase that quickly finds a near-optimal expansion plan using derating and a linearized approximation to the time-dependent availability of nondispatchable energy sources. A probabilistic second phase needs comparatively few computer-time consuming probabilistic simulation iterations to modify this solution towards the optimal expansion plan. For the deterministic first phase, two algorithms, based on a Lagrangian Dual decomposition and a Generalized Benders Decomposition, are developed. The probabilistic second phase uses a Generalized Benders Decomposition approach. Extensive computational tests of the algorithms aremore » reported. Among the deterministic algorithms, the one based on Lagrangian Duality proves fastest. The two-phase approach is shown to save up to 80% in computing time as compared to a purely probabilistic algorithm. The algorithms are applied to determine the optimal expansion plan for the Tijuana-Mexicali subsystem of the Mexican electric utility system. A strong recommendation to push conservation programs in the desert city of Mexicali results from this implementation.« less
User's Guide for Computer Program that Routes Signal Traces

NASA Technical Reports Server (NTRS)

Hedgley, David R., Jr.

2000-01-01

This disk contains both a FORTRAN computer program and the corresponding user's guide that facilitates both its incorporation into your system and its utility. The computer program represents an efficient algorithm that routes signal traces on layers of a printed circuit with both through-pins and surface mounts. The computer program included is an implementation of the ideas presented in the theoretical paper titled "A Formal Algorithm for Routing Signal Traces on a Printed Circuit Board", NASA TP-3639 published in 1996. The computer program in the "connects" file can be read with a FORTRAN compiler and readily integrated into software unique to each particular environment where it might be used.
Parallel Lattice Basis Reduction Using a Multi-threaded Schnorr-Euchner LLL Algorithm

NASA Astrophysics Data System (ADS)

Backes, Werner; Wetzel, Susanne

In this paper, we introduce a new parallel variant of the LLL lattice basis reduction algorithm. Our new, multi-threaded algorithm is the first to provide an efficient, parallel implementation of the Schorr-Euchner algorithm for today’s multi-processor, multi-core computer architectures. Experiments with sparse and dense lattice bases show a speed-up factor of about 1.8 for the 2-thread and about factor 3.2 for the 4-thread version of our new parallel lattice basis reduction algorithm in comparison to the traditional non-parallel algorithm.

Lane Detection on the iPhone

NASA Astrophysics Data System (ADS)

Ren, Feixiang; Huang, Jinsheng; Terauchi, Mutsuhiro; Jiang, Ruyi; Klette, Reinhard

A robust and efficient lane detection system is an essential component of Lane Departure Warning Systems, which are commonly used in many vision-based Driver Assistance Systems (DAS) in intelligent transportation. Various computation platforms have been proposed in the past few years for the implementation of driver assistance systems (e.g., PC, laptop, integrated chips, PlayStation, and so on). In this paper, we propose a new platform for the implementation of lane detection, which is based on a mobile phone (the iPhone). Due to physical limitations of the iPhone w.r.t. memory and computing power, a simple and efficient lane detection algorithm using a Hough transform is developed and implemented on the iPhone, as existing algorithms developed based on the PC platform are not suitable for mobile phone devices (currently). Experiments of the lane detection algorithm are made both on PC and on iPhone.
An Efficient Solution Method for Multibody Systems with Loops Using Multiple Processors

NASA Technical Reports Server (NTRS)

Ghosh, Tushar K.; Nguyen, Luong A.; Quiocho, Leslie J.

2015-01-01

This paper describes a multibody dynamics algorithm formulated for parallel implementation on multiprocessor computing platforms using the divide-and-conquer approach. The system of interest is a general topology of rigid and elastic articulated bodies with or without loops. The algorithm divides the multibody system into a number of smaller sets of bodies in chain or tree structures, called "branches" at convenient joints called "connection points", and uses an Order-N (O (N)) approach to formulate the dynamics of each branch in terms of the unknown spatial connection forces. The equations of motion for the branches, leaving the connection forces as unknowns, are implemented in separate processors in parallel for computational efficiency, and the equations for all the unknown connection forces are synthesized and solved in one or several processors. The performances of two implementations of this divide-and-conquer algorithm in multiple processors are compared with an existing method implemented on a single processor.
Fast adaptive flat-histogram ensemble to enhance the sampling in large systems

NASA Astrophysics Data System (ADS)

Xu, Shun; Zhou, Xin; Jiang, Yi; Wang, YanTing

2015-09-01

An efficient novel algorithm was developed to estimate the Density of States (DOS) for large systems by calculating the ensemble means of an extensive physical variable, such as the potential energy, U, in generalized canonical ensembles to interpolate the interior reverse temperature curve , where S( U) is the logarithm of the DOS. This curve is computed with different accuracies in different energy regions to capture the dependence of the reverse temperature on U without setting prior grid in the U space. By combining with a U-compression transformation, we decrease the computational complexity from O( N 3/2) in the normal Wang Landau type method to O( N 1/2) in the current algorithm, as the degrees of freedom of system N. The efficiency of the algorithm is demonstrated by applying to Lennard Jones fluids with various N, along with its ability to find different macroscopic states, including metastable states.
Phase-unwrapping algorithm by a rounding-least-squares approach

NASA Astrophysics Data System (ADS)

Juarez-Salazar, Rigoberto; Robledo-Sanchez, Carlos; Guerrero-Sanchez, Fermin

2014-02-01

A simple and efficient phase-unwrapping algorithm based on a rounding procedure and a global least-squares minimization is proposed. Instead of processing the gradient of the wrapped phase, this algorithm operates over the gradient of the phase jumps by a robust and noniterative scheme. Thus, the residue-spreading and over-smoothing effects are reduced. The algorithm's performance is compared with four well-known phase-unwrapping methods: minimum cost network flow (MCNF), fast Fourier transform (FFT), quality-guided, and branch-cut. A computer simulation and experimental results show that the proposed algorithm reaches a high-accuracy level than the MCNF method by a low-computing time similar to the FFT phase-unwrapping method. Moreover, since the proposed algorithm is simple, fast, and user-free, it could be used in metrological interferometric and fringe-projection automatic real-time applications.
An efficient parallel algorithm for matrix-vector multiplication

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hendrickson, B.; Leland, R.; Plimpton, S.

The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/[radical]p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in themore » well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.« less
Integral Images: Efficient Algorithms for Their Computation and Storage in Resource-Constrained Embedded Vision Systems

PubMed Central

Ehsan, Shoaib; Clark, Adrian F.; ur Rehman, Naveed; McDonald-Maier, Klaus D.

2015-01-01

The integral image, an intermediate image representation, has found extensive use in multi-scale local feature detection algorithms, such as Speeded-Up Robust Features (SURF), allowing fast computation of rectangular features at constant speed, independent of filter size. For resource-constrained real-time embedded vision systems, computation and storage of integral image presents several design challenges due to strict timing and hardware limitations. Although calculation of the integral image only consists of simple addition operations, the total number of operations is large owing to the generally large size of image data. Recursive equations allow substantial decrease in the number of operations but require calculation in a serial fashion. This paper presents two new hardware algorithms that are based on the decomposition of these recursive equations, allowing calculation of up to four integral image values in a row-parallel way without significantly increasing the number of operations. An efficient design strategy is also proposed for a parallel integral image computation unit to reduce the size of the required internal memory (nearly 35% for common HD video). Addressing the storage problem of integral image in embedded vision systems, the paper presents two algorithms which allow substantial decrease (at least 44.44%) in the memory requirements. Finally, the paper provides a case study that highlights the utility of the proposed architectures in embedded vision systems. PMID:26184211
Integral Images: Efficient Algorithms for Their Computation and Storage in Resource-Constrained Embedded Vision Systems.

PubMed

Ehsan, Shoaib; Clark, Adrian F; Naveed ur Rehman; McDonald-Maier, Klaus D

2015-07-10

The integral image, an intermediate image representation, has found extensive use in multi-scale local feature detection algorithms, such as Speeded-Up Robust Features (SURF), allowing fast computation of rectangular features at constant speed, independent of filter size. For resource-constrained real-time embedded vision systems, computation and storage of integral image presents several design challenges due to strict timing and hardware limitations. Although calculation of the integral image only consists of simple addition operations, the total number of operations is large owing to the generally large size of image data. Recursive equations allow substantial decrease in the number of operations but require calculation in a serial fashion. This paper presents two new hardware algorithms that are based on the decomposition of these recursive equations, allowing calculation of up to four integral image values in a row-parallel way without significantly increasing the number of operations. An efficient design strategy is also proposed for a parallel integral image computation unit to reduce the size of the required internal memory (nearly 35% for common HD video). Addressing the storage problem of integral image in embedded vision systems, the paper presents two algorithms which allow substantial decrease (at least 44.44%) in the memory requirements. Finally, the paper provides a case study that highlights the utility of the proposed architectures in embedded vision systems.
Real-time image dehazing using local adaptive neighborhoods and dark-channel-prior

NASA Astrophysics Data System (ADS)

Valderrama, Jesus A.; Díaz-Ramírez, Víctor H.; Kober, Vitaly; Hernandez, Enrique

2015-09-01

A real-time algorithm for single image dehazing is presented. The algorithm is based on calculation of local neighborhoods of a hazed image inside a moving window. The local neighborhoods are constructed by computing rank-order statistics. Next the dark-channel-prior approach is applied to the local neighborhoods to estimate the transmission function of the scene. By using the suggested approach there is no need for applying a refining algorithm to the estimated transmission such as the soft matting algorithm. To achieve high-rate signal processing the proposed algorithm is implemented exploiting massive parallelism on a graphics processing unit (GPU). Computer simulation results are carried out to test the performance of the proposed algorithm in terms of dehazing efficiency and speed of processing. These tests are performed using several synthetic and real images. The obtained results are analyzed and compared with those obtained with existing dehazing algorithms.
Applied Distributed Model Predictive Control for Energy Efficient Buildings and Ramp Metering

NASA Astrophysics Data System (ADS)

Koehler, Sarah Muraoka

Industrial large-scale control problems present an interesting algorithmic design challenge. A number of controllers must cooperate in real-time on a network of embedded hardware with limited computing power in order to maximize system efficiency while respecting constraints and despite communication delays. Model predictive control (MPC) can automatically synthesize a centralized controller which optimizes an objective function subject to a system model, constraints, and predictions of disturbance. Unfortunately, the computations required by model predictive controllers for large-scale systems often limit its industrial implementation only to medium-scale slow processes. Distributed model predictive control (DMPC) enters the picture as a way to decentralize a large-scale model predictive control problem. The main idea of DMPC is to split the computations required by the MPC problem amongst distributed processors that can compute in parallel and communicate iteratively to find a solution. Some popularly proposed solutions are distributed optimization algorithms such as dual decomposition and the alternating direction method of multipliers (ADMM). However, these algorithms ignore two practical challenges: substantial communication delays present in control systems and also problem non-convexity. This thesis presents two novel and practically effective DMPC algorithms. The first DMPC algorithm is based on a primal-dual active-set method which achieves fast convergence, making it suitable for large-scale control applications which have a large communication delay across its communication network. In particular, this algorithm is suited for MPC problems with a quadratic cost, linear dynamics, forecasted demand, and box constraints. We measure the performance of this algorithm and show that it significantly outperforms both dual decomposition and ADMM in the presence of communication delay. The second DMPC algorithm is based on an inexact interior point method which is suited for nonlinear optimization problems. The parallel computation of the algorithm exploits iterative linear algebra methods for the main linear algebra computations in the algorithm. We show that the splitting of the algorithm is flexible and can thus be applied to various distributed platform configurations. The two proposed algorithms are applied to two main energy and transportation control problems. The first application is energy efficient building control. Buildings represent 40% of energy consumption in the United States. Thus, it is significant to improve the energy efficiency of buildings. The goal is to minimize energy consumption subject to the physics of the building (e.g. heat transfer laws), the constraints of the actuators as well as the desired operating constraints (thermal comfort of the occupants), and heat load on the system. In this thesis, we describe the control systems of forced air building systems in practice. We discuss the "Trim and Respond" algorithm which is a distributed control algorithm that is used in practice, and show that it performs similarly to a one-step explicit DMPC algorithm. Then, we apply the novel distributed primal-dual active-set method and provide extensive numerical results for the building MPC problem. The second main application is the control of ramp metering signals to optimize traffic flow through a freeway system. This application is particularly important since urban congestion has more than doubled in the past few decades. The ramp metering problem is to maximize freeway throughput subject to freeway dynamics (derived from mass conservation), actuation constraints, freeway capacity constraints, and predicted traffic demand. In this thesis, we develop a hybrid model predictive controller for ramp metering that is guaranteed to be persistently feasible and stable. This contrasts to previous work on MPC for ramp metering where such guarantees are absent. We apply a smoothing method to the hybrid model predictive controller and apply the inexact interior point method to this nonlinear non-convex ramp metering problem.
An introduction to quantum machine learning

NASA Astrophysics Data System (ADS)

Schuld, Maria; Sinayskiy, Ilya; Petruccione, Francesco

2015-04-01

Machine learning algorithms learn a desired input-output relation from examples in order to interpret new inputs. This is important for tasks such as image and speech recognition or strategy optimisation, with growing applications in the IT industry. In the last couple of years, researchers investigated if quantum computing can help to improve classical machine learning algorithms. Ideas range from running computationally costly algorithms or their subroutines efficiently on a quantum computer to the translation of stochastic methods into the language of quantum theory. This contribution gives a systematic overview of the emerging field of quantum machine learning. It presents the approaches as well as technical details in an accessible way, and discusses the potential of a future theory of quantum learning.
A Strassen-Newton algorithm for high-speed parallelizable matrix inversion

NASA Technical Reports Server (NTRS)

Bailey, David H.; Ferguson, Helaman R. P.

1988-01-01

Techniques are described for computing matrix inverses by algorithms that are highly suited to massively parallel computation. The techniques are based on an algorithm suggested by Strassen (1969). Variations of this scheme use matrix Newton iterations and other methods to improve the numerical stability while at the same time preserving a very high level of parallelism. One-processor Cray-2 implementations of these schemes range from one that is up to 55 percent faster than a conventional library routine to one that is slower than a library routine but achieves excellent numerical stability. The problem of computing the solution to a single set of linear equations is discussed, and it is shown that this problem can also be solved efficiently using these techniques.
Colour based fire detection method with temporal intensity variation filtration

NASA Astrophysics Data System (ADS)

Trambitckii, K.; Anding, K.; Musalimov, V.; Linß, G.

2015-02-01

Development of video, computing technologies and computer vision gives a possibility of automatic fire detection on video information. Under that project different algorithms was implemented to find more efficient way of fire detection. In that article colour based fire detection algorithm is described. But it is not enough to use only colour information to detect fire properly. The main reason of this is that in the shooting conditions may be a lot of things having colour similar to fire. A temporary intensity variation of pixels is used to separate them from the fire. These variations are averaged over the series of several frames. This algorithm shows robust work and was realised as a computer program by using of the OpenCV library.
A fast bottom-up algorithm for computing the cut sets of noncoherent fault trees

DOE Office of Scientific and Technical Information (OSTI.GOV)

Corynen, G.C.

1987-11-01

An efficient procedure for finding the cut sets of large fault trees has been developed. Designed to address coherent or noncoherent systems, dependent events, shared or common-cause events, the method - called SHORTCUT - is based on a fast algorithm for transforming a noncoherent tree into a quasi-coherent tree (COHERE), and on a new algorithm for reducing cut sets (SUBSET). To assure sufficient clarity and precision, the procedure is discussed in the language of simple sets, which is also developed in this report. Although the new method has not yet been fully implemented on the computer, we report theoretical worst-casemore » estimates of its computational complexity. 12 refs., 10 figs.« less
A cellular automata based FPGA realization of a new metaheuristic bat-inspired algorithm

NASA Astrophysics Data System (ADS)

Progias, Pavlos; Amanatiadis, Angelos A.; Spataro, William; Trunfio, Giuseppe A.; Sirakoulis, Georgios Ch.

2016-10-01

Optimization algorithms are often inspired by processes occuring in nature, such as animal behavioral patterns. The main concern with implementing such algorithms in software is the large amounts of processing power they require. In contrast to software code, that can only perform calculations in a serial manner, an implementation in hardware, exploiting the inherent parallelism of single-purpose processors, can prove to be much more efficient both in speed and energy consumption. Furthermore, the use of Cellular Automata (CA) in such an implementation would be efficient both as a model for natural processes, as well as a computational paradigm implemented well on hardware. In this paper, we propose a VHDL implementation of a metaheuristic algorithm inspired by the echolocation behavior of bats. More specifically, the CA model is inspired by the metaheuristic algorithm proposed earlier in the literature, which could be considered at least as efficient than other existing optimization algorithms. The function of the FPGA implementation of our algorithm is explained in full detail and results of our simulations are also demonstrated.
Adaptive Wavelet Modeling of Geophysical Data

NASA Astrophysics Data System (ADS)

Plattner, A.; Maurer, H.; Dahmen, W.; Vorloeper, J.

2009-12-01

Despite the ever-increasing power of modern computers, realistic modeling of complex three-dimensional Earth models is still a challenging task and requires substantial computing resources. The overwhelming majority of current geophysical modeling approaches includes either finite difference or non-adaptive finite element algorithms, and variants thereof. These numerical methods usually require the subsurface to be discretized with a fine mesh to accurately capture the behavior of the physical fields. However, this may result in excessive memory consumption and computing times. A common feature of most of these algorithms is that the modeled data discretizations are independent of the model complexity, which may be wasteful when there are only minor to moderate spatial variations in the subsurface parameters. Recent developments in the theory of adaptive numerical solvers have the potential to overcome this problem. Here, we consider an adaptive wavelet based approach that is applicable to a large scope of problems, also including nonlinear problems. To the best of our knowledge such algorithms have not yet been applied in geophysics. Adaptive wavelet algorithms offer several attractive features: (i) for a given subsurface model, they allow the forward modeling domain to be discretized with a quasi minimal number of degrees of freedom, (ii) sparsity of the associated system matrices is guaranteed, which makes the algorithm memory efficient, and (iii) the modeling accuracy scales linearly with computing time. We have implemented the adaptive wavelet algorithm for solving three-dimensional geoelectric problems. To test its performance, numerical experiments were conducted with a series of conductivity models exhibiting varying degrees of structural complexity. Results were compared with a non-adaptive finite element algorithm, which incorporates an unstructured mesh to best fit subsurface boundaries. Such algorithms represent the current state-of-the-art in geoelectrical modeling. An analysis of the numerical accuracy as a function of the number of degrees of freedom revealed that the adaptive wavelet algorithm outperforms the finite element solver for simple and moderately complex models, whereas the results become comparable for models with spatially highly variable electrical conductivities. The linear dependency of the modeling error and the computing time proved to be model-independent. This feature will allow very efficient computations using large-scale models as soon as our experimental code is optimized in terms of its implementation.
Efficient Fingercode Classification

NASA Astrophysics Data System (ADS)

Sun, Hong-Wei; Law, Kwok-Yan; Gollmann, Dieter; Chung, Siu-Leung; Li, Jian-Bin; Sun, Jia-Guang

In this paper, we present an efficient fingerprint classification algorithm which is an essential component in many critical security application systems e. g. systems in the e-government and e-finance domains. Fingerprint identification is one of the most important security requirements in homeland security systems such as personnel screening and anti-money laundering. The problem of fingerprint identification involves searching (matching) the fingerprint of a person against each of the fingerprints of all registered persons. To enhance performance and reliability, a common approach is to reduce the search space by firstly classifying the fingerprints and then performing the search in the respective class. Jain et al. proposed a fingerprint classification algorithm based on a two-stage classifier, which uses a K-nearest neighbor classifier in its first stage. The fingerprint classification algorithm is based on the fingercode representation which is an encoding of fingerprints that has been demonstrated to be an effective fingerprint biometric scheme because of its ability to capture both local and global details in a fingerprint image. We enhance this approach by improving the efficiency of the K-nearest neighbor classifier for fingercode-based fingerprint classification. Our research firstly investigates the various fast search algorithms in vector quantization (VQ) and the potential application in fingerprint classification, and then proposes two efficient algorithms based on the pyramid-based search algorithms in VQ. Experimental results on DB1 of FVC 2004 demonstrate that our algorithms can outperform the full search algorithm and the original pyramid-based search algorithms in terms of computational efficiency without sacrificing accuracy.
Implementation and evaluation of ILLIAC 4 algorithms for multispectral image processing

NASA Technical Reports Server (NTRS)

Swain, P. H.

1974-01-01

Data concerning a multidisciplinary and multi-organizational effort to implement multispectral data analysis algorithms on a revolutionary computer, the Illiac 4, are reported. The effectiveness and efficiency of implementing the digital multispectral data analysis techniques for producing useful land use classifications from satellite collected data were demonstrated.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Pastore, Giovanni; Rabiti, Cristian; Pizzocri, Davide

PolyPole is a numerical algorithm for the calculation of intra-granular fission gas release. In particular, the algorithm solves the gas diffusion problem in a fuel grain in time-varying conditions. The program has been extensively tested. PolyPole combines a high accuracy with a high computational efficiency and is ideally suited for application in fuel performance codes.
SLIC superpixels compared to state-of-the-art superpixel methods.

PubMed

Achanta, Radhakrishna; Shaji, Appu; Smith, Kevin; Lucchi, Aurelien; Fua, Pascal; Süsstrunk, Sabine

2012-11-01

Computer vision applications have come to rely increasingly on superpixels in recent years, but it is not always clear what constitutes a good superpixel algorithm. In an effort to understand the benefits and drawbacks of existing methods, we empirically compare five state-of-the-art superpixel algorithms for their ability to adhere to image boundaries, speed, memory efficiency, and their impact on segmentation performance. We then introduce a new superpixel algorithm, simple linear iterative clustering (SLIC), which adapts a k-means clustering approach to efficiently generate superpixels. Despite its simplicity, SLIC adheres to boundaries as well as or better than previous methods. At the same time, it is faster and more memory efficient, improves segmentation performance, and is straightforward to extend to supervoxel generation.
A Formal Algorithm for Routing Traces on a Printed Circuit Board

NASA Technical Reports Server (NTRS)

Hedgley, David R., Jr.

1996-01-01

This paper addresses the classical problem of printed circuit board routing: that is, the problem of automatic routing by a computer other than by brute force that causes the execution time to grow exponentially as a function of the complexity. Most of the present solutions are either inexpensive but not efficient and fast, or efficient and fast but very costly. Many solutions are proprietary, so not much is written or known about the actual algorithms upon which these solutions are based. This paper presents a formal algorithm for routing traces on a print- ed circuit board. The solution presented is very fast and efficient and for the first time speaks to the question eloquently by way of symbolic statements.

Survivable algorithms and redundancy management in NASA's distributed computing systems

NASA Technical Reports Server (NTRS)

Malek, Miroslaw

1992-01-01

The design of survivable algorithms requires a solid foundation for executing them. While hardware techniques for fault-tolerant computing are relatively well understood, fault-tolerant operating systems, as well as fault-tolerant applications (survivable algorithms), are, by contrast, little understood, and much more work in this field is required. We outline some of our work that contributes to the foundation of ultrareliable operating systems and fault-tolerant algorithm design. We introduce our consensus-based framework for fault-tolerant system design. This is followed by a description of a hierarchical partitioning method for efficient consensus. A scheduler for redundancy management is introduced, and application-specific fault tolerance is described. We give an overview of our hybrid algorithm technique, which is an alternative to the formal approach given.
Local Competition-Based Superpixel Segmentation Algorithm in Remote Sensing

PubMed Central

Liu, Jiayin; Tang, Zhenmin; Cui, Ying; Wu, Guoxing

2017-01-01

Remote sensing technologies have been widely applied in urban environments’ monitoring, synthesis and modeling. Incorporating spatial information in perceptually coherent regions, superpixel-based approaches can effectively eliminate the “salt and pepper” phenomenon which is common in pixel-wise approaches. Compared with fixed-size windows, superpixels have adaptive sizes and shapes for different spatial structures. Moreover, superpixel-based algorithms can significantly improve computational efficiency owing to the greatly reduced number of image primitives. Hence, the superpixel algorithm, as a preprocessing technique, is more and more popularly used in remote sensing and many other fields. In this paper, we propose a superpixel segmentation algorithm called Superpixel Segmentation with Local Competition (SSLC), which utilizes a local competition mechanism to construct energy terms and label pixels. The local competition mechanism leads to energy terms locality and relativity, and thus, the proposed algorithm is less sensitive to the diversity of image content and scene layout. Consequently, SSLC could achieve consistent performance in different image regions. In addition, the Probability Density Function (PDF), which is estimated by Kernel Density Estimation (KDE) with the Gaussian kernel, is introduced to describe the color distribution of superpixels as a more sophisticated and accurate measure. To reduce computational complexity, a boundary optimization framework is introduced to only handle boundary pixels instead of the whole image. We conduct experiments to benchmark the proposed algorithm with the other state-of-the-art ones on the Berkeley Segmentation Dataset (BSD) and remote sensing images. Results demonstrate that the SSLC algorithm yields the best overall performance, while the computation time-efficiency is still competitive. PMID:28604641
A Computationally Efficient Visual Saliency Algorithm Suitable for an Analog CMOS Implementation.

PubMed

D'Angelo, Robert; Wood, Richard; Lowry, Nathan; Freifeld, Geremy; Huang, Haiyao; Salthouse, Christopher D; Hollosi, Brent; Muresan, Matthew; Uy, Wes; Tran, Nhut; Chery, Armand; Poppe, Dorothy C; Sonkusale, Sameer

2018-06-27

Computer vision algorithms are often limited in their application by the large amount of data that must be processed. Mammalian vision systems mitigate this high bandwidth requirement by prioritizing certain regions of the visual field with neural circuits that select the most salient regions. This work introduces a novel and computationally efficient visual saliency algorithm for performing this neuromorphic attention-based data reduction. The proposed algorithm has the added advantage that it is compatible with an analog CMOS design while still achieving comparable performance to existing state-of-the-art saliency algorithms. This compatibility allows for direct integration with the analog-to-digital conversion circuitry present in CMOS image sensors. This integration leads to power savings in the converter by quantizing only the salient pixels. Further system-level power savings are gained by reducing the amount of data that must be transmitted and processed in the digital domain. The analog CMOS compatible formulation relies on a pulse width (i.e., time mode) encoding of the pixel data that is compatible with pulse-mode imagers and slope based converters often used in imager designs. This letter begins by discussing this time-mode encoding for implementing neuromorphic architectures. Next, the proposed algorithm is derived. Hardware-oriented optimizations and modifications to this algorithm are proposed and discussed. Next, a metric for quantifying saliency accuracy is proposed, and simulation results of this metric are presented. Finally, an analog synthesis approach for a time-mode architecture is outlined, and postsynthesis transistor-level simulations that demonstrate functionality of an implementation in a modern CMOS process are discussed.
Local Competition-Based Superpixel Segmentation Algorithm in Remote Sensing.

PubMed

Liu, Jiayin; Tang, Zhenmin; Cui, Ying; Wu, Guoxing

2017-06-12

Remote sensing technologies have been widely applied in urban environments' monitoring, synthesis and modeling. Incorporating spatial information in perceptually coherent regions, superpixel-based approaches can effectively eliminate the "salt and pepper" phenomenon which is common in pixel-wise approaches. Compared with fixed-size windows, superpixels have adaptive sizes and shapes for different spatial structures. Moreover, superpixel-based algorithms can significantly improve computational efficiency owing to the greatly reduced number of image primitives. Hence, the superpixel algorithm, as a preprocessing technique, is more and more popularly used in remote sensing and many other fields. In this paper, we propose a superpixel segmentation algorithm called Superpixel Segmentation with Local Competition (SSLC), which utilizes a local competition mechanism to construct energy terms and label pixels. The local competition mechanism leads to energy terms locality and relativity, and thus, the proposed algorithm is less sensitive to the diversity of image content and scene layout. Consequently, SSLC could achieve consistent performance in different image regions. In addition, the Probability Density Function (PDF), which is estimated by Kernel Density Estimation (KDE) with the Gaussian kernel, is introduced to describe the color distribution of superpixels as a more sophisticated and accurate measure. To reduce computational complexity, a boundary optimization framework is introduced to only handle boundary pixels instead of the whole image. We conduct experiments to benchmark the proposed algorithm with the other state-of-the-art ones on the Berkeley Segmentation Dataset (BSD) and remote sensing images. Results demonstrate that the SSLC algorithm yields the best overall performance, while the computation time-efficiency is still competitive.
Staff | Computational Science | NREL

Science.gov Websites

develops and leads laboratory-wide efforts in high-performance computing and energy-efficient data centers Professional IV-High Perf Computing Jim.Albin@nrel.gov 303-275-4069 Ananthan, Shreyas Senior Scientist - High -Performance Algorithms and Modeling Shreyas.Ananthan@nrel.gov 303-275-4807 Bendl, Kurt IT Professional IV-High
LCC-Demons: a robust and accurate symmetric diffeomorphic registration algorithm.

PubMed

Lorenzi, M; Ayache, N; Frisoni, G B; Pennec, X

2013-11-01

Non-linear registration is a key instrument for computational anatomy to study the morphology of organs and tissues. However, in order to be an effective instrument for the clinical practice, registration algorithms must be computationally efficient, accurate and most importantly robust to the multiple biases affecting medical images. In this work we propose a fast and robust registration framework based on the log-Demons diffeomorphic registration algorithm. The transformation is parameterized by stationary velocity fields (SVFs), and the similarity metric implements a symmetric local correlation coefficient (LCC). Moreover, we show how the SVF setting provides a stable and consistent numerical scheme for the computation of the Jacobian determinant and the flux of the deformation across the boundaries of a given region. Thus, it provides a robust evaluation of spatial changes. We tested the LCC-Demons in the inter-subject registration setting, by comparing with state-of-the-art registration algorithms on public available datasets, and in the intra-subject longitudinal registration problem, for the statistically powered measurements of the longitudinal atrophy in Alzheimer's disease. Experimental results show that LCC-Demons is a generic, flexible, efficient and robust algorithm for the accurate non-linear registration of images, which can find several applications in the field of medical imaging. Without any additional optimization, it solves equally well intra & inter-subject registration problems, and compares favorably to state-of-the-art methods. Copyright © 2013 Elsevier Inc. All rights reserved.
A comparison of algorithms for inference and learning in probabilistic graphical models.

PubMed

Frey, Brendan J; Jojic, Nebojsa

2005-09-01

Research into methods for reasoning under uncertainty is currently one of the most exciting areas of artificial intelligence, largely because it has recently become possible to record, store, and process large amounts of data. While impressive achievements have been made in pattern classification problems such as handwritten character recognition, face detection, speaker identification, and prediction of gene function, it is even more exciting that researchers are on the verge of introducing systems that can perform large-scale combinatorial analyses of data, decomposing the data into interacting components. For example, computational methods for automatic scene analysis are now emerging in the computer vision community. These methods decompose an input image into its constituent objects, lighting conditions, motion patterns, etc. Two of the main challenges are finding effective representations and models in specific applications and finding efficient algorithms for inference and learning in these models. In this paper, we advocate the use of graph-based probability models and their associated inference and learning algorithms. We review exact techniques and various approximate, computationally efficient techniques, including iterated conditional modes, the expectation maximization (EM) algorithm, Gibbs sampling, the mean field method, variational techniques, structured variational techniques and the sum-product algorithm ("loopy" belief propagation). We describe how each technique can be applied in a vision model of multiple, occluding objects and contrast the behaviors and performances of the techniques using a unifying cost function, free energy.
An evolutionary computation based algorithm for calculating solar differential rotation by automatic tracking of coronal bright points

NASA Astrophysics Data System (ADS)

Shahamatnia, Ehsan; Dorotovič, Ivan; Fonseca, Jose M.; Ribeiro, Rita A.

2016-03-01

Developing specialized software tools is essential to support studies of solar activity evolution. With new space missions such as Solar Dynamics Observatory (SDO), solar images are being produced in unprecedented volumes. To capitalize on that huge data availability, the scientific community needs a new generation of software tools for automatic and efficient data processing. In this paper a prototype of a modular framework for solar feature detection, characterization, and tracking is presented. To develop an efficient system capable of automatic solar feature tracking and measuring, a hybrid approach combining specialized image processing, evolutionary optimization, and soft computing algorithms is being followed. The specialized hybrid algorithm for tracking solar features allows automatic feature tracking while gathering characterization details about the tracked features. The hybrid algorithm takes advantages of the snake model, a specialized image processing algorithm widely used in applications such as boundary delineation, image segmentation, and object tracking. Further, it exploits the flexibility and efficiency of Particle Swarm Optimization (PSO), a stochastic population based optimization algorithm. PSO has been used successfully in a wide range of applications including combinatorial optimization, control, clustering, robotics, scheduling, and image processing and video analysis applications. The proposed tool, denoted PSO-Snake model, was already successfully tested in other works for tracking sunspots and coronal bright points. In this work, we discuss the application of the PSO-Snake algorithm for calculating the sidereal rotational angular velocity of the solar corona. To validate the results we compare them with published manual results performed by an expert.
Computationally Efficient Multiconfigurational Reactive Molecular Dynamics

PubMed Central

Yamashita, Takefumi; Peng, Yuxing; Knight, Chris; Voth, Gregory A.

2012-01-01

It is a computationally demanding task to explicitly simulate the electronic degrees of freedom in a system to observe the chemical transformations of interest, while at the same time sampling the time and length scales required to converge statistical properties and thus reduce artifacts due to initial conditions, finite-size effects, and limited sampling. One solution that significantly reduces the computational expense consists of molecular models in which effective interactions between particles govern the dynamics of the system. If the interaction potentials in these models are developed to reproduce calculated properties from electronic structure calculations and/or ab initio molecular dynamics simulations, then one can calculate accurate properties at a fraction of the computational cost. Multiconfigurational algorithms model the system as a linear combination of several chemical bonding topologies to simulate chemical reactions, also sometimes referred to as “multistate”. These algorithms typically utilize energy and force calculations already found in popular molecular dynamics software packages, thus facilitating their implementation without significant changes to the structure of the code. However, the evaluation of energies and forces for several bonding topologies per simulation step can lead to poor computational efficiency if redundancy is not efficiently removed, particularly with respect to the calculation of long-ranged Coulombic interactions. This paper presents accurate approximations (effective long-range interaction and resulting hybrid methods) and multiple-program parallelization strategies for the efficient calculation of electrostatic interactions in reactive molecular simulations. PMID:25100924
An Improved Nested Sampling Algorithm for Model Selection and Assessment

NASA Astrophysics Data System (ADS)

Zeng, X.; Ye, M.; Wu, J.; WANG, D.

2017-12-01

Multimodel strategy is a general approach for treating model structure uncertainty in recent researches. The unknown groundwater system is represented by several plausible conceptual models. Each alternative conceptual model is attached with a weight which represents the possibility of this model. In Bayesian framework, the posterior model weight is computed as the product of model prior weight and marginal likelihood (or termed as model evidence). As a result, estimating marginal likelihoods is crucial for reliable model selection and assessment in multimodel analysis. Nested sampling estimator (NSE) is a new proposed algorithm for marginal likelihood estimation. The implementation of NSE comprises searching the parameters' space from low likelihood area to high likelihood area gradually, and this evolution is finished iteratively via local sampling procedure. Thus, the efficiency of NSE is dominated by the strength of local sampling procedure. Currently, Metropolis-Hasting (M-H) algorithm and its variants are often used for local sampling in NSE. However, M-H is not an efficient sampling algorithm for high-dimensional or complex likelihood function. For improving the performance of NSE, it could be feasible to integrate more efficient and elaborated sampling algorithm - DREAMzs into the local sampling. In addition, in order to overcome the computation burden problem of large quantity of repeating model executions in marginal likelihood estimation, an adaptive sparse grid stochastic collocation method is used to build the surrogates for original groundwater model.
On improving the algorithm efficiency in the particle-particle force calculations

NASA Astrophysics Data System (ADS)

Kozynchenko, Alexander I.; Kozynchenko, Sergey A.

2016-09-01

The problem of calculating inter-particle forces in the particle-particle (PP) simulation models takes an important place in scientific computing. Such simulation models are used in diverse scientific applications arising in astrophysics, plasma physics, particle accelerators, etc., where the long-range forces are considered. The inverse-square laws such as Coulomb's law of electrostatic forces and Newton's law of universal gravitation are the examples of laws pertaining to the long-range forces. The standard naïve PP method outlined, for example, by Hockney and Eastwood [1] is straightforward, processing all pairs of particles in a double nested loop. The PP algorithm provides the best accuracy of all possible methods, but its computational complexity is O (Np2), where Np is a total number of particles involved. Too low efficiency of the PP algorithm seems to be the challenging issue in some cases where the high accuracy is required. An example can be taken from the charged particle beam dynamics where, under computing the own space charge of the beam, so-called macro-particles are used (see e.g., Humphries Jr. [2], Kozynchenko and Svistunov [3]).
Comparative Evaluation of Different Optimization Algorithms for Structural Design Applications

NASA Technical Reports Server (NTRS)

Patnaik, Surya N.; Coroneos, Rula M.; Guptill, James D.; Hopkins, Dale A.

1996-01-01

Non-linear programming algorithms play an important role in structural design optimization. Fortunately, several algorithms with computer codes are available. At NASA Lewis Research Centre, a project was initiated to assess the performance of eight different optimizers through the development of a computer code CometBoards. This paper summarizes the conclusions of that research. CometBoards was employed to solve sets of small, medium and large structural problems, using the eight different optimizers on a Cray-YMP8E/8128 computer. The reliability and efficiency of the optimizers were determined from the performance of these problems. For small problems, the performance of most of the optimizers could be considered adequate. For large problems, however, three optimizers (two sequential quadratic programming routines, DNCONG of IMSL and SQP of IDESIGN, along with Sequential Unconstrained Minimizations Technique SUMT) outperformed others. At optimum, most optimizers captured an identical number of active displacement and frequency constraints but the number of active stress constraints differed among the optimizers. This discrepancy can be attributed to singularity conditions in the optimization and the alleviation of this discrepancy can improve the efficiency of optimizers.
Phase unwrapping with graph cuts optimization and dual decomposition acceleration for 3D high-resolution MRI data.

PubMed

Dong, Jianwu; Chen, Feng; Zhou, Dong; Liu, Tian; Yu, Zhaofei; Wang, Yi

2017-03-01

Existence of low SNR regions and rapid-phase variations pose challenges to spatial phase unwrapping algorithms. Global optimization-based phase unwrapping methods are widely used, but are significantly slower than greedy methods. In this paper, dual decomposition acceleration is introduced to speed up a three-dimensional graph cut-based phase unwrapping algorithm. The phase unwrapping problem is formulated as a global discrete energy minimization problem, whereas the technique of dual decomposition is used to increase the computational efficiency by splitting the full problem into overlapping subproblems and enforcing the congruence of overlapping variables. Using three dimensional (3D) multiecho gradient echo images from an agarose phantom and five brain hemorrhage patients, we compared this proposed method with an unaccelerated graph cut-based method. Experimental results show up to 18-fold acceleration in computation time. Dual decomposition significantly improves the computational efficiency of 3D graph cut-based phase unwrapping algorithms. Magn Reson Med 77:1353-1358, 2017. © 2016 International Society for Magnetic Resonance in Medicine. © 2016 International Society for Magnetic Resonance in Medicine.
Performance Trend of Different Algorithms for Structural Design Optimization

NASA Technical Reports Server (NTRS)

Patnaik, Surya N.; Coroneos, Rula M.; Guptill, James D.; Hopkins, Dale A.

1996-01-01

Nonlinear programming algorithms play an important role in structural design optimization. Fortunately, several algorithms with computer codes are available. At NASA Lewis Research Center, a project was initiated to assess performance of different optimizers through the development of a computer code CometBoards. This paper summarizes the conclusions of that research. CometBoards was employed to solve sets of small, medium and large structural problems, using different optimizers on a Cray-YMP8E/8128 computer. The reliability and efficiency of the optimizers were determined from the performance of these problems. For small problems, the performance of most of the optimizers could be considered adequate. For large problems however, three optimizers (two sequential quadratic programming routines, DNCONG of IMSL and SQP of IDESIGN, along with the sequential unconstrained minimizations technique SUMT) outperformed others. At optimum, most optimizers captured an identical number of active displacement and frequency constraints but the number of active stress constraints differed among the optimizers. This discrepancy can be attributed to singularity conditions in the optimization and the alleviation of this discrepancy can improve the efficiency of optimizers.
Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™

PubMed Central

Gomes, Jeremias M.; Teodoro, George; de Melo, Alba; Kong, Jun; Kurc, Tahsin; Saltz, Joel H.

2016-01-01

We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel® Xeon Phi™ co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP’s irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high performance. However, SIMD atomic instructions are not supported. Therefore, we propose a new IWPP algorithm that can take advantage of the supported SIMD instruction set. We also evaluate an alternate storage container (priority queue) to track active elements in the wavefront in an effort to improve the parallel algorithm efficiency. The new IWPP algorithm is evaluated with Morphological Reconstruction and Imfill operations as use cases. Our results show performance improvements of up to 5.63× on top of the original IWPP due to vectorization. Moreover, the new IWPP achieves speedups of 45.7× and 1.62×, respectively, as compared to efficient CPU and GPU implementations. PMID:27298591
Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™.

PubMed

Gomes, Jeremias M; Teodoro, George; de Melo, Alba; Kong, Jun; Kurc, Tahsin; Saltz, Joel H

2015-10-01

We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel ® Xeon Phi ™ co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP's irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high performance. However, SIMD atomic instructions are not supported. Therefore, we propose a new IWPP algorithm that can take advantage of the supported SIMD instruction set. We also evaluate an alternate storage container (priority queue) to track active elements in the wavefront in an effort to improve the parallel algorithm efficiency. The new IWPP algorithm is evaluated with Morphological Reconstruction and Imfill operations as use cases. Our results show performance improvements of up to 5.63 × on top of the original IWPP due to vectorization. Moreover, the new IWPP achieves speedups of 45.7 × and 1.62 × , respectively, as compared to efficient CPU and GPU implementations.
Automated flight path planning for virtual endoscopy.

PubMed

Paik, D S; Beaulieu, C F; Jeffrey, R B; Rubin, G D; Napel, S

1998-05-01

In this paper, a novel technique for rapid and automatic computation of flight paths for guiding virtual endoscopic exploration of three-dimensional medical images is described. While manually planning flight paths is a tedious and time consuming task, our algorithm is automated and fast. Our method for positioning the virtual camera is based on the medial axis transform but is much more computationally efficient. By iteratively correcting a path toward the medial axis, the necessity of evaluating simple point criteria during morphological thinning is eliminated. The virtual camera is also oriented in a stable viewing direction, avoiding sudden twists and turns. We tested our algorithm on volumetric data sets of eight colons, one aorta and one bronchial tree. The algorithm computed the flight paths in several minutes per volume on an inexpensive workstation with minimal computation time added for multiple paths through branching structures (10%-13% per extra path). The results of our algorithm are smooth, centralized paths that aid in the task of navigation in virtual endoscopic exploration of three-dimensional medical images.
Flood inundation extent mapping based on block compressed tracing

NASA Astrophysics Data System (ADS)

Shen, Dingtao; Rui, Yikang; Wang, Jiechen; Zhang, Yu; Cheng, Liang

2015-07-01

Flood inundation extent, depth, and duration are important factors affecting flood hazard evaluation. At present, flood inundation analysis is based mainly on a seeded region-growing algorithm, which is an inefficient process because it requires excessive recursive computations and it is incapable of processing massive datasets. To address this problem, we propose a block compressed tracing algorithm for mapping the flood inundation extent, which reads the DEM data in blocks before transferring them to raster compression storage. This allows a smaller computer memory to process a larger amount of data, which solves the problem of the regular seeded region-growing algorithm. In addition, the use of a raster boundary tracing technique allows the algorithm to avoid the time-consuming computations required by the seeded region-growing. Finally, we conduct a comparative evaluation in the Chin-sha River basin, results show that the proposed method solves the problem of flood inundation extent mapping based on massive DEM datasets with higher computational efficiency than the original method, which makes it suitable for practical applications.
Tail Biting Trellis Representation of Codes: Decoding and Construction

NASA Technical Reports Server (NTRS)

Shao. Rose Y.; Lin, Shu; Fossorier, Marc

1999-01-01

This paper presents two new iterative algorithms for decoding linear codes based on their tail biting trellises, one is unidirectional and the other is bidirectional. Both algorithms are computationally efficient and achieves virtually optimum error performance with a small number of decoding iterations. They outperform all the previous suboptimal decoding algorithms. The bidirectional algorithm also reduces decoding delay. Also presented in the paper is a method for constructing tail biting trellises for linear block codes.
Computationally efficient algorithms for Brownian dynamics simulation of long flexible macromolecules modeled as bead-rod chains

NASA Astrophysics Data System (ADS)

Moghani, Mahdy Malekzadeh; Khomami, Bamin

2017-02-01

The computational efficiency of Brownian dynamics (BD) simulation of the constrained model of a polymeric chain (bead-rod) with n beads and in the presence of hydrodynamic interaction (HI) is reduced to the order of n2 via an efficient algorithm which utilizes the conjugate-gradient (CG) method within a Picard iteration scheme. Moreover, the utility of the Barnes and Hut (BH) multipole method in BD simulation of polymeric solutions in the presence of HI, with regard to computational cost, scaling, and accuracy, is discussed. Overall, it is determined that this approach leads to a scaling of O (n1.2) . Furthermore, a stress algorithm is developed which accurately captures the transient stress growth in the startup of flow for the bead-rod model with HI and excluded volume (EV) interaction. Rheological properties of the chains up to n =350 in the presence of EV and HI are computed via the former algorithm. The result depicts qualitative differences in shear thinning behavior of the polymeric solutions in the intermediate values of the Weissenburg number (10

Visualizing Matrix Multiplication

ERIC Educational Resources Information Center

Daugulis, Peteris; Sondore, Anita

2018-01-01

Efficient visualizations of computational algorithms are important tools for students, educators, and researchers. In this article, we point out an innovative visualization technique for matrix multiplication. This method differs from the standard, formal approach by using block matrices to make computations more visual. We find this method a…
SPHINX--an algorithm for taxonomic binning of metagenomic sequences.

PubMed

Mohammed, Monzoorul Haque; Ghosh, Tarini Shankar; Singh, Nitin Kumar; Mande, Sharmila S

2011-01-01

Compared with composition-based binning algorithms, the binning accuracy and specificity of alignment-based binning algorithms is significantly higher. However, being alignment-based, the latter class of algorithms require enormous amount of time and computing resources for binning huge metagenomic datasets. The motivation was to develop a binning approach that can analyze metagenomic datasets as rapidly as composition-based approaches, but nevertheless has the accuracy and specificity of alignment-based algorithms. This article describes a hybrid binning approach (SPHINX) that achieves high binning efficiency by utilizing the principles of both 'composition'- and 'alignment'-based binning algorithms. Validation results with simulated sequence datasets indicate that SPHINX is able to analyze metagenomic sequences as rapidly as composition-based algorithms. Furthermore, the binning efficiency (in terms of accuracy and specificity of assignments) of SPHINX is observed to be comparable with results obtained using alignment-based algorithms. A web server for the SPHINX algorithm is available at http://metagenomics.atc.tcs.com/SPHINX/.
A robust multilevel simultaneous eigenvalue solver

NASA Technical Reports Server (NTRS)

Costiner, Sorin; Taasan, Shlomo

1993-01-01

Multilevel (ML) algorithms for eigenvalue problems are often faced with several types of difficulties such as: the mixing of approximated eigenvectors by the solution process, the approximation of incomplete clusters of eigenvectors, the poor representation of solution on coarse levels, and the existence of close or equal eigenvalues. Algorithms that do not treat appropriately these difficulties usually fail, or their performance degrades when facing them. These issues motivated the development of a robust adaptive ML algorithm which treats these difficulties, for the calculation of a few eigenvectors and their corresponding eigenvalues. The main techniques used in the new algorithm include: the adaptive completion and separation of the relevant clusters on different levels, the simultaneous treatment of solutions within each cluster, and the robustness tests which monitor the algorithm's efficiency and convergence. The eigenvectors' separation efficiency is based on a new ML projection technique generalizing the Rayleigh Ritz projection, combined with a technique, the backrotations. These separation techniques, when combined with an FMG formulation, in many cases lead to algorithms of O(qN) complexity, for q eigenvectors of size N on the finest level. Previously developed ML algorithms are less focused on the mentioned difficulties. Moreover, algorithms which employ fine level separation techniques are of O(q(sub 2)N) complexity and usually do not overcome all these difficulties. Computational examples are presented where Schrodinger type eigenvalue problems in 2-D and 3-D, having equal and closely clustered eigenvalues, are solved with the efficiency of the Poisson multigrid solver. A second order approximation is obtained in O(qN) work, where the total computational work is equivalent to only a few fine level relaxations per eigenvector.
Efficiency in nonequilibrium molecular dynamics Monte Carlo simulations

DOE PAGES

Radak, Brian K.; Roux, Benoît

2016-10-07

Hybrid algorithms combining nonequilibrium molecular dynamics and Monte Carlo (neMD/MC) offer a powerful avenue for improving the sampling efficiency of computer simulations of complex systems. These neMD/MC algorithms are also increasingly finding use in applications where conventional approaches are impractical, such as constant-pH simulations with explicit solvent. However, selecting an optimal nonequilibrium protocol for maximum efficiency often represents a non-trivial challenge. This work evaluates the efficiency of a broad class of neMD/MC algorithms and protocols within the theoretical framework of linear response theory. The approximations are validated against constant pH-MD simulations and shown to provide accurate predictions of neMD/MC performance.more » An assessment of a large set of protocols confirms (both theoretically and empirically) that a linear work protocol gives the best neMD/MC performance. Lastly, a well-defined criterion for optimizing the time parameters of the protocol is proposed and demonstrated with an adaptive algorithm that improves the performance on-the-fly with minimal cost.« less
Compound Event Barrier Coverage in Wireless Sensor Networks under Multi-Constraint Conditions.

PubMed

Zhuang, Yaoming; Wu, Chengdong; Zhang, Yunzhou; Jia, Zixi

2016-12-24

It is important to monitor compound event by barrier coverage issues in wireless sensor networks (WSNs). Compound event barrier coverage (CEBC) is a novel coverage problem. Unlike traditional ones, the data of compound event barrier coverage comes from different types of sensors. It will be subject to multiple constraints under complex conditions in real-world applications. The main objective of this paper is to design an efficient algorithm for complex conditions that can combine the compound event confidence. Moreover, a multiplier method based on an active-set strategy (ASMP) is proposed to optimize the multiple constraints in compound event barrier coverage. The algorithm can calculate the coverage ratio efficiently and allocate the sensor resources reasonably in compound event barrier coverage. The proposed algorithm can simplify complex problems to reduce the computational load of the network and improve the network efficiency. The simulation results demonstrate that the proposed algorithm is more effective and efficient than existing methods, especially in the allocation of sensor resources.
Compound Event Barrier Coverage in Wireless Sensor Networks under Multi-Constraint Conditions

PubMed Central

Zhuang, Yaoming; Wu, Chengdong; Zhang, Yunzhou; Jia, Zixi

2016-01-01

It is important to monitor compound event by barrier coverage issues in wireless sensor networks (WSNs). Compound event barrier coverage (CEBC) is a novel coverage problem. Unlike traditional ones, the data of compound event barrier coverage comes from different types of sensors. It will be subject to multiple constraints under complex conditions in real-world applications. The main objective of this paper is to design an efficient algorithm for complex conditions that can combine the compound event confidence. Moreover, a multiplier method based on an active-set strategy (ASMP) is proposed to optimize the multiple constraints in compound event barrier coverage. The algorithm can calculate the coverage ratio efficiently and allocate the sensor resources reasonably in compound event barrier coverage. The proposed algorithm can simplify complex problems to reduce the computational load of the network and improve the network efficiency. The simulation results demonstrate that the proposed algorithm is more effective and efficient than existing methods, especially in the allocation of sensor resources. PMID:28029118
Numerical Algorithms for Acoustic Integrals - The Devil is in the Details

NASA Technical Reports Server (NTRS)

Brentner, Kenneth S.

1996-01-01

The accurate prediction of the aeroacoustic field generated by aerospace vehicles or nonaerospace machinery is necessary for designers to control and reduce source noise. Powerful computational aeroacoustic methods, based on various acoustic analogies (primarily the Lighthill acoustic analogy) and Kirchhoff methods, have been developed for prediction of noise from complicated sources, such as rotating blades. Both methods ultimately predict the noise through a numerical evaluation of an integral formulation. In this paper, we consider three generic acoustic formulations and several numerical algorithms that have been used to compute the solutions to these formulations. Algorithms for retarded-time formulations are the most efficient and robust, but they are difficult to implement for supersonic-source motion. Collapsing-sphere and emission-surface formulations are good alternatives when supersonic-source motion is present, but the numerical implementations of these formulations are more computationally demanding. New algorithms - which utilize solution adaptation to provide a specified error level - are needed.
Computational Discovery of Materials Using the Firefly Algorithm

NASA Astrophysics Data System (ADS)

Avendaño-Franco, Guillermo; Romero, Aldo

Our current ability to model physical phenomena accurately, the increase computational power and better algorithms are the driving forces behind the computational discovery and design of novel materials, allowing for virtual characterization before their realization in the laboratory. We present the implementation of a novel firefly algorithm, a population-based algorithm for global optimization for searching the structure/composition space. This novel computation-intensive approach naturally take advantage of concurrency, targeted exploration and still keeping enough diversity. We apply the new method in both periodic and non-periodic structures and we present the implementation challenges and solutions to improve efficiency. The implementation makes use of computational materials databases and network analysis to optimize the search and get insights about the geometric structure of local minima on the energy landscape. The method has been implemented in our software PyChemia, an open-source package for materials discovery. We acknowledge the support of DMREF-NSF 1434897 and the Donors of the American Chemical Society Petroleum Research Fund for partial support of this research under Contract 54075-ND10.
Multigrid optimal mass transport for image registration and morphing

NASA Astrophysics Data System (ADS)

Rehman, Tauseef ur; Tannenbaum, Allen

2007-02-01

In this paper we present a computationally efficient Optimal Mass Transport algorithm. This method is based on the Monge-Kantorovich theory and is used for computing elastic registration and warping maps in image registration and morphing applications. This is a parameter free method which utilizes all of the grayscale data in an image pair in a symmetric fashion. No landmarks need to be specified for correspondence. In our work, we demonstrate significant improvement in computation time when our algorithm is applied as compared to the originally proposed method by Haker et al [1]. The original algorithm was based on a gradient descent method for removing the curl from an initial mass preserving map regarded as 2D vector field. This involves inverting the Laplacian in each iteration which is now computed using full multigrid technique resulting in an improvement in computational time by a factor of two. Greater improvement is achieved by decimating the curl in a multi-resolutional framework. The algorithm was applied to 2D short axis cardiac MRI images and brain MRI images for testing and comparison.
Smiles2Monomers: a link between chemical and biological structures for polymers.

PubMed

Dufresne, Yoann; Noé, Laurent; Leclère, Valérie; Pupin, Maude

2015-01-01

The monomeric composition of polymers is powerful for structure comparison and synthetic biology, among others. Many databases give access to the atomic structure of compounds but the monomeric structure of polymers is often lacking. We have designed a smart algorithm, implemented in the tool Smiles2Monomers (s2m), to infer efficiently and accurately the monomeric structure of a polymer from its chemical structure. Our strategy is divided into two steps: first, monomers are mapped on the atomic structure by an efficient subgraph-isomorphism algorithm ; second, the best tiling is computed so that non-overlapping monomers cover all the structure of the target polymer. The mapping is based on a Markovian index built by a dynamic programming algorithm. The index enables s2m to search quickly all the given monomers on a target polymer. After, a greedy algorithm combines the mapped monomers into a consistent monomeric structure. Finally, a local branch and cut algorithm refines the structure. We tested this method on two manually annotated databases of polymers and reconstructed the structures de novo with a sensitivity over 90 %. The average computation time per polymer is 2 s. s2m automatically creates de novo monomeric annotations for polymers, efficiently in terms of time computation and sensitivity. s2m allowed us to detect annotation errors in the tested databases and to easily find the accurate structures. So, s2m could be integrated into the curation process of databases of small compounds to verify the current entries and accelerate the annotation of new polymers. The full method can be downloaded or accessed via a website for peptide-like polymers at http://bioinfo.lifl.fr/norine/smiles2monomers.jsp.Graphical abstract:.
A fast marching algorithm for the factored eikonal equation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Treister, Eran, E-mail: erantreister@gmail.com; Haber, Eldad, E-mail: haber@math.ubc.ca; Department of Mathematics, The University of British Columbia, Vancouver, BC

The eikonal equation is instrumental in many applications in several fields ranging from computer vision to geoscience. This equation can be efficiently solved using the iterative Fast Sweeping (FS) methods and the direct Fast Marching (FM) methods. However, when used for a point source, the original eikonal equation is known to yield inaccurate numerical solutions, because of a singularity at the source. In this case, the factored eikonal equation is often preferred, and is known to yield a more accurate numerical solution. One application that requires the solution of the eikonal equation for point sources is travel time tomography. Thismore » inverse problem may be formulated using the eikonal equation as a forward problem. While this problem has been solved using FS in the past, the more recent choice for applying it involves FM methods because of the efficiency in which sensitivities can be obtained using them. However, while several FS methods are available for solving the factored equation, the FM method is available only for the original eikonal equation. In this paper we develop a Fast Marching algorithm for the factored eikonal equation, using both first and second order finite-difference schemes. Our algorithm follows the same lines as the original FM algorithm and requires the same computational effort. In addition, we show how to obtain sensitivities using this FM method and apply travel time tomography, formulated as an inverse factored eikonal equation. Numerical results in two and three dimensions show that our algorithm solves the factored eikonal equation efficiently, and demonstrate the achieved accuracy for computing the travel time. We also demonstrate a recovery of a 2D and 3D heterogeneous medium by travel time tomography using the eikonal equation for forward modeling and inversion by Gauss–Newton.« less
Parallel and Preemptable Dynamically Dimensioned Search Algorithms for Single and Multi-objective Optimization in Water Resources

NASA Astrophysics Data System (ADS)

Tolson, B.; Matott, L. S.; Gaffoor, T. A.; Asadzadeh, M.; Shafii, M.; Pomorski, P.; Xu, X.; Jahanpour, M.; Razavi, S.; Haghnegahdar, A.; Craig, J. R.

2015-12-01

We introduce asynchronous parallel implementations of the Dynamically Dimensioned Search (DDS) family of algorithms including DDS, discrete DDS, PA-DDS and DDS-AU. These parallel algorithms are unique from most existing parallel optimization algorithms in the water resources field in that parallel DDS is asynchronous and does not require an entire population (set of candidate solutions) to be evaluated before generating and then sending a new candidate solution for evaluation. One key advance in this study is developing the first parallel PA-DDS multi-objective optimization algorithm. The other key advance is enhancing the computational efficiency of solving optimization problems (such as model calibration) by combining a parallel optimization algorithm with the deterministic model pre-emption concept. These two efficiency techniques can only be combined because of the asynchronous nature of parallel DDS. Model pre-emption functions to terminate simulation model runs early, prior to completely simulating the model calibration period for example, when intermediate results indicate the candidate solution is so poor that it will definitely have no influence on the generation of further candidate solutions. The computational savings of deterministic model preemption available in serial implementations of population-based algorithms (e.g., PSO) disappear in synchronous parallel implementations as these algorithms. In addition to the key advances above, we implement the algorithms across a range of computation platforms (Windows and Unix-based operating systems from multi-core desktops to a supercomputer system) and package these for future modellers within a model-independent calibration software package called Ostrich as well as MATLAB versions. Results across multiple platforms and multiple case studies (from 4 to 64 processors) demonstrate the vast improvement over serial DDS-based algorithms and highlight the important role model pre-emption plays in the performance of parallel, pre-emptable DDS algorithms. Case studies include single- and multiple-objective optimization problems in water resources model calibration and in many cases linear or near linear speedups are observed.
Empirical study of parallel LRU simulation algorithms

NASA Technical Reports Server (NTRS)

Carr, Eric; Nicol, David M.

1994-01-01

This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithm are more complex, but have costs that are independent on the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithm implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.
PHoToNs–A parallel heterogeneous and threads oriented code for cosmological N-body simulation

NASA Astrophysics Data System (ADS)

Wang, Qiao; Cao, Zong-Yan; Gao, Liang; Chi, Xue-Bin; Meng, Chen; Wang, Jie; Wang, Long

2018-06-01

We introduce a new code for cosmological simulations, PHoToNs, which incorporates features for performing massive cosmological simulations on heterogeneous high performance computer (HPC) systems and threads oriented programming. PHoToNs adopts a hybrid scheme to compute gravitational force, with the conventional Particle-Mesh (PM) algorithm to compute the long-range force, the Tree algorithm to compute the short range force and the direct summation Particle-Particle (PP) algorithm to compute gravity from very close particles. A self-similar space filling a Peano-Hilbert curve is used to decompose the computing domain. Threads programming is advantageously used to more flexibly manage the domain communication, PM calculation and synchronization, as well as Dual Tree Traversal on the CPU+MIC platform. PHoToNs scales well and efficiency of the PP kernel achieves 68.6% of peak performance on MIC and 74.4% on CPU platforms. We also test the accuracy of the code against the much used Gadget-2 in the community and found excellent agreement.
Computation of multi-dimensional viscous supersonic flow

NASA Technical Reports Server (NTRS)

Buggeln, R. C.; Kim, Y. N.; Mcdonald, H.

1986-01-01

A method has been developed for two- and three-dimensional computations of viscous supersonic jet flows interacting with an external flow. The approach employs a reduced form of the Navier-Stokes equations which allows solution as an initial-boundary value problem in space, using an efficient noniterative forward marching algorithm. Numerical instability associated with forward marching algorithms for flows with embedded subsonic regions is avoided by approximation of the reduced form of the Navier-Stokes equations in the subsonic regions of the boundary layers. Supersonic and subsonic portions of the flow field are simultaneously calculated by a consistently split linearized block implicit computational algorithm. The results of computations for a series of test cases associated with supersonic jet flow is presented and compared with other calculations for axisymmetric cases. Demonstration calculations indicate that the computational technique has great promise as a tool for calculating a wide range of supersonic flow problems including jet flow. Finally, a User's Manual is presented for the computer code used to perform the calculations.
Implementation of a block Lanczos algorithm for Eigenproblem solution of gyroscopic systems

NASA Technical Reports Server (NTRS)

Gupta, Kajal K.; Lawson, Charles L.

1987-01-01

The details of implementation of a general numerical procedure developed for the accurate and economical computation of natural frequencies and associated modes of any elastic structure rotating along an arbitrary axis are described. A block version of the Lanczos algorithm is derived for the solution that fully exploits associated matrix sparsity and employs only real numbers in all relevant computations. It is also capable of determining multiple roots and proves to be most efficient when compared to other, similar, exisiting techniques.
Interprocedural Analysis and the Verification of Concurrent Programs

DTIC Science & Technology

2009-01-01

SSPE ) problem is to compute a regular expression that represents paths(s, v) for all vertices v in the graph. The syntax of regular expressions is as...follows: r ::= ∅ | ε | e | r1 ∪ r2 | r1.r2 | r∗, where e stands for an edge in G. We can use any algorithm for SSPE to compute regular expressions for...a closed representation of loops provides an exponential speedup.2 Tarjan’s path-expression algorithm solves the SSPE problem efficiently. It uses
Three-dimensional geoelectric modelling with optimal work/accuracy rate using an adaptive wavelet algorithm

NASA Astrophysics Data System (ADS)

Plattner, A.; Maurer, H. R.; Vorloeper, J.; Dahmen, W.

2010-08-01

Despite the ever-increasing power of modern computers, realistic modelling of complex 3-D earth models is still a challenging task and requires substantial computing resources. The overwhelming majority of current geophysical modelling approaches includes either finite difference or non-adaptive finite element algorithms and variants thereof. These numerical methods usually require the subsurface to be discretized with a fine mesh to accurately capture the behaviour of the physical fields. However, this may result in excessive memory consumption and computing times. A common feature of most of these algorithms is that the modelled data discretizations are independent of the model complexity, which may be wasteful when there are only minor to moderate spatial variations in the subsurface parameters. Recent developments in the theory of adaptive numerical solvers have the potential to overcome this problem. Here, we consider an adaptive wavelet-based approach that is applicable to a large range of problems, also including nonlinear problems. In comparison with earlier applications of adaptive solvers to geophysical problems we employ here a new adaptive scheme whose core ingredients arose from a rigorous analysis of the overall asymptotically optimal computational complexity, including in particular, an optimal work/accuracy rate. Our adaptive wavelet algorithm offers several attractive features: (i) for a given subsurface model, it allows the forward modelling domain to be discretized with a quasi minimal number of degrees of freedom, (ii) sparsity of the associated system matrices is guaranteed, which makes the algorithm memory efficient and (iii) the modelling accuracy scales linearly with computing time. We have implemented the adaptive wavelet algorithm for solving 3-D geoelectric problems. To test its performance, numerical experiments were conducted with a series of conductivity models exhibiting varying degrees of structural complexity. Results were compared with a non-adaptive finite element algorithm, which incorporates an unstructured mesh to best-fitting subsurface boundaries. Such algorithms represent the current state-of-the-art in geoelectric modelling. An analysis of the numerical accuracy as a function of the number of degrees of freedom revealed that the adaptive wavelet algorithm outperforms the finite element solver for simple and moderately complex models, whereas the results become comparable for models with high spatial variability of electrical conductivities. The linear dependence of the modelling error and the computing time proved to be model-independent. This feature will allow very efficient computations using large-scale models as soon as our experimental code is optimized in terms of its implementation.
Optimizing Approximate Weighted Matching on Nvidia Kepler K40

DOE Office of Scientific and Technical Information (OSTI.GOV)

Naim, Md; Manne, Fredrik; Halappanavar, Mahantesh

Matching is a fundamental graph problem with numerous applications in science and engineering. While algorithms for computing optimal matchings are difficult to parallelize, approximation algorithms on the other hand generally compute high quality solutions and are amenable to parallelization. In this paper, we present efficient implementations of the current best algorithm for half-approximate weighted matching, the Suitor algorithm, on Nvidia Kepler K-40 platform. We develop four variants of the algorithm that exploit hardware features to address key challenges for a GPU implementation. We also experiment with different combinations of work assigned to a warp. Using an exhaustive set ofmore » $269$ inputs, we demonstrate that the new implementation outperforms the previous best GPU algorithm by $10$ to $$100\\times$$ for over $100$ instances, and from $100$ to $$1000\\times$$ for $15$ instances. We also demonstrate up to $$20\\times$$ speedup relative to $2$ threads, and up to $$5\\times$$ relative to $16$ threads on Intel Xeon platform with $16$ cores for the same algorithm. The new algorithms and implementations provided in this paper will have a direct impact on several applications that repeatedly use matching as a key compute kernel. Further, algorithm designs and insights provided in this paper will benefit other researchers implementing graph algorithms on modern GPU architectures.« less
An efficient parallel-processing method for transposing large matrices in place.

PubMed

Portnoff, M R

1999-01-01

We have developed an efficient algorithm for transposing large matrices in place. The algorithm is efficient because data are accessed either sequentially in blocks or randomly within blocks small enough to fit in cache, and because the same indexing calculations are shared among identical procedures operating on independent subsets of the data. This inherent parallelism makes the method well suited for a multiprocessor computing environment. The algorithm is easy to implement because the same two procedures are applied to the data in various groupings to carry out the complete transpose operation. Using only a single processor, we have demonstrated nearly an order of magnitude increase in speed over the previously published algorithm by Gate and Twigg for transposing a large rectangular matrix in place. With multiple processors operating in parallel, the processing speed increases almost linearly with the number of processors. A simplified version of the algorithm for square matrices is presented as well as an extension for matrices large enough to require virtual memory.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.