computer algorithms: Topics by Science.gov

Sample records for computer algorithms

Efficient 3D geometric and Zernike moments computation from unstructured surface meshes.

PubMed

Pozo, José María; Villa-Uriol, Maria-Cruz; Frangi, Alejandro F

2011-03-01

This paper introduces and evaluates a fast exact algorithm and a series of faster approximate algorithms for the computation of 3D geometric moments from an unstructured surface mesh of triangles. Being based on the object surface reduces the computational complexity of these algorithms with respect to volumetric grid-based algorithms. In contrast, it can only be applied for the computation of geometric moments of homogeneous objects. This advantage and restriction is shared with other proposed algorithms based on the object boundary. The proposed exact algorithm reduces the computational complexity for computing geometric moments up to order N with respect to previously proposed exact algorithms, from N(9) to N(6). The approximate series algorithm appears as a power series on the rate between triangle size and object size, which can be truncated at any desired degree. The higher the number and quality of the triangles, the better the approximation. This approximate algorithm reduces the computational complexity to N(3). In addition, the paper introduces a fast algorithm for the computation of 3D Zernike moments from the computed geometric moments, with a computational complexity N(4), while the previously proposed algorithm is of order N(6). The error introduced by the proposed approximate algorithms is evaluated in different shapes and the cost-benefit ratio in terms of error, and computational time is analyzed for different moment orders.
Computation of Symmetric Discrete Cosine Transform Using Bakhvalov's Algorithm

NASA Technical Reports Server (NTRS)

Aburdene, Maurice F.; Strojny, Brian C.; Dorband, John E.

2005-01-01

A number of algorithms for recursive computation of the discrete cosine transform (DCT) have been developed recently. This paper presents a new method for computing the discrete cosine transform and its inverse using Bakhvalov's algorithm, a method developed for evaluation of a polynomial at a point. In this paper, we will focus on both the application of the algorithm to the computation of the DCT-I and its complexity. In addition, Bakhvalov s algorithm is compared with Clenshaw s algorithm for the computation of the DCT.
An efficient parallel termination detection algorithm

DOE Office of Scientific and Technical Information (OSTI.GOV)

Baker, A. H.; Crivelli, S.; Jessup, E. R.

2004-05-27

Information local to any one processor is insufficient to monitor the overall progress of most distributed computations. Typically, a second distributed computation for detecting termination of the main computation is necessary. In order to be a useful computational tool, the termination detection routine must operate concurrently with the main computation, adding minimal overhead, and it must promptly and correctly detect termination when it occurs. In this paper, we present a new algorithm for detecting the termination of a parallel computation on distributed-memory MIMD computers that satisfies all of those criteria. A variety of termination detection algorithms have been devised. Ofmore » these, the algorithm presented by Sinha, Kale, and Ramkumar (henceforth, the SKR algorithm) is unique in its ability to adapt to the load conditions of the system on which it runs, thereby minimizing the impact of termination detection on performance. Because their algorithm also detects termination quickly, we consider it to be the most efficient practical algorithm presently available. The termination detection algorithm presented here was developed for use in the PMESC programming library for distributed-memory MIMD computers. Like the SKR algorithm, our algorithm adapts to system loads and imposes little overhead. Also like the SKR algorithm, ours is tree-based, and it does not depend on any assumptions about the physical interconnection topology of the processors or the specifics of the distributed computation. In addition, our algorithm is easier to implement and requires only half as many tree traverses as does the SKR algorithm. This paper is organized as follows. In section 2, we define our computational model. In section 3, we review the SKR algorithm. We introduce our new algorithm in section 4, and prove its correctness in section 5. We discuss its efficiency and present experimental results in section 6.« less
QPSO-Based Adaptive DNA Computing Algorithm

PubMed Central

Karakose, Mehmet; Cigdem, Ugur

2013-01-01

DNA (deoxyribonucleic acid) computing that is a new computation model based on DNA molecules for information storage has been increasingly used for optimization and data analysis in recent years. However, DNA computing algorithm has some limitations in terms of convergence speed, adaptability, and effectiveness. In this paper, a new approach for improvement of DNA computing is proposed. This new approach aims to perform DNA computing algorithm with adaptive parameters towards the desired goal using quantum-behaved particle swarm optimization (QPSO). Some contributions provided by the proposed QPSO based on adaptive DNA computing algorithm are as follows: (1) parameters of population size, crossover rate, maximum number of operations, enzyme and virus mutation rate, and fitness function of DNA computing algorithm are simultaneously tuned for adaptive process, (2) adaptive algorithm is performed using QPSO algorithm for goal-driven progress, faster operation, and flexibility in data, and (3) numerical realization of DNA computing algorithm with proposed approach is implemented in system identification. Two experiments with different systems were carried out to evaluate the performance of the proposed approach with comparative results. Experimental results obtained with Matlab and FPGA demonstrate ability to provide effective optimization, considerable convergence speed, and high accuracy according to DNA computing algorithm. PMID:23935409
Edge Pushing is Equivalent to Vertex Elimination for Computing Hessians

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Mu; Pothen, Alex; Hovland, Paul

We prove the equivalence of two different Hessian evaluation algorithms in AD. The first is the Edge Pushing algorithm of Gower and Mello, which may be viewed as a second order Reverse mode algorithm for computing the Hessian. In earlier work, we have derived the Edge Pushing algorithm by exploiting a Reverse mode invariant based on the concept of live variables in compiler theory. The second algorithm is based on eliminating vertices in a computational graph of the gradient, in which intermediate variables are successively eliminated from the graph, and the weights of the edges are updated suitably. We provemore » that if the vertices are eliminated in a reverse topological order while preserving symmetry in the computational graph of the gradient, then the Vertex Elimination algorithm and the Edge Pushing algorithm perform identical computations. In this sense, the two algorithms are equivalent. This insight that unifies two seemingly disparate approaches to Hessian computations could lead to improved algorithms and implementations for computing Hessians. Read More: http://epubs.siam.org/doi/10.1137/1.9781611974690.ch11« less
Algorithmic Mechanism Design of Evolutionary Computation.

PubMed

Pei, Yan

2015-01-01

We consider algorithmic design, enhancement, and improvement of evolutionary computation as a mechanism design problem. All individuals or several groups of individuals can be considered as self-interested agents. The individuals in evolutionary computation can manipulate parameter settings and operations by satisfying their own preferences, which are defined by an evolutionary computation algorithm designer, rather than by following a fixed algorithm rule. Evolutionary computation algorithm designers or self-adaptive methods should construct proper rules and mechanisms for all agents (individuals) to conduct their evolution behaviour correctly in order to definitely achieve the desired and preset objective(s). As a case study, we propose a formal framework on parameter setting, strategy selection, and algorithmic design of evolutionary computation by considering the Nash strategy equilibrium of a mechanism design in the search process. The evaluation results present the efficiency of the framework. This primary principle can be implemented in any evolutionary computation algorithm that needs to consider strategy selection issues in its optimization process. The final objective of our work is to solve evolutionary computation design as an algorithmic mechanism design problem and establish its fundamental aspect by taking this perspective. This paper is the first step towards achieving this objective by implementing a strategy equilibrium solution (such as Nash equilibrium) in evolutionary computation algorithm.
Algorithmic Mechanism Design of Evolutionary Computation

PubMed Central

2015-01-01

We consider algorithmic design, enhancement, and improvement of evolutionary computation as a mechanism design problem. All individuals or several groups of individuals can be considered as self-interested agents. The individuals in evolutionary computation can manipulate parameter settings and operations by satisfying their own preferences, which are defined by an evolutionary computation algorithm designer, rather than by following a fixed algorithm rule. Evolutionary computation algorithm designers or self-adaptive methods should construct proper rules and mechanisms for all agents (individuals) to conduct their evolution behaviour correctly in order to definitely achieve the desired and preset objective(s). As a case study, we propose a formal framework on parameter setting, strategy selection, and algorithmic design of evolutionary computation by considering the Nash strategy equilibrium of a mechanism design in the search process. The evaluation results present the efficiency of the framework. This primary principle can be implemented in any evolutionary computation algorithm that needs to consider strategy selection issues in its optimization process. The final objective of our work is to solve evolutionary computation design as an algorithmic mechanism design problem and establish its fundamental aspect by taking this perspective. This paper is the first step towards achieving this objective by implementing a strategy equilibrium solution (such as Nash equilibrium) in evolutionary computation algorithm. PMID:26257777
A review on quantum search algorithms

NASA Astrophysics Data System (ADS)

Giri, Pulak Ranjan; Korepin, Vladimir E.

2017-12-01

The use of superposition of states in quantum computation, known as quantum parallelism, has significant advantage in terms of speed over the classical computation. It is evident from the early invented quantum algorithms such as Deutsch's algorithm, Deutsch-Jozsa algorithm and its variation as Bernstein-Vazirani algorithm, Simon algorithm, Shor's algorithms, etc. Quantum parallelism also significantly speeds up the database search algorithm, which is important in computer science because it comes as a subroutine in many important algorithms. Quantum database search of Grover achieves the task of finding the target element in an unsorted database in a time quadratically faster than the classical computer. We review Grover's quantum search algorithms for a singe and multiple target elements in a database. The partial search algorithm of Grover and Radhakrishnan and its optimization by Korepin called GRK algorithm are also discussed.
QCE: A Simulator for Quantum Computer Hardware

NASA Astrophysics Data System (ADS)

Michielsen, Kristel; de Raedt, Hans

2003-09-01

The Quantum Computer Emulator (QCE) described in this paper consists of a simulator of a generic, general purpose quantum computer and a graphical user interface. The latter is used to control the simulator, to define the hardware of the quantum computer and to debug and execute quantum algorithms. QCE runs in a Windows 98/NT/2000/ME/XP environment. It can be used to validate designs of physically realizable quantum processors and as an interactive educational tool to learn about quantum computers and quantum algorithms. A detailed exposition is given of the implementation of the CNOT and the Toffoli gate, the quantum Fourier transform, Grover's database search algorithm, an order finding algorithm, Shor's algorithm, a three-input adder and a number partitioning algorithm. We also review the results of simulations of an NMR-like quantum computer.
Rational use of cognitive resources: levels of analysis between the computational and the algorithmic.

PubMed

Griffiths, Thomas L; Lieder, Falk; Goodman, Noah D

2015-04-01

Marr's levels of analysis-computational, algorithmic, and implementation-have served cognitive science well over the last 30 years. But the recent increase in the popularity of the computational level raises a new challenge: How do we begin to relate models at different levels of analysis? We propose that it is possible to define levels of analysis that lie between the computational and the algorithmic, providing a way to build a bridge between computational- and algorithmic-level models. The key idea is to push the notion of rationality, often used in defining computational-level models, deeper toward the algorithmic level. We offer a simple recipe for reverse-engineering the mind's cognitive strategies by deriving optimal algorithms for a series of increasingly more realistic abstract computational architectures, which we call "resource-rational analysis." Copyright © 2015 Cognitive Science Society, Inc.
A class of parallel algorithms for computation of the manipulator inertia matrix

NASA Technical Reports Server (NTRS)

Fijany, Amir; Bejczy, Antal K.

1989-01-01

Parallel and parallel/pipeline algorithms for computation of the manipulator inertia matrix are presented. An algorithm based on composite rigid-body spatial inertia method, which provides better features for parallelization, is used for the computation of the inertia matrix. Two parallel algorithms are developed which achieve the time lower bound in computation. Also described is the mapping of these algorithms with topological variation on a two-dimensional processor array, with nearest-neighbor connection, and with cardinality variation on a linear processor array. An efficient parallel/pipeline algorithm for the linear array was also developed, but at significantly higher efficiency.
Network Community Detection based on the Physarum-inspired Computational Framework.

PubMed

Gao, Chao; Liang, Mingxin; Li, Xianghua; Zhang, Zili; Wang, Zhen; Zhou, Zhili

2016-12-13

Community detection is a crucial and essential problem in the structure analytics of complex networks, which can help us understand and predict the characteristics and functions of complex networks. Many methods, ranging from the optimization-based algorithms to the heuristic-based algorithms, have been proposed for solving such a problem. Due to the inherent complexity of identifying network structure, how to design an effective algorithm with a higher accuracy and a lower computational cost still remains an open problem. Inspired by the computational capability and positive feedback mechanism in the wake of foraging process of Physarum, which is a large amoeba-like cell consisting of a dendritic network of tube-like pseudopodia, a general Physarum-based computational framework for community detection is proposed in this paper. Based on the proposed framework, the inter-community edges can be identified from the intra-community edges in a network and the positive feedback of solving process in an algorithm can be further enhanced, which are used to improve the efficiency of original optimization-based and heuristic-based community detection algorithms, respectively. Some typical algorithms (e.g., genetic algorithm, ant colony optimization algorithm, and Markov clustering algorithm) and real-world datasets have been used to estimate the efficiency of our proposed computational framework. Experiments show that the algorithms optimized by Physarum-inspired computational framework perform better than the original ones, in terms of accuracy and computational cost. Moreover, a computational complexity analysis verifies the scalability of our framework.
A highly efficient multi-core algorithm for clustering extremely large datasets

PubMed Central

2010-01-01

Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorial SNP data. Our new shared memory parallel algorithms show to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
Computationally efficient algorithm for high sampling-frequency operation of active noise control

NASA Astrophysics Data System (ADS)

Rout, Nirmal Kumar; Das, Debi Prasad; Panda, Ganapati

2015-05-01

In high sampling-frequency operation of active noise control (ANC) system the length of the secondary path estimate and the ANC filter are very long. This increases the computational complexity of the conventional filtered-x least mean square (FXLMS) algorithm. To reduce the computational complexity of long order ANC system using FXLMS algorithm, frequency domain block ANC algorithms have been proposed in past. These full block frequency domain ANC algorithms are associated with some disadvantages such as large block delay, quantization error due to computation of large size transforms and implementation difficulties in existing low-end DSP hardware. To overcome these shortcomings, the partitioned block ANC algorithm is newly proposed where the long length filters in ANC are divided into a number of equal partitions and suitably assembled to perform the FXLMS algorithm in the frequency domain. The complexity of this proposed frequency domain partitioned block FXLMS (FPBFXLMS) algorithm is quite reduced compared to the conventional FXLMS algorithm. It is further reduced by merging one fast Fourier transform (FFT)-inverse fast Fourier transform (IFFT) combination to derive the reduced structure FPBFXLMS (RFPBFXLMS) algorithm. Computational complexity analysis for different orders of filter and partition size are presented. Systematic computer simulations are carried out for both the proposed partitioned block ANC algorithms to show its accuracy compared to the time domain FXLMS algorithm.
A Parallel Compact Multi-Dimensional Numerical Algorithm with Aeroacoustics Applications

NASA Technical Reports Server (NTRS)

Povitsky, Alex; Morris, Philip J.

1999-01-01

In this study we propose a novel method to parallelize high-order compact numerical algorithms for the solution of three-dimensional PDEs (Partial Differential Equations) in a space-time domain. For this numerical integration most of the computer time is spent in computation of spatial derivatives at each stage of the Runge-Kutta temporal update. The most efficient direct method to compute spatial derivatives on a serial computer is a version of Gaussian elimination for narrow linear banded systems known as the Thomas algorithm. In a straightforward pipelined implementation of the Thomas algorithm processors are idle due to the forward and backward recurrences of the Thomas algorithm. To utilize processors during this time, we propose to use them for either non-local data independent computations, solving lines in the next spatial direction, or local data-dependent computations by the Runge-Kutta method. To achieve this goal, control of processor communication and computations by a static schedule is adopted. Thus, our parallel code is driven by a communication and computation schedule instead of the usual "creative, programming" approach. The obtained parallelization speed-up of the novel algorithm is about twice as much as that for the standard pipelined algorithm and close to that for the explicit DRP algorithm.
A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)

NASA Technical Reports Server (NTRS)

Straeter, T. A.; Markos, A. T.

1975-01-01

A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions insuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.
Algorithmic complexity of quantum capacity

NASA Astrophysics Data System (ADS)

Oskouei, Samad Khabbazi; Mancini, Stefano

2018-04-01

We analyze the notion of quantum capacity from the perspective of algorithmic (descriptive) complexity. To this end, we resort to the concept of semi-computability in order to describe quantum states and quantum channel maps. We introduce algorithmic entropies (like algorithmic quantum coherent information) and derive relevant properties for them. Then we show that quantum capacity based on semi-computable concept equals the entropy rate of algorithmic coherent information, which in turn equals the standard quantum capacity. Thanks to this, we finally prove that the quantum capacity, for a given semi-computable channel, is limit computable.
Efficient conjugate gradient algorithms for computation of the manipulator forward dynamics

NASA Technical Reports Server (NTRS)

Fijany, Amir; Scheid, Robert E.

1989-01-01

The applicability of conjugate gradient algorithms for computation of the manipulator forward dynamics is investigated. The redundancies in the previously proposed conjugate gradient algorithm are analyzed. A new version is developed which, by avoiding these redundancies, achieves a significantly greater efficiency. A preconditioned conjugate gradient algorithm is also presented. A diagonal matrix whose elements are the diagonal elements of the inertia matrix is proposed as the preconditioner. In order to increase the computational efficiency, an algorithm is developed which exploits the synergism between the computation of the diagonal elements of the inertia matrix and that required by the conjugate gradient algorithm.
Thermodynamic cost of computation, algorithmic complexity and the information metric

NASA Technical Reports Server (NTRS)

Zurek, W. H.

1989-01-01

Algorithmic complexity is discussed as a computational counterpart to the second law of thermodynamics. It is shown that algorithmic complexity, which is a measure of randomness, sets limits on the thermodynamic cost of computations and casts a new light on the limitations of Maxwell's demon. Algorithmic complexity can also be used to define distance between binary strings.
Parallel Directionally Split Solver Based on Reformulation of Pipelined Thomas Algorithm

NASA Technical Reports Server (NTRS)

Povitsky, A.

1998-01-01

In this research an efficient parallel algorithm for 3-D directionally split problems is developed. The proposed algorithm is based on a reformulated version of the pipelined Thomas algorithm that starts the backward step computations immediately after the completion of the forward step computations for the first portion of lines This algorithm has data available for other computational tasks while processors are idle from the Thomas algorithm. The proposed 3-D directionally split solver is based on the static scheduling of processors where local and non-local, data-dependent and data-independent computations are scheduled while processors are idle. A theoretical model of parallelization efficiency is used to define optimal parameters of the algorithm, to show an asymptotic parallelization penalty and to obtain an optimal cover of a global domain with subdomains. It is shown by computational experiments and by the theoretical model that the proposed algorithm reduces the parallelization penalty about two times over the basic algorithm for the range of the number of processors (subdomains) considered and the number of grid nodes per subdomain.

Parallel algorithms for computation of the manipulator inertia matrix

NASA Technical Reports Server (NTRS)

Amin-Javaheri, Masoud; Orin, David E.

1989-01-01

The development of an O(log2N) parallel algorithm for the manipulator inertia matrix is presented. It is based on the most efficient serial algorithm which uses the composite rigid body method. Recursive doubling is used to reformulate the linear recurrence equations which are required to compute the diagonal elements of the matrix. It results in O(log2N) levels of computation. Computation of the off-diagonal elements involves N linear recurrences of varying-size and a new method, which avoids redundant computation of position and orientation transforms for the manipulator, is developed. The O(log2N) algorithm is presented in both equation and graphic forms which clearly show the parallelism inherent in the algorithm.
Shor's factoring algorithm and modern cryptography. An illustration of the capabilities inherent in quantum computers

NASA Astrophysics Data System (ADS)

Gerjuoy, Edward

2005-06-01

The security of messages encoded via the widely used RSA public key encryption system rests on the enormous computational effort required to find the prime factors of a large number N using classical (conventional) computers. In 1994 Peter Shor showed that for sufficiently large N, a quantum computer could perform the factoring with much less computational effort. This paper endeavors to explain, in a fashion comprehensible to the nonexpert, the RSA encryption protocol; the various quantum computer manipulations constituting the Shor algorithm; how the Shor algorithm performs the factoring; and the precise sense in which a quantum computer employing Shor's algorithm can be said to accomplish the factoring of very large numbers with less computational effort than a classical computer. It is made apparent that factoring N generally requires many successive runs of the algorithm. Our analysis reveals that the probability of achieving a successful factorization on a single run is about twice as large as commonly quoted in the literature.
Algorithm-Based Fault Tolerance Integrated with Replication

NASA Technical Reports Server (NTRS)

Some, Raphael; Rennels, David

2008-01-01

In a proposed approach to programming and utilization of commercial off-the-shelf computing equipment, a combination of algorithm-based fault tolerance (ABFT) and replication would be utilized to obtain high degrees of fault tolerance without incurring excessive costs. The basic idea of the proposed approach is to integrate ABFT with replication such that the algorithmic portions of computations would be protected by ABFT, and the logical portions by replication. ABFT is an extremely efficient, inexpensive, high-coverage technique for detecting and mitigating faults in computer systems used for algorithmic computations, but does not protect against errors in logical operations surrounding algorithms.
Impedance computed tomography using an adaptive smoothing coefficient algorithm.

PubMed

Suzuki, A; Uchiyama, A

2001-01-01

In impedance computed tomography, a fixed coefficient regularization algorithm has been frequently used to improve the ill-conditioning problem of the Newton-Raphson algorithm. However, a lot of experimental data and a long period of computation time are needed to determine a good smoothing coefficient because a good smoothing coefficient has to be manually chosen from a number of coefficients and is a constant for each iteration calculation. Thus, sometimes the fixed coefficient regularization algorithm distorts the information or fails to obtain any effect. In this paper, a new adaptive smoothing coefficient algorithm is proposed. This algorithm automatically calculates the smoothing coefficient from the eigenvalue of the ill-conditioned matrix. Therefore, the effective images can be obtained within a short computation time. Also the smoothing coefficient is automatically adjusted by the information related to the real resistivity distribution and the data collection method. In our impedance system, we have reconstructed the resistivity distributions of two phantoms using this algorithm. As a result, this algorithm only needs one-fifth the computation time compared to the fixed coefficient regularization algorithm. When compared to the fixed coefficient regularization algorithm, it shows that the image is obtained more rapidly and applicable in real-time monitoring of the blood vessel.
QCCM Center for Quantum Algorithms

DTIC Science & Technology

2008-10-17

algorithms (e.g., quantum walks and adiabatic computing ), as well as theoretical advances relating algorithms to physical implementations (e.g...Park, NC 27709-2211 15. SUBJECT TERMS Quantum algorithms, quantum computing , fault-tolerant error correction Richard Cleve MITACS East Academic...0511200 Algebraic results on quantum automata A. Ambainis, M. Beaudry, M. Golovkins, A. Kikusts, M. Mercer, D. Thrien Theory of Computing Systems 39(2006
Distributed Algorithm for Voronoi Partition of Wireless Sensor Networks with a Limited Sensing Range.

PubMed

He, Chenlong; Feng, Zuren; Ren, Zhigang

2018-02-03

For Wireless Sensor Networks (WSNs), the Voronoi partition of a region is a challenging problem owing to the limited sensing ability of each sensor and the distributed organization of the network. In this paper, an algorithm is proposed for each sensor having a limited sensing range to compute its limited Voronoi cell autonomously, so that the limited Voronoi partition of the entire WSN is generated in a distributed manner. Inspired by Graham's Scan (GS) algorithm used to compute the convex hull of a point set, the limited Voronoi cell of each sensor is obtained by sequentially scanning two consecutive bisectors between the sensor and its neighbors. The proposed algorithm called the Boundary Scan (BS) algorithm has a lower computational complexity than the existing Range-Constrained Voronoi Cell (RCVC) algorithm and reaches the lower bound of the computational complexity of the algorithms used to solve the problem of this kind. Moreover, it also improves the time efficiency of a key step in the Adjust-Sensing-Radius (ASR) algorithm used to compute the exact Voronoi cell. Extensive numerical simulations are performed to demonstrate the correctness and effectiveness of the BS algorithm. The distributed realization of the BS combined with a localization algorithm in WSNs is used to justify the WSN nature of the proposed algorithm.
Distributed Algorithm for Voronoi Partition of Wireless Sensor Networks with a Limited Sensing Range

PubMed Central

Feng, Zuren; Ren, Zhigang

2018-01-01

For Wireless Sensor Networks (WSNs), the Voronoi partition of a region is a challenging problem owing to the limited sensing ability of each sensor and the distributed organization of the network. In this paper, an algorithm is proposed for each sensor having a limited sensing range to compute its limited Voronoi cell autonomously, so that the limited Voronoi partition of the entire WSN is generated in a distributed manner. Inspired by Graham’s Scan (GS) algorithm used to compute the convex hull of a point set, the limited Voronoi cell of each sensor is obtained by sequentially scanning two consecutive bisectors between the sensor and its neighbors. The proposed algorithm called the Boundary Scan (BS) algorithm has a lower computational complexity than the existing Range-Constrained Voronoi Cell (RCVC) algorithm and reaches the lower bound of the computational complexity of the algorithms used to solve the problem of this kind. Moreover, it also improves the time efficiency of a key step in the Adjust-Sensing-Radius (ASR) algorithm used to compute the exact Voronoi cell. Extensive numerical simulations are performed to demonstrate the correctness and effectiveness of the BS algorithm. The distributed realization of the BS combined with a localization algorithm in WSNs is used to justify the WSN nature of the proposed algorithm. PMID:29401649
Development and application of unified algorithms for problems in computational science

NASA Technical Reports Server (NTRS)

Shankar, Vijaya; Chakravarthy, Sukumar

1987-01-01

A framework is presented for developing computationally unified numerical algorithms for solving nonlinear equations that arise in modeling various problems in mathematical physics. The concept of computational unification is an attempt to encompass efficient solution procedures for computing various nonlinear phenomena that may occur in a given problem. For example, in Computational Fluid Dynamics (CFD), a unified algorithm will be one that allows for solutions to subsonic (elliptic), transonic (mixed elliptic-hyperbolic), and supersonic (hyperbolic) flows for both steady and unsteady problems. The objectives are: development of superior unified algorithms emphasizing accuracy and efficiency aspects; development of codes based on selected algorithms leading to validation; application of mature codes to realistic problems; and extension/application of CFD-based algorithms to problems in other areas of mathematical physics. The ultimate objective is to achieve integration of multidisciplinary technologies to enhance synergism in the design process through computational simulation. Specific unified algorithms for a hierarchy of gas dynamics equations and their applications to two other areas: electromagnetic scattering, and laser-materials interaction accounting for melting.
Efficient mapping algorithms for scheduling robot inverse dynamics computation on a multiprocessor system

NASA Technical Reports Server (NTRS)

Lee, C. S. G.; Chen, C. L.

1989-01-01

Two efficient mapping algorithms for scheduling the robot inverse dynamics computation consisting of m computational modules with precedence relationship to be executed on a multiprocessor system consisting of p identical homogeneous processors with processor and communication costs to achieve minimum computation time are presented. An objective function is defined in terms of the sum of the processor finishing time and the interprocessor communication time. The minimax optimization is performed on the objective function to obtain the best mapping. This mapping problem can be formulated as a combination of the graph partitioning and the scheduling problems; both have been known to be NP-complete. Thus, to speed up the searching for a solution, two heuristic algorithms were proposed to obtain fast but suboptimal mapping solutions. The first algorithm utilizes the level and the communication intensity of the task modules to construct an ordered priority list of ready modules and the module assignment is performed by a weighted bipartite matching algorithm. For a near-optimal mapping solution, the problem can be solved by the heuristic algorithm with simulated annealing. These proposed optimization algorithms can solve various large-scale problems within a reasonable time. Computer simulations were performed to evaluate and verify the performance and the validity of the proposed mapping algorithms. Finally, experiments for computing the inverse dynamics of a six-jointed PUMA-like manipulator based on the Newton-Euler dynamic equations were implemented on an NCUBE/ten hypercube computer to verify the proposed mapping algorithms. Computer simulation and experimental results are compared and discussed.
Tools for Analyzing Computing Resource Management Strategies and Algorithms for SDR Clouds

NASA Astrophysics Data System (ADS)

Marojevic, Vuk; Gomez-Miguelez, Ismael; Gelonch, Antoni

2012-09-01

Software defined radio (SDR) clouds centralize the computing resources of base stations. The computing resource pool is shared between radio operators and dynamically loads and unloads digital signal processing chains for providing wireless communications services on demand. Each new user session request particularly requires the allocation of computing resources for executing the corresponding SDR transceivers. The huge amount of computing resources of SDR cloud data centers and the numerous session requests at certain hours of a day require an efficient computing resource management. We propose a hierarchical approach, where the data center is divided in clusters that are managed in a distributed way. This paper presents a set of computing resource management tools for analyzing computing resource management strategies and algorithms for SDR clouds. We use the tools for evaluating a different strategies and algorithms. The results show that more sophisticated algorithms can achieve higher resource occupations and that a tradeoff exists between cluster size and algorithm complexity.
A cross-disciplinary introduction to quantum annealing-based algorithms

NASA Astrophysics Data System (ADS)

Venegas-Andraca, Salvador E.; Cruz-Santos, William; McGeoch, Catherine; Lanzagorta, Marco

2018-04-01

A central goal in quantum computing is the development of quantum hardware and quantum algorithms in order to analyse challenging scientific and engineering problems. Research in quantum computation involves contributions from both physics and computer science; hence this article presents a concise introduction to basic concepts from both fields that are used in annealing-based quantum computation, an alternative to the more familiar quantum gate model. We introduce some concepts from computer science required to define difficult computational problems and to realise the potential relevance of quantum algorithms to find novel solutions to those problems. We introduce the structure of quantum annealing-based algorithms as well as two examples of this kind of algorithms for solving instances of the max-SAT and Minimum Multicut problems. An overview of the quantum annealing systems manufactured by D-Wave Systems is also presented.
Massively parallel algorithms for real-time wavefront control of a dense adaptive optics system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fijany, A.; Milman, M.; Redding, D.

1994-12-31

In this paper massively parallel algorithms and architectures for real-time wavefront control of a dense adaptive optic system (SELENE) are presented. The authors have already shown that the computation of a near optimal control algorithm for SELENE can be reduced to the solution of a discrete Poisson equation on a regular domain. Although, this represents an optimal computation, due the large size of the system and the high sampling rate requirement, the implementation of this control algorithm poses a computationally challenging problem since it demands a sustained computational throughput of the order of 10 GFlops. They develop a novel algorithm,more » designated as Fast Invariant Imbedding algorithm, which offers a massive degree of parallelism with simple communication and synchronization requirements. Due to these features, this algorithm is significantly more efficient than other Fast Poisson Solvers for implementation on massively parallel architectures. The authors also discuss two massively parallel, algorithmically specialized, architectures for low-cost and optimal implementation of the Fast Invariant Imbedding algorithm.« less
GeoBuilder: a geometric algorithm visualization and debugging system for 2D and 3D geometric computing.

PubMed

Wei, Jyh-Da; Tsai, Ming-Hung; Lee, Gen-Cher; Huang, Jeng-Hung; Lee, Der-Tsai

2009-01-01

Algorithm visualization is a unique research topic that integrates engineering skills such as computer graphics, system programming, database management, computer networks, etc., to facilitate algorithmic researchers in testing their ideas, demonstrating new findings, and teaching algorithm design in the classroom. Within the broad applications of algorithm visualization, there still remain performance issues that deserve further research, e.g., system portability, collaboration capability, and animation effect in 3D environments. Using modern technologies of Java programming, we develop an algorithm visualization and debugging system, dubbed GeoBuilder, for geometric computing. The GeoBuilder system features Java's promising portability, engagement of collaboration in algorithm development, and automatic camera positioning for tracking 3D geometric objects. In this paper, we describe the design of the GeoBuilder system and demonstrate its applications.
Exact parallel algorithms for some members of the traveling salesman problem family

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pekny, J.F.

1989-01-01

The traveling salesman problem and its many generalizations comprise one of the best known combinatorial optimization problem families. Most members of the family are NP-complete problems so that exact algorithms require an unpredictable and sometimes large computational effort. Parallel computers offer hope for providing the power required to meet these demands. A major barrier to applying parallel computers is the lack of parallel algorithms. The contributions presented in this thesis center around new exact parallel algorithms for the asymmetric traveling salesman problem (ATSP), prize collecting traveling salesman problem (PCTSP), and resource constrained traveling salesman problem (RCTSP). The RCTSP is amore » particularly difficult member of the family since finding a feasible solution is an NP-complete problem. An exact sequential algorithm is also presented for the directed hamiltonian cycle problem (DHCP). The DHCP algorithm is superior to current heuristic approaches and represents the first exact method applicable to large graphs. Computational results presented for each of the algorithms demonstrates the effectiveness of combining efficient algorithms with parallel computing methods. Performance statistics are reported for randomly generated ATSPs with 7,500 cities, PCTSPs with 200 cities, RCTSPs with 200 cities, DHCPs with 3,500 vertices, and assignment problems of size 10,000. Sequential results were collected on a Sun 4/260 engineering workstation, while parallel results were collected using a 14 and 100 processor BBN Butterfly Plus computer. The computational results represent the largest instances ever solved to optimality on any type of computer.« less
Symbolic Computation of Strongly Connected Components Using Saturation

NASA Technical Reports Server (NTRS)

Zhao, Yang; Ciardo, Gianfranco

2010-01-01

Finding strongly connected components (SCCs) in the state-space of discrete-state models is a critical task in formal verification of LTL and fair CTL properties, but the potentially huge number of reachable states and SCCs constitutes a formidable challenge. This paper is concerned with computing the sets of states in SCCs or terminal SCCs of asynchronous systems. Because of its advantages in many applications, we employ saturation on two previously proposed approaches: the Xie-Beerel algorithm and transitive closure. First, saturation speeds up state-space exploration when computing each SCC in the Xie-Beerel algorithm. Then, our main contribution is a novel algorithm to compute the transitive closure using saturation. Experimental results indicate that our improved algorithms achieve a clear speedup over previous algorithms in some cases. With the help of the new transitive closure computation algorithm, up to 10(exp 150) SCCs can be explored within a few seconds.
A High Performance Cloud-Based Protein-Ligand Docking Prediction Algorithm

PubMed Central

Chen, Jui-Le; Yang, Chu-Sing

2013-01-01

The potential of predicting druggability for a particular disease by integrating biological and computer science technologies has witnessed success in recent years. Although the computer science technologies can be used to reduce the costs of the pharmaceutical research, the computation time of the structure-based protein-ligand docking prediction is still unsatisfied until now. Hence, in this paper, a novel docking prediction algorithm, named fast cloud-based protein-ligand docking prediction algorithm (FCPLDPA), is presented to accelerate the docking prediction algorithm. The proposed algorithm works by leveraging two high-performance operators: (1) the novel migration (information exchange) operator is designed specially for cloud-based environments to reduce the computation time; (2) the efficient operator is aimed at filtering out the worst search directions. Our simulation results illustrate that the proposed method outperforms the other docking algorithms compared in this paper in terms of both the computation time and the quality of the end result. PMID:23762864
Energy-Aware Computation Offloading of IoT Sensors in Cloudlet-Based Mobile Edge Computing.

PubMed

Ma, Xiao; Lin, Chuang; Zhang, Han; Liu, Jianwei

2018-06-15

Mobile edge computing is proposed as a promising computing paradigm to relieve the excessive burden of data centers and mobile networks, which is induced by the rapid growth of Internet of Things (IoT). This work introduces the cloud-assisted multi-cloudlet framework to provision scalable services in cloudlet-based mobile edge computing. Due to the constrained computation resources of cloudlets and limited communication resources of wireless access points (APs), IoT sensors with identical computation offloading decisions interact with each other. To optimize the processing delay and energy consumption of computation tasks, theoretic analysis of the computation offloading decision problem of IoT sensors is presented in this paper. In more detail, the computation offloading decision problem of IoT sensors is formulated as a computation offloading game and the condition of Nash equilibrium is derived by introducing the tool of a potential game. By exploiting the finite improvement property of the game, the Computation Offloading Decision (COD) algorithm is designed to provide decentralized computation offloading strategies for IoT sensors. Simulation results demonstrate that the COD algorithm can significantly reduce the system cost compared with the random-selection algorithm and the cloud-first algorithm. Furthermore, the COD algorithm can scale well with increasing IoT sensors.
Parallel Algorithms for Least Squares and Related Computations.

DTIC Science & Technology

1991-03-22

for dense computations in linear algebra . The work has recently been published in a general reference book on parallel algorithms by SIAM. AFO SR...written his Ph.D. dissertation with the principal investigator. (See publication 6.) • Parallel Algorithms for Dense Linear Algebra Computations. Our...and describe and to put into perspective a selection of the more important parallel algorithms for numerical linear algebra . We give a major new
CUDA Optimization Strategies for Compute- and Memory-Bound Neuroimaging Algorithms

PubMed Central

Lee, Daren; Dinov, Ivo; Dong, Bin; Gutman, Boris; Yanovsky, Igor; Toga, Arthur W.

2011-01-01

As neuroimaging algorithms and technology continue to grow faster than CPU performance in complexity and image resolution, data-parallel computing methods will be increasingly important. The high performance, data-parallel architecture of modern graphical processing units (GPUs) can reduce computational times by orders of magnitude. However, its massively threaded architecture introduces challenges when GPU resources are exceeded. This paper presents optimization strategies for compute- and memory-bound algorithms for the CUDA architecture. For compute-bound algorithms, the registers are reduced through variable reuse via shared memory and the data throughput is increased through heavier thread workloads and maximizing the thread configuration for a single thread block per multiprocessor. For memory-bound algorithms, fitting the data into the fast but limited GPU resources is achieved through reorganizing the data into self-contained structures and employing a multi-pass approach. Memory latencies are reduced by selecting memory resources whose cache performance are optimized for the algorithm's access patterns. We demonstrate the strategies on two computationally expensive algorithms and achieve optimized GPU implementations that perform up to 6× faster than unoptimized ones. Compared to CPU implementations, we achieve peak GPU speedups of 129× for the 3D unbiased nonlinear image registration technique and 93× for the non-local means surface denoising algorithm. PMID:21159404
CUDA optimization strategies for compute- and memory-bound neuroimaging algorithms.

PubMed

Lee, Daren; Dinov, Ivo; Dong, Bin; Gutman, Boris; Yanovsky, Igor; Toga, Arthur W

2012-06-01

As neuroimaging algorithms and technology continue to grow faster than CPU performance in complexity and image resolution, data-parallel computing methods will be increasingly important. The high performance, data-parallel architecture of modern graphical processing units (GPUs) can reduce computational times by orders of magnitude. However, its massively threaded architecture introduces challenges when GPU resources are exceeded. This paper presents optimization strategies for compute- and memory-bound algorithms for the CUDA architecture. For compute-bound algorithms, the registers are reduced through variable reuse via shared memory and the data throughput is increased through heavier thread workloads and maximizing the thread configuration for a single thread block per multiprocessor. For memory-bound algorithms, fitting the data into the fast but limited GPU resources is achieved through reorganizing the data into self-contained structures and employing a multi-pass approach. Memory latencies are reduced by selecting memory resources whose cache performance are optimized for the algorithm's access patterns. We demonstrate the strategies on two computationally expensive algorithms and achieve optimized GPU implementations that perform up to 6× faster than unoptimized ones. Compared to CPU implementations, we achieve peak GPU speedups of 129× for the 3D unbiased nonlinear image registration technique and 93× for the non-local means surface denoising algorithm. Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.

Conjugate Gradient Algorithms For Manipulator Simulation

NASA Technical Reports Server (NTRS)

Fijany, Amir; Scheid, Robert E.

1991-01-01

Report discusses applicability of conjugate-gradient algorithms to computation of forward dynamics of robotic manipulators. Rapid computation of forward dynamics essential to teleoperation and other advanced robotic applications. Part of continuing effort to find algorithms meeting requirements for increased computational efficiency and speed. Method used for iterative solution of systems of linear equations.
Interactive visualization of Earth and Space Science computations

NASA Technical Reports Server (NTRS)

Hibbard, William L.; Paul, Brian E.; Santek, David A.; Dyer, Charles R.; Battaiola, Andre L.; Voidrot-Martinez, Marie-Francoise

1994-01-01

Computers have become essential tools for scientists simulating and observing nature. Simulations are formulated as mathematical models but are implemented as computer algorithms to simulate complex events. Observations are also analyzed and understood in terms of mathematical models, but the number of these observations usually dictates that we automate analyses with computer algorithms. In spite of their essential role, computers are also barriers to scientific understanding. Unlike hand calculations, automated computations are invisible and, because of the enormous numbers of individual operations in automated computations, the relation between an algorithm's input and output is often not intuitive. This problem is illustrated by the behavior of meteorologists responsible for forecasting weather. Even in this age of computers, many meteorologists manually plot weather observations on maps, then draw isolines of temperature, pressure, and other fields by hand (special pads of maps are printed for just this purpose). Similarly, radiologists use computers to collect medical data but are notoriously reluctant to apply image-processing algorithms to that data. To these scientists with life-and-death responsibilities, computer algorithms are black boxes that increase rather than reduce risk. The barrier between scientists and their computations can be bridged by techniques that make the internal workings of algorithms visible and that allow scientists to experiment with their computations. Here we describe two interactive systems developed at the University of Wisconsin-Madison Space Science and Engineering Center (SSEC) that provide these capabilities to Earth and space scientists.
Implementation of Multispectral Image Classification on a Remote Adaptive Computer

NASA Technical Reports Server (NTRS)

Figueiredo, Marco A.; Gloster, Clay S.; Stephens, Mark; Graves, Corey A.; Nakkar, Mouna

1999-01-01

As the demand for higher performance computers for the processing of remote sensing science algorithms increases, the need to investigate new computing paradigms its justified. Field Programmable Gate Arrays enable the implementation of algorithms at the hardware gate level, leading to orders of m a,gnitude performance increase over microprocessor based systems. The automatic classification of spaceborne multispectral images is an example of a computation intensive application, that, can benefit from implementation on an FPGA - based custom computing machine (adaptive or reconfigurable computer). A probabilistic neural network is used here to classify pixels of of a multispectral LANDSAT-2 image. The implementation described utilizes Java client/server application programs to access the adaptive computer from a remote site. Results verify that a remote hardware version of the algorithm (implemented on an adaptive computer) is significantly faster than a local software version of the same algorithm implemented on a typical general - purpose computer).
Computer methods for sampling from the gamma distribution

DOE Office of Scientific and Technical Information (OSTI.GOV)

Johnson, M.E.; Tadikamalla, P.R.

1978-01-01

Considerable attention has recently been directed at developing ever faster algorithms for generating gamma random variates on digital computers. This paper surveys the current state of the art including the leading algorithms of Ahrens and Dieter, Atkinson, Cheng, Fishman, Marsaglia, Tadikamalla, and Wallace. General random variate generation techniques are explained with reference to these gamma algorithms. Computer simulation experiments on IBM and CDC computers are reported.
RNA folding kinetics using Monte Carlo and Gillespie algorithms.

PubMed

Clote, Peter; Bayegan, Amir H

2018-04-01

RNA secondary structure folding kinetics is known to be important for the biological function of certain processes, such as the hok/sok system in E. coli. Although linear algebra provides an exact computational solution of secondary structure folding kinetics with respect to the Turner energy model for tiny ([Formula: see text]20 nt) RNA sequences, the folding kinetics for larger sequences can only be approximated by binning structures into macrostates in a coarse-grained model, or by repeatedly simulating secondary structure folding with either the Monte Carlo algorithm or the Gillespie algorithm. Here we investigate the relation between the Monte Carlo algorithm and the Gillespie algorithm. We prove that asymptotically, the expected time for a K-step trajectory of the Monte Carlo algorithm is equal to [Formula: see text] times that of the Gillespie algorithm, where [Formula: see text] denotes the Boltzmann expected network degree. If the network is regular (i.e. every node has the same degree), then the mean first passage time (MFPT) computed by the Monte Carlo algorithm is equal to MFPT computed by the Gillespie algorithm multiplied by [Formula: see text]; however, this is not true for non-regular networks. In particular, RNA secondary structure folding kinetics, as computed by the Monte Carlo algorithm, is not equal to the folding kinetics, as computed by the Gillespie algorithm, although the mean first passage times are roughly correlated. Simulation software for RNA secondary structure folding according to the Monte Carlo and Gillespie algorithms is publicly available, as is our software to compute the expected degree of the network of secondary structures of a given RNA sequence-see http://bioinformatics.bc.edu/clote/RNAexpNumNbors .
Zero-block mode decision algorithm for H.264/AVC.

PubMed

Lee, Yu-Ming; Lin, Yinyi

2009-03-01

In the previous paper , we proposed a zero-block intermode decision algorithm for H.264 video coding based upon the number of zero-blocks of 4 x 4 DCT coefficients between the current macroblock and the co-located macroblock. The proposed algorithm can achieve significant improvement in computation, but the computation performance is limited for high bit-rate coding. To improve computation efficiency, in this paper, we suggest an enhanced zero-block decision algorithm, which uses an early zero-block detection method to compute the number of zero-blocks instead of direct DCT and quantization (DCT/Q) calculation and incorporates two adequate decision methods into semi-stationary and nonstationary regions of a video sequence. In addition, the zero-block decision algorithm is also applied to the intramode prediction in the P frame. The enhanced zero-block decision algorithm brings out a reduction of average 27% of total encoding time compared to the zero-block decision algorithm.
Quantum rendering

NASA Astrophysics Data System (ADS)

Lanzagorta, Marco O.; Gomez, Richard B.; Uhlmann, Jeffrey K.

2003-08-01

In recent years, computer graphics has emerged as a critical component of the scientific and engineering process, and it is recognized as an important computer science research area. Computer graphics are extensively used for a variety of aerospace and defense training systems and by Hollywood's special effects companies. All these applications require the computer graphics systems to produce high quality renderings of extremely large data sets in short periods of time. Much research has been done in "classical computing" toward the development of efficient methods and techniques to reduce the rendering time required for large datasets. Quantum Computing's unique algorithmic features offer the possibility of speeding up some of the known rendering algorithms currently used in computer graphics. In this paper we discuss possible implementations of quantum rendering algorithms. In particular, we concentrate on the implementation of Grover's quantum search algorithm for Z-buffering, ray-tracing, radiosity, and scene management techniques. We also compare the theoretical performance between the classical and quantum versions of the algorithms.
A unifying framework for rigid multibody dynamics and serial and parallel computational issues

NASA Technical Reports Server (NTRS)

Fijany, Amir; Jain, Abhinandan

1989-01-01

A unifying framework for various formulations of the dynamics of open-chain rigid multibody systems is discussed. Their suitability for serial and parallel processing is assessed. The framework is based on the derivation of intrinsic, i.e., coordinate-free, equations of the algorithms which provides a suitable abstraction and permits a distinction to be made between the computational redundancy in the intrinsic and extrinsic equations. A set of spatial notation is used which allows the derivation of the various algorithms in a common setting and thus clarifies the relationships among them. The three classes of algorithms viz., O(n), O(n exp 2) and O(n exp 3) or the solution of the dynamics problem are investigated. Researchers begin with the derivation of O(n exp 3) algorithms based on the explicit computation of the mass matrix and it provides insight into the underlying basis of the O(n) algorithms. From a computational perspective, the optimal choice of a coordinate frame for the projection of the intrinsic equations is discussed and the serial computational complexity of the different algorithms is evaluated. The three classes of algorithms are also analyzed for suitability for parallel processing. It is shown that the problem belongs to the class of N C and the time and processor bounds are of O(log2/2(n)) and O(n exp 4), respectively. However, the algorithm that achieves the above bounds is not stable. Researchers show that the fastest stable parallel algorithm achieves a computational complexity of O(n) with O(n exp 4), respectively. However, the algorithm that achieves the above bounds is not stable. Researchers show that the fastest stable parallel algorithm achieves a computational complexity of O(n) with O(n exp 2) processors, and results from the parallelization of the O(n exp 3) serial algorithm.
Accuracy and speed in computing the Chebyshev collocation derivative

NASA Technical Reports Server (NTRS)

Don, Wai-Sun; Solomonoff, Alex

1991-01-01

We studied several algorithms for computing the Chebyshev spectral derivative and compare their roundoff error. For a large number of collocation points, the elements of the Chebyshev differentiation matrix, if constructed in the usual way, are not computed accurately. A subtle cause is is found to account for the poor accuracy when computing the derivative by the matrix-vector multiplication method. Methods for accurately computing the elements of the matrix are presented, and we find that if the entities of the matrix are computed accurately, the roundoff error of the matrix-vector multiplication is as small as that of the transform-recursion algorithm. Results of CPU time usage are shown for several different algorithms for computing the derivative by the Chebyshev collocation method for a wide variety of two-dimensional grid sizes on both an IBM and a Cray 2 computer. We found that which algorithm is fastest on a particular machine depends not only on the grid size, but also on small details of the computer hardware as well. For most practical grid sizes used in computation, the even-odd decomposition algorithm is found to be faster than the transform-recursion method.
Cloud computing task scheduling strategy based on improved differential evolution algorithm

NASA Astrophysics Data System (ADS)

Ge, Junwei; He, Qian; Fang, Yiqiu

2017-04-01

In order to optimize the cloud computing task scheduling scheme, an improved differential evolution algorithm for cloud computing task scheduling is proposed. Firstly, the cloud computing task scheduling model, according to the model of the fitness function, and then used improved optimization calculation of the fitness function of the evolutionary algorithm, according to the evolution of generation of dynamic selection strategy through dynamic mutation strategy to ensure the global and local search ability. The performance test experiment was carried out in the CloudSim simulation platform, the experimental results show that the improved differential evolution algorithm can reduce the cloud computing task execution time and user cost saving, good implementation of the optimal scheduling of cloud computing tasks.
Hardware Acceleration of Adaptive Neural Algorithms.

DOE Office of Scientific and Technical Information (OSTI.GOV)

James, Conrad D.

As tradit ional numerical computing has faced challenges, researchers have turned towards alternative computing approaches to reduce power - per - computation metrics and improve algorithm performance. Here, we describe an approach towards non - conventional computing that strengthens the connection between machine learning and neuroscience concepts. The Hardware Acceleration of Adaptive Neural Algorithms (HAANA) project ha s develop ed neural machine learning algorithms and hardware for applications in image processing and cybersecurity. While machine learning methods are effective at extracting relevant features from many types of data, the effectiveness of these algorithms degrades when subjected to real - worldmore » conditions. Our team has generated novel neural - inspired approa ches to improve the resiliency and adaptability of machine learning algorithms. In addition, we have also designed and fabricated hardware architectures and microelectronic devices specifically tuned towards the training and inference operations of neural - inspired algorithms. Finally, our multi - scale simulation framework allows us to assess the impact of microelectronic device properties on algorithm performance.« less
A Discussion of Using a Reconfigurable Processor to Implement the Discrete Fourier Transform

NASA Technical Reports Server (NTRS)

White, Michael J.

2004-01-01

This paper presents the design and implementation of the Discrete Fourier Transform (DFT) algorithm on a reconfigurable processor system. While highly applicable to many engineering problems, the DFT is an extremely computationally intensive algorithm. Consequently, the eventual goal of this work is to enhance the execution of a floating-point precision DFT algorithm by off loading the algorithm from the computing system. This computing system, within the context of this research, is a typical high performance desktop computer with an may of field programmable gate arrays (FPGAs). FPGAs are hardware devices that are configured by software to execute an algorithm. If it is desired to change the algorithm, the software is changed to reflect the modification, then download to the FPGA, which is then itself modified. This paper will discuss methodology for developing the DFT algorithm to be implemented on the FPGA. We will discuss the algorithm, the FPGA code effort, and the results to date.
A review of classification algorithms for EEG-based brain-computer interfaces.

PubMed

Lotte, F; Congedo, M; Lécuyer, A; Lamarche, F; Arnaldi, B

2007-06-01

In this paper we review classification algorithms used to design brain-computer interface (BCI) systems based on electroencephalography (EEG). We briefly present the commonly employed algorithms and describe their critical properties. Based on the literature, we compare them in terms of performance and provide guidelines to choose the suitable classification algorithm(s) for a specific BCI.
Sorting on STAR. [CDC computer algorithm timing comparison

NASA Technical Reports Server (NTRS)

Stone, H. S.

1978-01-01

Timing comparisons are given for three sorting algorithms written for the CDC STAR computer. One algorithm is Hoare's (1962) Quicksort, which is the fastest or nearly the fastest sorting algorithm for most computers. A second algorithm is a vector version of Quicksort that takes advantage of the STAR's vector operations. The third algorithm is an adaptation of Batcher's (1968) sorting algorithm, which makes especially good use of vector operations but has a complexity of N(log N)-squared as compared with a complexity of N log N for the Quicksort algorithms. In spite of its worse complexity, Batcher's sorting algorithm is competitive with the serial version of Quicksort for vectors up to the largest that can be treated by STAR. Vector Quicksort outperforms the other two algorithms and is generally preferred. These results indicate that unusual instruction sets can introduce biases in program execution time that counter results predicted by worst-case asymptotic complexity analysis.
Microscope self-calibration based on micro laser line imaging and soft computing algorithms

NASA Astrophysics Data System (ADS)

Apolinar Muñoz Rodríguez, J.

2018-06-01

A technique to perform microscope self-calibration via micro laser line and soft computing algorithms is presented. In this technique, the microscope vision parameters are computed by means of soft computing algorithms based on laser line projection. To implement the self-calibration, a microscope vision system is constructed by means of a CCD camera and a 38 μm laser line. From this arrangement, the microscope vision parameters are represented via Bezier approximation networks, which are accomplished through the laser line position. In this procedure, a genetic algorithm determines the microscope vision parameters by means of laser line imaging. Also, the approximation networks compute the three-dimensional vision by means of the laser line position. Additionally, the soft computing algorithms re-calibrate the vision parameters when the microscope vision system is modified during the vision task. The proposed self-calibration improves accuracy of the traditional microscope calibration, which is accomplished via external references to the microscope system. The capability of the self-calibration based on soft computing algorithms is determined by means of the calibration accuracy and the micro-scale measurement error. This contribution is corroborated by an evaluation based on the accuracy of the traditional microscope calibration.
Demonstration of a small programmable quantum computer with atomic qubits.

PubMed

Debnath, S; Linke, N M; Figgatt, C; Landsman, K A; Wright, K; Monroe, C

2016-08-04

Quantum computers can solve certain problems more efficiently than any possible conventional computer. Small quantum algorithms have been demonstrated on multiple quantum computing platforms, many specifically tailored in hardware to implement a particular algorithm or execute a limited number of computational paths. Here we demonstrate a five-qubit trapped-ion quantum computer that can be programmed in software to implement arbitrary quantum algorithms by executing any sequence of universal quantum logic gates. We compile algorithms into a fully connected set of gate operations that are native to the hardware and have a mean fidelity of 98 per cent. Reconfiguring these gate sequences provides the flexibility to implement a variety of algorithms without altering the hardware. As examples, we implement the Deutsch-Jozsa and Bernstein-Vazirani algorithms with average success rates of 95 and 90 per cent, respectively. We also perform a coherent quantum Fourier transform on five trapped-ion qubits for phase estimation and period finding with average fidelities of 62 and 84 per cent, respectively. This small quantum computer can be scaled to larger numbers of qubits within a single register, and can be further expanded by connecting several such modules through ion shuttling or photonic quantum channels.
Demonstration of a small programmable quantum computer with atomic qubits

NASA Astrophysics Data System (ADS)

Debnath, S.; Linke, N. M.; Figgatt, C.; Landsman, K. A.; Wright, K.; Monroe, C.

2016-08-01

Quantum computers can solve certain problems more efficiently than any possible conventional computer. Small quantum algorithms have been demonstrated on multiple quantum computing platforms, many specifically tailored in hardware to implement a particular algorithm or execute a limited number of computational paths. Here we demonstrate a five-qubit trapped-ion quantum computer that can be programmed in software to implement arbitrary quantum algorithms by executing any sequence of universal quantum logic gates. We compile algorithms into a fully connected set of gate operations that are native to the hardware and have a mean fidelity of 98 per cent. Reconfiguring these gate sequences provides the flexibility to implement a variety of algorithms without altering the hardware. As examples, we implement the Deutsch-Jozsa and Bernstein-Vazirani algorithms with average success rates of 95 and 90 per cent, respectively. We also perform a coherent quantum Fourier transform on five trapped-ion qubits for phase estimation and period finding with average fidelities of 62 and 84 per cent, respectively. This small quantum computer can be scaled to larger numbers of qubits within a single register, and can be further expanded by connecting several such modules through ion shuttling or photonic quantum channels.
Parallel conjugate gradient algorithms for manipulator dynamic simulation

NASA Technical Reports Server (NTRS)

Fijany, Amir; Scheld, Robert E.

1989-01-01

Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithms are guaranteed to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n sq) on a serial processor. A conjugate gradient algorithms is presented that provide greater efficiency using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which respectively use a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inversions in O(log sub 2 n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves the computational time of O(log sub 2 n) for each iteration. Simulation results for a seven degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
Bio-inspired algorithms applied to molecular docking simulations.

PubMed

Heberlé, G; de Azevedo, W F

2011-01-01

Nature as a source of inspiration has been shown to have a great beneficial impact on the development of new computational methodologies. In this scenario, analyses of the interactions between a protein target and a ligand can be simulated by biologically inspired algorithms (BIAs). These algorithms mimic biological systems to create new paradigms for computation, such as neural networks, evolutionary computing, and swarm intelligence. This review provides a description of the main concepts behind BIAs applied to molecular docking simulations. Special attention is devoted to evolutionary algorithms, guided-directed evolutionary algorithms, and Lamarckian genetic algorithms. Recent applications of these methodologies to protein targets identified in the Mycobacterium tuberculosis genome are described.
Mental Computation or Standard Algorithm? Children's Strategy Choices on Multi-Digit Subtractions

ERIC Educational Resources Information Center

Torbeyns, Joke; Verschaffel, Lieven

2016-01-01

This study analyzed children's use of mental computation strategies and the standard algorithm on multi-digit subtractions. Fifty-eight Flemish 4th graders of varying mathematical achievement level were individually offered subtractions that either stimulated the use of mental computation strategies or the standard algorithm in one choice and two…

The application of dynamic programming in production planning

NASA Astrophysics Data System (ADS)

Wu, Run

2017-05-01

Nowadays, with the popularity of the computers, various industries and fields are widely applying computer information technology, which brings about huge demand for a variety of application software. In order to develop software meeting various needs with most economical cost and best quality, programmers must design efficient algorithms. A superior algorithm can not only soul up one thing, but also maximize the benefits and generate the smallest overhead. As one of the common algorithms, dynamic programming algorithms are used to solving problems with some sort of optimal properties. When solving problems with a large amount of sub-problems that needs repetitive calculations, the ordinary sub-recursive method requires to consume exponential time, and dynamic programming algorithm can reduce the time complexity of the algorithm to the polynomial level, according to which we can conclude that dynamic programming algorithm is a very efficient compared to other algorithms reducing the computational complexity and enriching the computational results. In this paper, we expound the concept, basic elements, properties, core, solving steps and difficulties of the dynamic programming algorithm besides, establish the dynamic programming model of the production planning problem.
Computationally efficient multibody simulations

NASA Technical Reports Server (NTRS)

Ramakrishnan, Jayant; Kumar, Manoj

1994-01-01

Computationally efficient approaches to the solution of the dynamics of multibody systems are presented in this work. The computational efficiency is derived from both the algorithmic and implementational standpoint. Order(n) approaches provide a new formulation of the equations of motion eliminating the assembly and numerical inversion of a system mass matrix as required by conventional algorithms. Computational efficiency is also gained in the implementation phase by the symbolic processing and parallel implementation of these equations. Comparison of this algorithm with existing multibody simulation programs illustrates the increased computational efficiency.
Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment.

PubMed

Lee, Wei-Po; Hsiao, Yu-Ting; Hwang, Wei-Che

2014-01-16

To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks.
Designing a parallel evolutionary algorithm for inferring gene networks on the cloud computing environment

PubMed Central

2014-01-01

Background To improve the tedious task of reconstructing gene networks through testing experimentally the possible interactions between genes, it becomes a trend to adopt the automated reverse engineering procedure instead. Some evolutionary algorithms have been suggested for deriving network parameters. However, to infer large networks by the evolutionary algorithm, it is necessary to address two important issues: premature convergence and high computational cost. To tackle the former problem and to enhance the performance of traditional evolutionary algorithms, it is advisable to use parallel model evolutionary algorithms. To overcome the latter and to speed up the computation, it is advocated to adopt the mechanism of cloud computing as a promising solution: most popular is the method of MapReduce programming model, a fault-tolerant framework to implement parallel algorithms for inferring large gene networks. Results This work presents a practical framework to infer large gene networks, by developing and parallelizing a hybrid GA-PSO optimization method. Our parallel method is extended to work with the Hadoop MapReduce programming model and is executed in different cloud computing environments. To evaluate the proposed approach, we use a well-known open-source software GeneNetWeaver to create several yeast S. cerevisiae sub-networks and use them to produce gene profiles. Experiments have been conducted and the results have been analyzed. They show that our parallel approach can be successfully used to infer networks with desired behaviors and the computation time can be largely reduced. Conclusions Parallel population-based algorithms can effectively determine network parameters and they perform better than the widely-used sequential algorithms in gene network inference. These parallel algorithms can be distributed to the cloud computing environment to speed up the computation. By coupling the parallel model population-based optimization method and the parallel computational framework, high quality solutions can be obtained within relatively short time. This integrated approach is a promising way for inferring large networks. PMID:24428926
Experimental quantum computing to solve systems of linear equations.

PubMed

Cai, X-D; Weedbrook, C; Su, Z-E; Chen, M-C; Gu, Mile; Zhu, M-J; Li, Li; Liu, Nai-Le; Lu, Chao-Yang; Pan, Jian-Wei

2013-06-07

Solving linear systems of equations is ubiquitous in all areas of science and engineering. With rapidly growing data sets, such a task can be intractable for classical computers, as the best known classical algorithms require a time proportional to the number of variables N. A recently proposed quantum algorithm shows that quantum computers could solve linear systems in a time scale of order log(N), giving an exponential speedup over classical computers. Here we realize the simplest instance of this algorithm, solving 2×2 linear equations for various input vectors on a quantum computer. We use four quantum bits and four controlled logic gates to implement every subroutine required, demonstrating the working principle of this algorithm.
A fast D.F.T. algorithm using complex integer transforms

NASA Technical Reports Server (NTRS)

Reed, I. S.; Truong, T. K.

1978-01-01

Winograd (1976) has developed a new class of algorithms which depend heavily on the computation of a cyclic convolution for computing the conventional DFT (discrete Fourier transform); this new algorithm, for a few hundred transform points, requires substantially fewer multiplications than the conventional FFT algorithm. Reed and Truong have defined a special class of finite Fourier-like transforms over GF(q squared), where q = 2 to the p power minus 1 is a Mersenne prime for p = 2, 3, 5, 7, 13, 17, 19, 31, 61. In the present paper it is shown that Winograd's algorithm can be combined with the aforementioned Fourier-like transform to yield a new algorithm for computing the DFT. A fast method for accurately computing the DFT of a sequence of complex numbers of very long transform-lengths is thus obtained.
Parallel, stochastic measurement of molecular surface area.

PubMed

Juba, Derek; Varshney, Amitabh

2008-08-01

Biochemists often wish to compute surface areas of proteins. A variety of algorithms have been developed for this task, but they are designed for traditional single-processor architectures. The current trend in computer hardware is towards increasingly parallel architectures for which these algorithms are not well suited. We describe a parallel, stochastic algorithm for molecular surface area computation that maps well to the emerging multi-core architectures. Our algorithm is also progressive, providing a rough estimate of surface area immediately and refining this estimate as time goes on. Furthermore, the algorithm generates points on the molecular surface which can be used for point-based rendering. We demonstrate a GPU implementation of our algorithm and show that it compares favorably with several existing molecular surface computation programs, giving fast estimates of the molecular surface area with good accuracy.
CCOMP: An efficient algorithm for complex roots computation of determinantal equations

NASA Astrophysics Data System (ADS)

Zouros, Grigorios P.

2018-01-01

In this paper a free Python algorithm, entitled CCOMP (Complex roots COMPutation), is developed for the efficient computation of complex roots of determinantal equations inside a prescribed complex domain. The key to the method presented is the efficient determination of the candidate points inside the domain which, in their close neighborhood, a complex root may lie. Once these points are detected, the algorithm proceeds to a two-dimensional minimization problem with respect to the minimum modulus eigenvalue of the system matrix. In the core of CCOMP exist three sub-algorithms whose tasks are the efficient estimation of the minimum modulus eigenvalues of the system matrix inside the prescribed domain, the efficient computation of candidate points which guarantee the existence of minima, and finally, the computation of minima via bound constrained minimization algorithms. Theoretical results and heuristics support the development and the performance of the algorithm, which is discussed in detail. CCOMP supports general complex matrices, and its efficiency, applicability and validity is demonstrated to a variety of microwave applications.
Cooperative combinatorial optimization: evolutionary computation case study.

PubMed

Burgin, Mark; Eberbach, Eugene

2008-01-01

This paper presents a formalization of the notion of cooperation and competition of multiple systems that work toward a common optimization goal of the population using evolutionary computation techniques. It is proved that evolutionary algorithms are more expressive than conventional recursive algorithms, such as Turing machines. Three classes of evolutionary computations are introduced and studied: bounded finite, unbounded finite, and infinite computations. Universal evolutionary algorithms are constructed. Such properties of evolutionary algorithms as completeness, optimality, and search decidability are examined. A natural extension of evolutionary Turing machine (ETM) model is proposed to properly reflect phenomena of cooperation and competition in the whole population.
Petascale self-consistent electromagnetic computations using scalable and accurate algorithms for complex structures

NASA Astrophysics Data System (ADS)

Cary, John R.; Abell, D.; Amundson, J.; Bruhwiler, D. L.; Busby, R.; Carlsson, J. A.; Dimitrov, D. A.; Kashdan, E.; Messmer, P.; Nieter, C.; Smithe, D. N.; Spentzouris, P.; Stoltz, P.; Trines, R. M.; Wang, H.; Werner, G. R.

2006-09-01

As the size and cost of particle accelerators escalate, high-performance computing plays an increasingly important role; optimization through accurate, detailed computermodeling increases performance and reduces costs. But consequently, computer simulations face enormous challenges. Early approximation methods, such as expansions in distance from the design orbit, were unable to supply detailed accurate results, such as in the computation of wake fields in complex cavities. Since the advent of message-passing supercomputers with thousands of processors, earlier approximations are no longer necessary, and it is now possible to compute wake fields, the effects of dampers, and self-consistent dynamics in cavities accurately. In this environment, the focus has shifted towards the development and implementation of algorithms that scale to large numbers of processors. So-called charge-conserving algorithms evolve the electromagnetic fields without the need for any global solves (which are difficult to scale up to many processors). Using cut-cell (or embedded) boundaries, these algorithms can simulate the fields in complex accelerator cavities with curved walls. New implicit algorithms, which are stable for any time-step, conserve charge as well, allowing faster simulation of structures with details small compared to the characteristic wavelength. These algorithmic and computational advances have been implemented in the VORPAL7 Framework, a flexible, object-oriented, massively parallel computational application that allows run-time assembly of algorithms and objects, thus composing an application on the fly.
A systematic investigation of computation models for predicting Adverse Drug Reactions (ADRs).

PubMed

Kuang, Qifan; Wang, MinQi; Li, Rong; Dong, YongCheng; Li, Yizhou; Li, Menglong

2014-01-01

Early and accurate identification of adverse drug reactions (ADRs) is critically important for drug development and clinical safety. Computer-aided prediction of ADRs has attracted increasing attention in recent years, and many computational models have been proposed. However, because of the lack of systematic analysis and comparison of the different computational models, there remain limitations in designing more effective algorithms and selecting more useful features. There is therefore an urgent need to review and analyze previous computation models to obtain general conclusions that can provide useful guidance to construct more effective computational models to predict ADRs. In the current study, the main work is to compare and analyze the performance of existing computational methods to predict ADRs, by implementing and evaluating additional algorithms that have been earlier used for predicting drug targets. Our results indicated that topological and intrinsic features were complementary to an extent and the Jaccard coefficient had an important and general effect on the prediction of drug-ADR associations. By comparing the structure of each algorithm, final formulas of these algorithms were all converted to linear model in form, based on this finding we propose a new algorithm called the general weighted profile method and it yielded the best overall performance among the algorithms investigated in this paper. Several meaningful conclusions and useful findings regarding the prediction of ADRs are provided for selecting optimal features and algorithms.
Algorithm for space-time analysis of data on geomagnetic field

NASA Technical Reports Server (NTRS)

Kulanin, N. V.; Golokov, V. P. (Editor); Tyupkin, S. (Editor)

1984-01-01

The algorithm for the execution of the space-time analysis of data on geomagnetic fields is described. The primary constraints figuring in the specific realization of the algorithm on a computer stem exclusively from the limited possibilities of the computer involved. It is realized in the form of a program for the BESM-6 computer.
Generalization of the Lord-Wingersky Algorithm to Computing the Distribution of Summed Test Scores Based on Real-Number Item Scores

ERIC Educational Resources Information Center

Kim, Seonghoon

2013-01-01

With known item response theory (IRT) item parameters, Lord and Wingersky provided a recursive algorithm for computing the conditional frequency distribution of number-correct test scores, given proficiency. This article presents a generalized algorithm for computing the conditional distribution of summed test scores involving real-number item…
Fast algorithm for computing complex number-theoretic transforms

NASA Technical Reports Server (NTRS)

Reed, I. S.; Liu, K. Y.; Truong, T. K.

1977-01-01

A high-radix FFT algorithm for computing transforms over FFT, where q is a Mersenne prime, is developed to implement fast circular convolutions. This new algorithm requires substantially fewer multiplications than the conventional FFT.
A Parallel Nonrigid Registration Algorithm Based on B-Spline for Medical Images.

PubMed

Du, Xiaogang; Dang, Jianwu; Wang, Yangping; Wang, Song; Lei, Tao

2016-01-01

The nonrigid registration algorithm based on B-spline Free-Form Deformation (FFD) plays a key role and is widely applied in medical image processing due to the good flexibility and robustness. However, it requires a tremendous amount of computing time to obtain more accurate registration results especially for a large amount of medical image data. To address the issue, a parallel nonrigid registration algorithm based on B-spline is proposed in this paper. First, the Logarithm Squared Difference (LSD) is considered as the similarity metric in the B-spline registration algorithm to improve registration precision. After that, we create a parallel computing strategy and lookup tables (LUTs) to reduce the complexity of the B-spline registration algorithm. As a result, the computing time of three time-consuming steps including B-splines interpolation, LSD computation, and the analytic gradient computation of LSD, is efficiently reduced, for the B-spline registration algorithm employs the Nonlinear Conjugate Gradient (NCG) optimization method. Experimental results of registration quality and execution efficiency on the large amount of medical images show that our algorithm achieves a better registration accuracy in terms of the differences between the best deformation fields and ground truth and a speedup of 17 times over the single-threaded CPU implementation due to the powerful parallel computing ability of Graphics Processing Unit (GPU).
Developing Subdomain Allocation Algorithms Based on Spatial and Communicational Constraints to Accelerate Dust Storm Simulation

PubMed Central

Gui, Zhipeng; Yu, Manzhu; Yang, Chaowei; Jiang, Yunfeng; Chen, Songqing; Xia, Jizhe; Huang, Qunying; Liu, Kai; Li, Zhenlong; Hassan, Mohammed Anowarul; Jin, Baoxuan

2016-01-01

Dust storm has serious disastrous impacts on environment, human health, and assets. The developments and applications of dust storm models have contributed significantly to better understand and predict the distribution, intensity and structure of dust storms. However, dust storm simulation is a data and computing intensive process. To improve the computing performance, high performance computing has been widely adopted by dividing the entire study area into multiple subdomains and allocating each subdomain on different computing nodes in a parallel fashion. Inappropriate allocation may introduce imbalanced task loads and unnecessary communications among computing nodes. Therefore, allocation is a key factor that may impact the efficiency of parallel process. An allocation algorithm is expected to consider the computing cost and communication cost for each computing node to minimize total execution time and reduce overall communication cost for the entire simulation. This research introduces three algorithms to optimize the allocation by considering the spatial and communicational constraints: 1) an Integer Linear Programming (ILP) based algorithm from combinational optimization perspective; 2) a K-Means and Kernighan-Lin combined heuristic algorithm (K&K) integrating geometric and coordinate-free methods by merging local and global partitioning; 3) an automatic seeded region growing based geometric and local partitioning algorithm (ASRG). The performance and effectiveness of the three algorithms are compared based on different factors. Further, we adopt the K&K algorithm as the demonstrated algorithm for the experiment of dust model simulation with the non-hydrostatic mesoscale model (NMM-dust) and compared the performance with the MPI default sequential allocation. The results demonstrate that K&K method significantly improves the simulation performance with better subdomain allocation. This method can also be adopted for other relevant atmospheric and numerical modeling. PMID:27044039
Computer-aided US diagnosis of breast lesions by using cell-based contour grouping.

PubMed

Cheng, Jie-Zhi; Chou, Yi-Hong; Huang, Chiun-Sheng; Chang, Yeun-Chung; Tiu, Chui-Mei; Chen, Kuei-Wu; Chen, Chung-Ming

2010-06-01

To develop a computer-aided diagnostic algorithm with automatic boundary delineation for differential diagnosis of benign and malignant breast lesions at ultrasonography (US) and investigate the effect of boundary quality on the performance of a computer-aided diagnostic algorithm. This was an institutional review board-approved retrospective study with waiver of informed consent. A cell-based contour grouping (CBCG) segmentation algorithm was used to delineate the lesion boundaries automatically. Seven morphologic features were extracted. The classifier was a logistic regression function. Five hundred twenty breast US scans were obtained from 520 subjects (age range, 15-89 years), including 275 benign (mean size, 15 mm; range, 5-35 mm) and 245 malignant (mean size, 18 mm; range, 8-29 mm) lesions. The newly developed computer-aided diagnostic algorithm was evaluated on the basis of boundary quality and differentiation performance. The segmentation algorithms and features in two conventional computer-aided diagnostic algorithms were used for comparative study. The CBCG-generated boundaries were shown to be comparable with the manually delineated boundaries. The area under the receiver operating characteristic curve (AUC) and differentiation accuracy were 0.968 +/- 0.010 and 93.1% +/- 0.7, respectively, for all 520 breast lesions. At the 5% significance level, the newly developed algorithm was shown to be superior to the use of the boundaries and features of the two conventional computer-aided diagnostic algorithms in terms of AUC (0.974 +/- 0.007 versus 0.890 +/- 0.008 and 0.788 +/- 0.024, respectively). The newly developed computer-aided diagnostic algorithm that used a CBCG segmentation method to measure boundaries achieved a high differentiation performance. Copyright RSNA, 2010
Computational complexities and storage requirements of some Riccati equation solvers

NASA Technical Reports Server (NTRS)

Utku, Senol; Garba, John A.; Ramesh, A. V.

1989-01-01

The linear optimal control problem of an nth-order time-invariant dynamic system with a quadratic performance functional is usually solved by the Hamilton-Jacobi approach. This leads to the solution of the differential matrix Riccati equation with a terminal condition. The bulk of the computation for the optimal control problem is related to the solution of this equation. There are various algorithms in the literature for solving the matrix Riccati equation. However, computational complexities and storage requirements as a function of numbers of state variables, control variables, and sensors are not available for all these algorithms. In this work, the computational complexities and storage requirements for some of these algorithms are given. These expressions show the immensity of the computational requirements of the algorithms in solving the Riccati equation for large-order systems such as the control of highly flexible space structures. The expressions are also needed to compute the speedup and efficiency of any implementation of these algorithms on concurrent machines.
Interfacing External Quantum Devices to a Universal Quantum Computer

PubMed Central

Lagana, Antonio A.; Lohe, Max A.; von Smekal, Lorenz

2011-01-01

We present a scheme to use external quantum devices using the universal quantum computer previously constructed. We thereby show how the universal quantum computer can utilize networked quantum information resources to carry out local computations. Such information may come from specialized quantum devices or even from remote universal quantum computers. We show how to accomplish this by devising universal quantum computer programs that implement well known oracle based quantum algorithms, namely the Deutsch, Deutsch-Jozsa, and the Grover algorithms using external black-box quantum oracle devices. In the process, we demonstrate a method to map existing quantum algorithms onto the universal quantum computer. PMID:22216276
Interfacing external quantum devices to a universal quantum computer.

PubMed

Lagana, Antonio A; Lohe, Max A; von Smekal, Lorenz

2011-01-01

We present a scheme to use external quantum devices using the universal quantum computer previously constructed. We thereby show how the universal quantum computer can utilize networked quantum information resources to carry out local computations. Such information may come from specialized quantum devices or even from remote universal quantum computers. We show how to accomplish this by devising universal quantum computer programs that implement well known oracle based quantum algorithms, namely the Deutsch, Deutsch-Jozsa, and the Grover algorithms using external black-box quantum oracle devices. In the process, we demonstrate a method to map existing quantum algorithms onto the universal quantum computer. © 2011 Lagana et al.

Plasmid mapping computer program.

PubMed Central

Nolan, G P; Maina, C V; Szalay, A A

1984-01-01

Three new computer algorithms are described which rapidly order the restriction fragments of a plasmid DNA which has been cleaved with two restriction endonucleases in single and double digestions. Two of the algorithms are contained within a single computer program (called MPCIRC). The Rule-Oriented algorithm, constructs all logical circular map solutions within sixty seconds (14 double-digestion fragments) when used in conjunction with the Permutation method. The program is written in Apple Pascal and runs on an Apple II Plus Microcomputer with 64K of memory. A third algorithm is described which rapidly maps double digests and uses the above two algorithms as adducts. Modifications of the algorithms for linear mapping are also presented. PMID:6320105
High-order hydrodynamic algorithms for exascale computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Morgan, Nathaniel Ray

Hydrodynamic algorithms are at the core of many laboratory missions ranging from simulating ICF implosions to climate modeling. The hydrodynamic algorithms commonly employed at the laboratory and in industry (1) typically lack requisite accuracy for complex multi- material vortical flows and (2) are not well suited for exascale computing due to poor data locality and poor FLOP/memory ratios. Exascale computing requires advances in both computer science and numerical algorithms. We propose to research the second requirement and create a new high-order hydrodynamic algorithm that has superior accuracy, excellent data locality, and excellent FLOP/memory ratios. This proposal will impact a broadmore » range of research areas including numerical theory, discrete mathematics, vorticity evolution, gas dynamics, interface instability evolution, turbulent flows, fluid dynamics and shock driven flows. If successful, the proposed research has the potential to radically transform simulation capabilities and help position the laboratory for computing at the exascale.« less
Performance comparison of heuristic algorithms for task scheduling in IaaS cloud computing environment.

PubMed

Madni, Syed Hamid Hussain; Abd Latiff, Muhammad Shafie; Abdullahi, Mohammed; Abdulhamid, Shafi'i Muhammad; Usman, Mohammed Joda

2017-01-01

Cloud computing infrastructure is suitable for meeting computational needs of large task sizes. Optimal scheduling of tasks in cloud computing environment has been proved to be an NP-complete problem, hence the need for the application of heuristic methods. Several heuristic algorithms have been developed and used in addressing this problem, but choosing the appropriate algorithm for solving task assignment problem of a particular nature is difficult since the methods are developed under different assumptions. Therefore, six rule based heuristic algorithms are implemented and used to schedule autonomous tasks in homogeneous and heterogeneous environments with the aim of comparing their performance in terms of cost, degree of imbalance, makespan and throughput. First Come First Serve (FCFS), Minimum Completion Time (MCT), Minimum Execution Time (MET), Max-min, Min-min and Sufferage are the heuristic algorithms considered for the performance comparison and analysis of task scheduling in cloud computing.
Performance comparison of heuristic algorithms for task scheduling in IaaS cloud computing environment

PubMed Central

Madni, Syed Hamid Hussain; Abd Latiff, Muhammad Shafie; Abdullahi, Mohammed; Usman, Mohammed Joda

2017-01-01

Cloud computing infrastructure is suitable for meeting computational needs of large task sizes. Optimal scheduling of tasks in cloud computing environment has been proved to be an NP-complete problem, hence the need for the application of heuristic methods. Several heuristic algorithms have been developed and used in addressing this problem, but choosing the appropriate algorithm for solving task assignment problem of a particular nature is difficult since the methods are developed under different assumptions. Therefore, six rule based heuristic algorithms are implemented and used to schedule autonomous tasks in homogeneous and heterogeneous environments with the aim of comparing their performance in terms of cost, degree of imbalance, makespan and throughput. First Come First Serve (FCFS), Minimum Completion Time (MCT), Minimum Execution Time (MET), Max-min, Min-min and Sufferage are the heuristic algorithms considered for the performance comparison and analysis of task scheduling in cloud computing. PMID:28467505
160-fold acceleration of the Smith-Waterman algorithm using a field programmable gate array (FPGA)

PubMed Central

Li, Isaac TS; Shum, Warren; Truong, Kevin

2007-01-01

Background To infer homology and subsequently gene function, the Smith-Waterman (SW) algorithm is used to find the optimal local alignment between two sequences. When searching sequence databases that may contain hundreds of millions of sequences, this algorithm becomes computationally expensive. Results In this paper, we focused on accelerating the Smith-Waterman algorithm by using FPGA-based hardware that implemented a module for computing the score of a single cell of the SW matrix. Then using a grid of this module, the entire SW matrix was computed at the speed of field propagation through the FPGA circuit. These modifications dramatically accelerated the algorithm's computation time by up to 160 folds compared to a pure software implementation running on the same FPGA with an Altera Nios II softprocessor. Conclusion This design of FPGA accelerated hardware offers a new promising direction to seeking computation improvement of genomic database searching. PMID:17555593
160-fold acceleration of the Smith-Waterman algorithm using a field programmable gate array (FPGA).

PubMed

Li, Isaac T S; Shum, Warren; Truong, Kevin

2007-06-07

To infer homology and subsequently gene function, the Smith-Waterman (SW) algorithm is used to find the optimal local alignment between two sequences. When searching sequence databases that may contain hundreds of millions of sequences, this algorithm becomes computationally expensive. In this paper, we focused on accelerating the Smith-Waterman algorithm by using FPGA-based hardware that implemented a module for computing the score of a single cell of the SW matrix. Then using a grid of this module, the entire SW matrix was computed at the speed of field propagation through the FPGA circuit. These modifications dramatically accelerated the algorithm's computation time by up to 160 folds compared to a pure software implementation running on the same FPGA with an Altera Nios II softprocessor. This design of FPGA accelerated hardware offers a new promising direction to seeking computation improvement of genomic database searching.
Intelligent fuzzy approach for fast fractal image compression

NASA Astrophysics Data System (ADS)

Nodehi, Ali; Sulong, Ghazali; Al-Rodhaan, Mznah; Al-Dhelaan, Abdullah; Rehman, Amjad; Saba, Tanzila

2014-12-01

Fractal image compression (FIC) is recognized as a NP-hard problem, and it suffers from a high number of mean square error (MSE) computations. In this paper, a two-phase algorithm was proposed to reduce the MSE computation of FIC. In the first phase, based on edge property, range and domains are arranged. In the second one, imperialist competitive algorithm (ICA) is used according to the classified blocks. For maintaining the quality of the retrieved image and accelerating algorithm operation, we divided the solutions into two groups: developed countries and undeveloped countries. Simulations were carried out to evaluate the performance of the developed approach. Promising results thus achieved exhibit performance better than genetic algorithm (GA)-based and Full-search algorithms in terms of decreasing the number of MSE computations. The number of MSE computations was reduced by the proposed algorithm for 463 times faster compared to the Full-search algorithm, although the retrieved image quality did not have a considerable change.
A ridge tracking algorithm and error estimate for efficient computation of Lagrangian coherent structures.

PubMed

Lipinski, Doug; Mohseni, Kamran

2010-03-01

A ridge tracking algorithm for the computation and extraction of Lagrangian coherent structures (LCS) is developed. This algorithm takes advantage of the spatial coherence of LCS by tracking the ridges which form LCS to avoid unnecessary computations away from the ridges. We also make use of the temporal coherence of LCS by approximating the time dependent motion of the LCS with passive tracer particles. To justify this approximation, we provide an estimate of the difference between the motion of the LCS and that of tracer particles which begin on the LCS. In addition to the speedup in computational time, the ridge tracking algorithm uses less memory and results in smaller output files than the standard LCS algorithm. Finally, we apply our ridge tracking algorithm to two test cases, an analytically defined double gyre as well as the more complicated example of the numerical simulation of a swimming jellyfish. In our test cases, we find up to a 35 times speedup when compared with the standard LCS algorithm.
Approximation algorithms for planning and control

NASA Technical Reports Server (NTRS)

Boddy, Mark; Dean, Thomas

1989-01-01

A control system operating in a complex environment will encounter a variety of different situations, with varying amounts of time available to respond to critical events. Ideally, such a control system will do the best possible with the time available. In other words, its responses should approximate those that would result from having unlimited time for computation, where the degree of the approximation depends on the amount of time it actually has. There exist approximation algorithms for a wide variety of problems. Unfortunately, the solution to any reasonably complex control problem will require solving several computationally intensive problems. Algorithms for successive approximation are a subclass of the class of anytime algorithms, algorithms that return answers for any amount of computation time, where the answers improve as more time is allotted. An architecture is described for allocating computation time to a set of anytime algorithms, based on expectations regarding the value of the answers they return. The architecture described is quite general, producing optimal schedules for a set of algorithms under widely varying conditions.
Bulk refrigeration of fruits and vegetables. Part 2: Computer algorithm for heat loads and moisture loss

DOE Office of Scientific and Technical Information (OSTI.GOV)

Becker, B.; Misra, A.; Fricke, B.A.

1997-12-31

A computer algorithm was developed that estimates the latent and sensible heat loads due to the bulk refrigeration of fruits and vegetables. The algorithm also predicts the commodity moisture loss and temperature distribution which occurs during refrigeration. Part 1 focused upon the thermophysical properties of commodities and the flowfield parameters which govern the heat and mass transfer from fresh fruits and vegetables. This paper, Part 2, discusses the modeling methodology utilized in the current computer algorithm and describes the development of the heat and mass transfer models. Part 2 also compares the results of the computer algorithm to experimental datamore » taken from the literature and describes a parametric study which was performed with the algorithm. In addition, this paper also reviews existing numerical models for determining the heat and mass transfer in bulk loads of fruits and vegetables.« less
[Orthogonal Vector Projection Algorithm for Spectral Unmixing].

PubMed

Song, Mei-ping; Xu, Xing-wei; Chang, Chein-I; An, Ju-bai; Yao, Li

2015-12-01

Spectrum unmixing is an important part of hyperspectral technologies, which is essential for material quantity analysis in hyperspectral imagery. Most linear unmixing algorithms require computations of matrix multiplication and matrix inversion or matrix determination. These are difficult for programming, especially hard for realization on hardware. At the same time, the computation costs of the algorithms increase significantly as the number of endmembers grows. Here, based on the traditional algorithm Orthogonal Subspace Projection, a new method called. Orthogonal Vector Projection is prompted using orthogonal principle. It simplifies this process by avoiding matrix multiplication and inversion. It firstly computes the final orthogonal vector via Gram-Schmidt process for each endmember spectrum. And then, these orthogonal vectors are used as projection vector for the pixel signature. The unconstrained abundance can be obtained directly by projecting the signature to the projection vectors, and computing the ratio of projected vector length and orthogonal vector length. Compared to the Orthogonal Subspace Projection and Least Squares Error algorithms, this method does not need matrix inversion, which is much computation costing and hard to implement on hardware. It just completes the orthogonalization process by repeated vector operations, easy for application on both parallel computation and hardware. The reasonability of the algorithm is proved by its relationship with Orthogonal Sub-space Projection and Least Squares Error algorithms. And its computational complexity is also compared with the other two algorithms', which is the lowest one. At last, the experimental results on synthetic image and real image are also provided, giving another evidence for effectiveness of the method.
Evaluation of a Text Compression Algorithm Against Computer-Aided Instruction (CAI) Material.

ERIC Educational Resources Information Center

Knight, Joseph M., Jr.

This report describes the initial evaluation of a text compression algorithm against computer assisted instruction (CAI) material. A review of some concepts related to statistical text compression is followed by a detailed description of a practical text compression algorithm. A simulation of the algorithm was programed and used to obtain…
A projected preconditioned conjugate gradient algorithm for computing many extreme eigenpairs of a Hermitian matrix [A projected preconditioned conjugate gradient algorithm for computing a large eigenspace of a Hermitian matrix

DOE PAGES

Vecharynski, Eugene; Yang, Chao; Pask, John E.

2015-02-25

Here, we present an iterative algorithm for computing an invariant subspace associated with the algebraically smallest eigenvalues of a large sparse or structured Hermitian matrix A. We are interested in the case in which the dimension of the invariant subspace is large (e.g., over several hundreds or thousands) even though it may still be small relative to the dimension of A. These problems arise from, for example, density functional theory (DFT) based electronic structure calculations for complex materials. The key feature of our algorithm is that it performs fewer Rayleigh–Ritz calculations compared to existing algorithms such as the locally optimalmore » block preconditioned conjugate gradient or the Davidson algorithm. It is a block algorithm, and hence can take advantage of efficient BLAS3 operations and be implemented with multiple levels of concurrency. We discuss a number of practical issues that must be addressed in order to implement the algorithm efficiently on a high performance computer.« less
A Stochastic Total Least Squares Solution of Adaptive Filtering Problem

PubMed Central

Ahmad, Noor Atinah

2014-01-01

An efficient and computationally linear algorithm is derived for total least squares solution of adaptive filtering problem, when both input and output signals are contaminated by noise. The proposed total least mean squares (TLMS) algorithm is designed by recursively computing an optimal solution of adaptive TLS problem by minimizing instantaneous value of weighted cost function. Convergence analysis of the algorithm is given to show the global convergence of the proposed algorithm, provided that the stepsize parameter is appropriately chosen. The TLMS algorithm is computationally simpler than the other TLS algorithms and demonstrates a better performance as compared with the least mean square (LMS) and normalized least mean square (NLMS) algorithms. It provides minimum mean square deviation by exhibiting better convergence in misalignment for unknown system identification under noisy inputs. PMID:24688412
Postprocessing of Voxel-Based Topologies for Additive Manufacturing Using the Computational Geometry Algorithms Library (CGAL)

DTIC Science & Technology

2015-06-01

10-2014 to 00-11-2014 4. TITLE AND SUBTITLE Postprocessing of Voxel-Based Topologies for Additive Manufacturing Using the Computational Geometry...ABSTRACT Postprocessing of 3-dimensional (3-D) topologies that are defined as a set of voxels using the Computational Geometry Algorithms Library (CGAL... computational geometry algorithms, several of which are suited to the task. The work flow described in this report involves first defining a set of
Exact and heuristic algorithms for Space Information Flow.

PubMed

Uwitonze, Alfred; Huang, Jiaqing; Ye, Yuanqing; Cheng, Wenqing; Li, Zongpeng

2018-01-01

Space Information Flow (SIF) is a new promising research area that studies network coding in geometric space, such as Euclidean space. The design of algorithms that compute the optimal SIF solutions remains one of the key open problems in SIF. This work proposes the first exact SIF algorithm and a heuristic SIF algorithm that compute min-cost multicast network coding for N (N ≥ 3) given terminal nodes in 2-D Euclidean space. Furthermore, we find that the Butterfly network in Euclidean space is the second example besides the Pentagram network where SIF is strictly better than Euclidean Steiner minimal tree. The exact algorithm design is based on two key techniques: Delaunay triangulation and linear programming. Delaunay triangulation technique helps to find practically good candidate relay nodes, after which a min-cost multicast linear programming model is solved over the terminal nodes and the candidate relay nodes, to compute the optimal multicast network topology, including the optimal relay nodes selected by linear programming from all the candidate relay nodes and the flow rates on the connection links. The heuristic algorithm design is also based on Delaunay triangulation and linear programming techniques. The exact algorithm can achieve the optimal SIF solution with an exponential computational complexity, while the heuristic algorithm can achieve the sub-optimal SIF solution with a polynomial computational complexity. We prove the correctness of the exact SIF algorithm. The simulation results show the effectiveness of the heuristic SIF algorithm.
Computationally efficient algorithms for real-time attitude estimation

NASA Technical Reports Server (NTRS)

Pringle, Steven R.

1993-01-01

For many practical spacecraft applications, algorithms for determining spacecraft attitude must combine inputs from diverse sensors and provide redundancy in the event of sensor failure. A Kalman filter is suitable for this task, however, it may impose a computational burden which may be avoided by sub optimal methods. A suboptimal estimator is presented which was implemented successfully on the Delta Star spacecraft which performed a 9 month SDI flight experiment in 1989. This design sought to minimize algorithm complexity to accommodate the limitations of an 8K guidance computer. The algorithm used is interpreted in the framework of Kalman filtering and a derivation is given for the computation.
Real-time slicing algorithm for Stereolithography (STL) CAD model applied in additive manufacturing industry

NASA Astrophysics Data System (ADS)

Adnan, F. A.; Romlay, F. R. M.; Shafiq, M.

2018-04-01

Owing to the advent of the industrial revolution 4.0, the need for further evaluating processes applied in the additive manufacturing application particularly the computational process for slicing is non-trivial. This paper evaluates a real-time slicing algorithm for slicing an STL formatted computer-aided design (CAD). A line-plane intersection equation was applied to perform the slicing procedure at any given height. The application of this algorithm has found to provide a better computational time regardless the number of facet in the STL model. The performance of this algorithm is evaluated by comparing the results of the computational time for different geometry.
Toward a computational psycholinguistics of reference production.

PubMed

van Deemter, Kees; Gatt, Albert; van Gompel, Roger P G; Krahmer, Emiel

2012-04-01

This article introduces the topic ''Production of Referring Expressions: Bridging the Gap between Computational and Empirical Approaches to Reference'' of the journal Topics in Cognitive Science. We argue that computational and psycholinguistic approaches to reference production can benefit from closer interaction, and that this is likely to result in the construction of algorithms that differ markedly from the ones currently known in the computational literature. We focus particularly on determinism, the feature of existing algorithms that is perhaps most clearly at odds with psycholinguistic results, discussing how future algorithms might include non-determinism, and how new psycholinguistic experiments could inform the development of such algorithms. Copyright © 2012 Cognitive Science Society, Inc.
Obtaining lower bounds from the progressive hedging algorithm for stochastic mixed-integer programs

DOE PAGES

Gade, Dinakar; Hackebeil, Gabriel; Ryan, Sarah M.; ...

2016-04-02

We present a method for computing lower bounds in the progressive hedging algorithm (PHA) for two-stage and multi-stage stochastic mixed-integer programs. Computing lower bounds in the PHA allows one to assess the quality of the solutions generated by the algorithm contemporaneously. The lower bounds can be computed in any iteration of the algorithm by using dual prices that are calculated during execution of the standard PHA. In conclusion, we report computational results on stochastic unit commitment and stochastic server location problem instances, and explore the relationship between key PHA parameters and the quality of the resulting lower bounds.

Reversible Data Hiding Based on DNA Computing

PubMed Central

Xie, Yingjie

2017-01-01

Biocomputing, especially DNA, computing has got great development. It is widely used in information security. In this paper, a novel algorithm of reversible data hiding based on DNA computing is proposed. Inspired by the algorithm of histogram modification, which is a classical algorithm for reversible data hiding, we combine it with DNA computing to realize this algorithm based on biological technology. Compared with previous results, our experimental results have significantly improved the ER (Embedding Rate). Furthermore, some PSNR (peak signal-to-noise ratios) of test images are also improved. Experimental results show that it is suitable for protecting the copyright of cover image in DNA-based information security. PMID:28280504
Efficient Geometric Sound Propagation Using Visibility Culling

NASA Astrophysics Data System (ADS)

Chandak, Anish

2011-07-01

Simulating propagation of sound can improve the sense of realism in interactive applications such as video games and can lead to better designs in engineering applications such as architectural acoustics. In this thesis, we present geometric sound propagation techniques which are faster than prior methods and map well to upcoming parallel multi-core CPUs. We model specular reflections by using the image-source method and model finite-edge diffraction by using the well-known Biot-Tolstoy-Medwin (BTM) model. We accelerate the computation of specular reflections by applying novel visibility algorithms, FastV and AD-Frustum, which compute visibility from a point. We accelerate finite-edge diffraction modeling by applying a novel visibility algorithm which computes visibility from a region. Our visibility algorithms are based on frustum tracing and exploit recent advances in fast ray-hierarchy intersections, data-parallel computations, and scalable, multi-core algorithms. The AD-Frustum algorithm adapts its computation to the scene complexity and allows small errors in computing specular reflection paths for higher computational efficiency. FastV and our visibility algorithm from a region are general, object-space, conservative visibility algorithms that together significantly reduce the number of image sources compared to other techniques while preserving the same accuracy. Our geometric propagation algorithms are an order of magnitude faster than prior approaches for modeling specular reflections and two to ten times faster for modeling finite-edge diffraction. Our algorithms are interactive, scale almost linearly on multi-core CPUs, and can handle large, complex, and dynamic scenes. We also compare the accuracy of our sound propagation algorithms with other methods. Once sound propagation is performed, it is desirable to listen to the propagated sound in interactive and engineering applications. We can generate smooth, artifact-free output audio signals by applying efficient audio-processing algorithms. We also present the first efficient audio-processing algorithm for scenarios with simultaneously moving source and moving receiver (MS-MR) which incurs less than 25% overhead compared to static source and moving receiver (SS-MR) or moving source and static receiver (MS-SR) scenario.
Real-time polarization imaging algorithm for camera-based polarization navigation sensors.

PubMed

Lu, Hao; Zhao, Kaichun; You, Zheng; Huang, Kaoli

2017-04-10

Biologically inspired polarization navigation is a promising approach due to its autonomous nature, high precision, and robustness. Many researchers have built point source-based and camera-based polarization navigation prototypes in recent years. Camera-based prototypes can benefit from their high spatial resolution but incur a heavy computation load. The pattern recognition algorithm in most polarization imaging algorithms involves several nonlinear calculations that impose a significant computation burden. In this paper, the polarization imaging and pattern recognition algorithms are optimized through reduction to several linear calculations by exploiting the orthogonality of the Stokes parameters without affecting precision according to the features of the solar meridian and the patterns of the polarized skylight. The algorithm contains a pattern recognition algorithm with a Hough transform as well as orientation measurement algorithms. The algorithm was loaded and run on a digital signal processing system to test its computational complexity. The test showed that the running time decreased to several tens of milliseconds from several thousand milliseconds. Through simulations and experiments, it was found that the algorithm can measure orientation without reducing precision. It can hence satisfy the practical demands of low computational load and high precision for use in embedded systems.
Molecular Isotopic Distribution Analysis (MIDAs) with Adjustable Mass Accuracy

NASA Astrophysics Data System (ADS)

Alves, Gelio; Ogurtsov, Aleksey Y.; Yu, Yi-Kuo

2014-01-01

In this paper, we present Molecular Isotopic Distribution Analysis (MIDAs), a new software tool designed to compute molecular isotopic distributions with adjustable accuracies. MIDAs offers two algorithms, one polynomial-based and one Fourier-transform-based, both of which compute molecular isotopic distributions accurately and efficiently. The polynomial-based algorithm contains few novel aspects, whereas the Fourier-transform-based algorithm consists mainly of improvements to other existing Fourier-transform-based algorithms. We have benchmarked the performance of the two algorithms implemented in MIDAs with that of eight software packages (BRAIN, Emass, Mercury, Mercury5, NeutronCluster, Qmass, JFC, IC) using a consensus set of benchmark molecules. Under the proposed evaluation criteria, MIDAs's algorithms, JFC, and Emass compute with comparable accuracy the coarse-grained (low-resolution) isotopic distributions and are more accurate than the other software packages. For fine-grained isotopic distributions, we compared IC, MIDAs's polynomial algorithm, and MIDAs's Fourier transform algorithm. Among the three, IC and MIDAs's polynomial algorithm compute isotopic distributions that better resemble their corresponding exact fine-grained (high-resolution) isotopic distributions. MIDAs can be accessed freely through a user-friendly web-interface at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/midas/index.html.
Molecular Isotopic Distribution Analysis (MIDAs) with adjustable mass accuracy.

PubMed

Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo

2014-01-01

In this paper, we present Molecular Isotopic Distribution Analysis (MIDAs), a new software tool designed to compute molecular isotopic distributions with adjustable accuracies. MIDAs offers two algorithms, one polynomial-based and one Fourier-transform-based, both of which compute molecular isotopic distributions accurately and efficiently. The polynomial-based algorithm contains few novel aspects, whereas the Fourier-transform-based algorithm consists mainly of improvements to other existing Fourier-transform-based algorithms. We have benchmarked the performance of the two algorithms implemented in MIDAs with that of eight software packages (BRAIN, Emass, Mercury, Mercury5, NeutronCluster, Qmass, JFC, IC) using a consensus set of benchmark molecules. Under the proposed evaluation criteria, MIDAs's algorithms, JFC, and Emass compute with comparable accuracy the coarse-grained (low-resolution) isotopic distributions and are more accurate than the other software packages. For fine-grained isotopic distributions, we compared IC, MIDAs's polynomial algorithm, and MIDAs's Fourier transform algorithm. Among the three, IC and MIDAs's polynomial algorithm compute isotopic distributions that better resemble their corresponding exact fine-grained (high-resolution) isotopic distributions. MIDAs can be accessed freely through a user-friendly web-interface at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/midas/index.html.
Exploration of a physiologically-inspired hearing-aid algorithm using a computer model mimicking impaired hearing.

PubMed

Jürgens, Tim; Clark, Nicholas R; Lecluyse, Wendy; Meddis, Ray

2016-01-01

To use a computer model of impaired hearing to explore the effects of a physiologically-inspired hearing-aid algorithm on a range of psychoacoustic measures. A computer model of a hypothetical impaired listener's hearing was constructed by adjusting parameters of a computer model of normal hearing. Absolute thresholds, estimates of compression, and frequency selectivity (summarized to a hearing profile) were assessed using this model with and without pre-processing the stimuli by a hearing-aid algorithm. The influence of different settings of the algorithm on the impaired profile was investigated. To validate the model predictions, the effect of the algorithm on hearing profiles of human impaired listeners was measured. A computer model simulating impaired hearing (total absence of basilar membrane compression) was used, and three hearing-impaired listeners participated. The hearing profiles of the model and the listeners showed substantial changes when the test stimuli were pre-processed by the hearing-aid algorithm. These changes consisted of lower absolute thresholds, steeper temporal masking curves, and sharper psychophysical tuning curves. The hearing-aid algorithm affected the impaired hearing profile of the model to approximate a normal hearing profile. Qualitatively similar results were found with the impaired listeners' hearing profiles.
Computing rank-revealing QR factorizations of dense matrices.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bischof, C. H.; Quintana-Orti, G.; Mathematics and Computer Science

1998-06-01

We develop algorithms and implementations for computing rank-revealing QR (RRQR) factorizations of dense matrices. First, we develop an efficient block algorithm for approximating an RRQR factorization, employing a windowed version of the commonly used Golub pivoting strategy, aided by incremental condition estimation. Second, we develop efficiently implementable variants of guaranteed reliable RRQR algorithms for triangular matrices originally suggested by Chandrasekaran and Ipsen and by Pan and Tang. We suggest algorithmic improvements with respect to condition estimation, termination criteria, and Givens updating. By combining the block algorithm with one of the triangular postprocessing steps, we arrive at an efficient and reliablemore » algorithm for computing an RRQR factorization of a dense matrix. Experimental results on IBM RS/6000 SGI R8000 platforms show that this approach performs up to three times faster that the less reliable QR factorization with column pivoting as it is currently implemented in LAPACK, and comes within 15% of the performance of the LAPACK block algorithm for computing a QR factorization without any column exchanges. Thus, we expect this routine to be useful in may circumstances where numerical rank deficiency cannot be ruled out, but currently has been ignored because of the computational cost of dealing with it.« less
Real-time optical flow estimation on a GPU for a skied-steered mobile robot

NASA Astrophysics Data System (ADS)

Kniaz, V. V.

2016-04-01

Accurate egomotion estimation is required for mobile robot navigation. Often the egomotion is estimated using optical flow algorithms. For an accurate estimation of optical flow most of modern algorithms require high memory resources and processor speed. However simple single-board computers that control the motion of the robot usually do not provide such resources. On the other hand, most of modern single-board computers are equipped with an embedded GPU that could be used in parallel with a CPU to improve the performance of the optical flow estimation algorithm. This paper presents a new Z-flow algorithm for efficient computation of an optical flow using an embedded GPU. The algorithm is based on the phase correlation optical flow estimation and provide a real-time performance on a low cost embedded GPU. The layered optical flow model is used. Layer segmentation is performed using graph-cut algorithm with a time derivative based energy function. Such approach makes the algorithm both fast and robust in low light and low texture conditions. The algorithm implementation for a Raspberry Pi Model B computer is discussed. For evaluation of the algorithm the computer was mounted on a Hercules mobile skied-steered robot equipped with a monocular camera. The evaluation was performed using a hardware-in-the-loop simulation and experiments with Hercules mobile robot. Also the algorithm was evaluated using KITTY Optical Flow 2015 dataset. The resulting endpoint error of the optical flow calculated with the developed algorithm was low enough for navigation of the robot along the desired trajectory.
A parallel algorithm for the initial screening of space debris collisions prediction using the SGP4/SDP4 models and GPU acceleration

NASA Astrophysics Data System (ADS)

Lin, Mingpei; Xu, Ming; Fu, Xiaoyu

2017-05-01

Currently, a tremendous amount of space debris in Earth's orbit imperils operational spacecraft. It is essential to undertake risk assessments of collisions and predict dangerous encounters in space. However, collision predictions for an enormous amount of space debris give rise to large-scale computations. In this paper, a parallel algorithm is established on the Compute Unified Device Architecture (CUDA) platform of NVIDIA Corporation for collision prediction. According to the parallel structure of NVIDIA graphics processors, a block decomposition strategy is adopted in the algorithm. Space debris is divided into batches, and the computation and data transfer operations of adjacent batches overlap. As a consequence, the latency to access shared memory during the entire computing process is significantly reduced, and a higher computing speed is reached. Theoretically, a simulation of collision prediction for space debris of any amount and for any time span can be executed. To verify this algorithm, a simulation example including 1382 pieces of debris, whose operational time scales vary from 1 min to 3 days, is conducted on Tesla C2075 of NVIDIA. The simulation results demonstrate that with the same computational accuracy as that of a CPU, the computing speed of the parallel algorithm on a GPU is 30 times that on a CPU. Based on this algorithm, collision prediction of over 150 Chinese spacecraft for a time span of 3 days can be completed in less than 3 h on a single computer, which meets the timeliness requirement of the initial screening task. Furthermore, the algorithm can be adapted for multiple tasks, including particle filtration, constellation design, and Monte-Carlo simulation of an orbital computation.
Computer Music

NASA Astrophysics Data System (ADS)

Cook, Perry R.

This chapter covers algorithms, technologies, computer languages, and systems for computer music. Computer music involves the application of computers and other digital/electronic technologies to music composition, performance, theory, history, and the study of perception. The field combines digital signal processing, computational algorithms, computer languages, hardware and software systems, acoustics, psychoacoustics (low-level perception of sounds from the raw acoustic signal), and music cognition (higher-level perception of musical style, form, emotion, etc.).
An efficient and accurate 3D displacements tracking strategy for digital volume correlation

NASA Astrophysics Data System (ADS)

Pan, Bing; Wang, Bo; Wu, Dafang; Lubineau, Gilles

2014-07-01

Owing to its inherent computational complexity, practical implementation of digital volume correlation (DVC) for internal displacement and strain mapping faces important challenges in improving its computational efficiency. In this work, an efficient and accurate 3D displacement tracking strategy is proposed for fast DVC calculation. The efficiency advantage is achieved by using three improvements. First, to eliminate the need of updating Hessian matrix in each iteration, an efficient 3D inverse compositional Gauss-Newton (3D IC-GN) algorithm is introduced to replace existing forward additive algorithms for accurate sub-voxel displacement registration. Second, to ensure the 3D IC-GN algorithm that converges accurately and rapidly and avoid time-consuming integer-voxel displacement searching, a generalized reliability-guided displacement tracking strategy is designed to transfer accurate and complete initial guess of deformation for each calculation point from its computed neighbors. Third, to avoid the repeated computation of sub-voxel intensity interpolation coefficients, an interpolation coefficient lookup table is established for tricubic interpolation. The computational complexity of the proposed fast DVC and the existing typical DVC algorithms are first analyzed quantitatively according to necessary arithmetic operations. Then, numerical tests are performed to verify the performance of the fast DVC algorithm in terms of measurement accuracy and computational efficiency. The experimental results indicate that, compared with the existing DVC algorithm, the presented fast DVC algorithm produces similar precision and slightly higher accuracy at a substantially reduced computational cost.
A Systematic Investigation of Computation Models for Predicting Adverse Drug Reactions (ADRs)

PubMed Central

Kuang, Qifan; Wang, MinQi; Li, Rong; Dong, YongCheng; Li, Yizhou; Li, Menglong

2014-01-01

Background Early and accurate identification of adverse drug reactions (ADRs) is critically important for drug development and clinical safety. Computer-aided prediction of ADRs has attracted increasing attention in recent years, and many computational models have been proposed. However, because of the lack of systematic analysis and comparison of the different computational models, there remain limitations in designing more effective algorithms and selecting more useful features. There is therefore an urgent need to review and analyze previous computation models to obtain general conclusions that can provide useful guidance to construct more effective computational models to predict ADRs. Principal Findings In the current study, the main work is to compare and analyze the performance of existing computational methods to predict ADRs, by implementing and evaluating additional algorithms that have been earlier used for predicting drug targets. Our results indicated that topological and intrinsic features were complementary to an extent and the Jaccard coefficient had an important and general effect on the prediction of drug-ADR associations. By comparing the structure of each algorithm, final formulas of these algorithms were all converted to linear model in form, based on this finding we propose a new algorithm called the general weighted profile method and it yielded the best overall performance among the algorithms investigated in this paper. Conclusion Several meaningful conclusions and useful findings regarding the prediction of ADRs are provided for selecting optimal features and algorithms. PMID:25180585
Far-field radiation patterns of aperture antennas by the Winograd Fourier transform algorithm

NASA Technical Reports Server (NTRS)

Heisler, R.

1978-01-01

A more time-efficient algorithm for computing the discrete Fourier transform, the Winograd Fourier transform (WFT), is described. The WFT algorithm is compared with other transform algorithms. Results indicate that the WFT algorithm in antenna analysis appears to be a very successful application. Significant savings in cpu time will improve the computer turn around time and circumvent the need to resort to weekend runs.
A New Augmentation Based Algorithm for Extracting Maximal Chordal Subgraphs.

PubMed

Bhowmick, Sanjukta; Chen, Tzu-Yi; Halappanavar, Mahantesh

2015-02-01

A graph is chordal if every cycle of length greater than three contains an edge between non-adjacent vertices. Chordal graphs are of interest both theoretically, since they admit polynomial time solutions to a range of NP-hard graph problems, and practically, since they arise in many applications including sparse linear algebra, computer vision, and computational biology. A maximal chordal subgraph is a chordal subgraph that is not a proper subgraph of any other chordal subgraph. Existing algorithms for computing maximal chordal subgraphs depend on dynamically ordering the vertices, which is an inherently sequential process and therefore limits the algorithms' parallelizability. In this paper we explore techniques to develop a scalable parallel algorithm for extracting a maximal chordal subgraph. We demonstrate that an earlier attempt at developing a parallel algorithm may induce a non-optimal vertex ordering and is therefore not guaranteed to terminate with a maximal chordal subgraph. We then give a new algorithm that first computes and then repeatedly augments a spanning chordal subgraph. After proving that the algorithm terminates with a maximal chordal subgraph, we then demonstrate that this algorithm is more amenable to parallelization and that the parallel version also terminates with a maximal chordal subgraph. That said, the complexity of the new algorithm is higher than that of the previous parallel algorithm, although the earlier algorithm computes a chordal subgraph which is not guaranteed to be maximal. We experimented with our augmentation-based algorithm on both synthetic and real-world graphs. We provide scalability results and also explore the effect of different choices for the initial spanning chordal subgraph on both the running time and on the number of edges in the maximal chordal subgraph.
TSaT-MUSIC: a novel algorithm for rapid and accurate ultrasonic 3D localization

NASA Astrophysics Data System (ADS)

Mizutani, Kyohei; Ito, Toshio; Sugimoto, Masanori; Hashizume, Hiromichi

2011-12-01

We describe a fast and accurate indoor localization technique using the multiple signal classification (MUSIC) algorithm. The MUSIC algorithm is known as a high-resolution method for estimating directions of arrival (DOAs) or propagation delays. A critical problem in using the MUSIC algorithm for localization is its computational complexity. Therefore, we devised a novel algorithm called Time Space additional Temporal-MUSIC, which can rapidly and simultaneously identify DOAs and delays of mul-ticarrier ultrasonic waves from transmitters. Computer simulations have proved that the computation time of the proposed algorithm is almost constant in spite of increasing numbers of incoming waves and is faster than that of existing methods based on the MUSIC algorithm. The robustness of the proposed algorithm is discussed through simulations. Experiments in real environments showed that the standard deviation of position estimations in 3D space is less than 10 mm, which is satisfactory for indoor localization.
Field-Programmable Gate Array Computer in Structural Analysis: An Initial Exploration

NASA Technical Reports Server (NTRS)

Singleterry, Robert C., Jr.; Sobieszczanski-Sobieski, Jaroslaw; Brown, Samuel

2002-01-01

This paper reports on an initial assessment of using a Field-Programmable Gate Array (FPGA) computational device as a new tool for solving structural mechanics problems. A FPGA is an assemblage of binary gates arranged in logical blocks that are interconnected via software in a manner dependent on the algorithm being implemented and can be reprogrammed thousands of times per second. In effect, this creates a computer specialized for the problem that automatically exploits all the potential for parallel computing intrinsic in an algorithm. This inherent parallelism is the most important feature of the FPGA computational environment. It is therefore important that if a problem offers a choice of different solution algorithms, an algorithm of a higher degree of inherent parallelism should be selected. It is found that in structural analysis, an 'analog computer' style of programming, which solves problems by direct simulation of the terms in the governing differential equations, yields a more favorable solution algorithm than current solution methods. This style of programming is facilitated by a 'drag-and-drop' graphic programming language that is supplied with the particular type of FPGA computer reported in this paper. Simple examples in structural dynamics and statics illustrate the solution approach used. The FPGA system also allows linear scalability in computing capability. As the problem grows, the number of FPGA chips can be increased with no loss of computing efficiency due to data flow or algorithmic latency that occurs when a single problem is distributed among many conventional processors that operate in parallel. This initial assessment finds the FPGA hardware and software to be in their infancy in regard to the user conveniences; however, they have enormous potential for shrinking the elapsed time of structural analysis solutions if programmed with algorithms that exhibit inherent parallelism and linear scalability. This potential warrants further development of FPGA-tailored algorithms for structural analysis.
Signal and image processing algorithm performance in a virtual and elastic computing environment

NASA Astrophysics Data System (ADS)

Bennett, Kelly W.; Robertson, James

2013-05-01

The U.S. Army Research Laboratory (ARL) supports the development of classification, detection, tracking, and localization algorithms using multiple sensing modalities including acoustic, seismic, E-field, magnetic field, PIR, and visual and IR imaging. Multimodal sensors collect large amounts of data in support of algorithm development. The resulting large amount of data, and their associated high-performance computing needs, increases and challenges existing computing infrastructures. Purchasing computer power as a commodity using a Cloud service offers low-cost, pay-as-you-go pricing models, scalability, and elasticity that may provide solutions to develop and optimize algorithms without having to procure additional hardware and resources. This paper provides a detailed look at using a commercial cloud service provider, such as Amazon Web Services (AWS), to develop and deploy simple signal and image processing algorithms in a cloud and run the algorithms on a large set of data archived in the ARL Multimodal Signatures Database (MMSDB). Analytical results will provide performance comparisons with existing infrastructure. A discussion on using cloud computing with government data will discuss best security practices that exist within cloud services, such as AWS.
Parallel-vector unsymmetric Eigen-Solver on high performance computers

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.; Jiangning, Qin

1993-01-01

The popular QR algorithm for solving all eigenvalues of an unsymmetric matrix is reviewed. Among the basic components in the QR algorithm, it was concluded from this study, that the reduction of an unsymmetric matrix to a Hessenberg form (before applying the QR algorithm itself) can be done effectively by exploiting the vector speed and multiple processors offered by modern high-performance computers. Numerical examples of several test cases have indicated that the proposed parallel-vector algorithm for converting a given unsymmetric matrix to a Hessenberg form offers computational advantages over the existing algorithm. The time saving obtained by the proposed methods is increased as the problem size increased.
A rapid algorithm for realistic human reaching and its use in a virtual reality system

NASA Technical Reports Server (NTRS)

Aldridge, Ann; Pandya, Abhilash; Goldsby, Michael; Maida, James

1994-01-01

The Graphics Analysis Facility (GRAF) at JSC has developed a rapid algorithm for computing realistic human reaching. The algorithm was applied to GRAF's anthropometrically correct human model and used in a 3D computer graphics system and a virtual reality system. The nature of the algorithm and its uses are discussed.
Markov chains: computing limit existence and approximations with DNA.

PubMed

Cardona, M; Colomer, M A; Conde, J; Miret, J M; Miró, J; Zaragoza, A

2005-09-01

We present two algorithms to perform computations over Markov chains. The first one determines whether the sequence of powers of the transition matrix of a Markov chain converges or not to a limit matrix. If it does converge, the second algorithm enables us to estimate this limit. The combination of these algorithms allows the computation of a limit using DNA computing. In this sense, we have encoded the states and the transition probabilities using strands of DNA for generating paths of the Markov chain.

Quantum Computation: Entangling with the Future

NASA Technical Reports Server (NTRS)

Jiang, Zhang

2017-01-01

Commercial applications of quantum computation have become viable due to the rapid progress of the field in the recent years. Efficient quantum algorithms are discovered to cope with the most challenging real-world problems that are too hard for classical computers. Manufactured quantum hardware has reached unprecedented precision and controllability, enabling fault-tolerant quantum computation. Here, I give a brief introduction on what principles in quantum mechanics promise its unparalleled computational power. I will discuss several important quantum algorithms that achieve exponential or polynomial speedup over any classical algorithm. Building a quantum computer is a daunting task, and I will talk about the criteria and various implementations of quantum computers. I conclude the talk with near-future commercial applications of a quantum computer.
Rapid solution of large-scale systems of equations

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.

1994-01-01

The analysis and design of complex aerospace structures requires the rapid solution of large systems of linear and nonlinear equations, eigenvalue extraction for buckling, vibration and flutter modes, structural optimization and design sensitivity calculation. Computers with multiple processors and vector capabilities can offer substantial computational advantages over traditional scalar computer for these analyses. These computers fall into two categories: shared memory computers and distributed memory computers. This presentation covers general-purpose, highly efficient algorithms for generation/assembly or element matrices, solution of systems of linear and nonlinear equations, eigenvalue and design sensitivity analysis and optimization. All algorithms are coded in FORTRAN for shared memory computers and many are adapted to distributed memory computers. The capability and numerical performance of these algorithms will be addressed.
A Parallel Nonrigid Registration Algorithm Based on B-Spline for Medical Images

PubMed Central

Wang, Yangping; Wang, Song

2016-01-01

The nonrigid registration algorithm based on B-spline Free-Form Deformation (FFD) plays a key role and is widely applied in medical image processing due to the good flexibility and robustness. However, it requires a tremendous amount of computing time to obtain more accurate registration results especially for a large amount of medical image data. To address the issue, a parallel nonrigid registration algorithm based on B-spline is proposed in this paper. First, the Logarithm Squared Difference (LSD) is considered as the similarity metric in the B-spline registration algorithm to improve registration precision. After that, we create a parallel computing strategy and lookup tables (LUTs) to reduce the complexity of the B-spline registration algorithm. As a result, the computing time of three time-consuming steps including B-splines interpolation, LSD computation, and the analytic gradient computation of LSD, is efficiently reduced, for the B-spline registration algorithm employs the Nonlinear Conjugate Gradient (NCG) optimization method. Experimental results of registration quality and execution efficiency on the large amount of medical images show that our algorithm achieves a better registration accuracy in terms of the differences between the best deformation fields and ground truth and a speedup of 17 times over the single-threaded CPU implementation due to the powerful parallel computing ability of Graphics Processing Unit (GPU). PMID:28053653
On the performances of computer vision algorithms on mobile platforms

NASA Astrophysics Data System (ADS)

Battiato, S.; Farinella, G. M.; Messina, E.; Puglisi, G.; Ravì, D.; Capra, A.; Tomaselli, V.

2012-01-01

Computer Vision enables mobile devices to extract the meaning of the observed scene from the information acquired with the onboard sensor cameras. Nowadays, there is a growing interest in Computer Vision algorithms able to work on mobile platform (e.g., phone camera, point-and-shot-camera, etc.). Indeed, bringing Computer Vision capabilities on mobile devices open new opportunities in different application contexts. The implementation of vision algorithms on mobile devices is still a challenging task since these devices have poor image sensors and optics as well as limited processing power. In this paper we have considered different algorithms covering classic Computer Vision tasks: keypoint extraction, face detection, image segmentation. Several tests have been done to compare the performances of the involved mobile platforms: Nokia N900, LG Optimus One, Samsung Galaxy SII.
Algorithms for Disconnected Diagrams in Lattice QCD

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gambhir, Arjun Singh; Stathopoulos, Andreas; Orginos, Konstantinos

2016-11-01

Computing disconnected diagrams in Lattice QCD (operator insertion in a quark loop) entails the computationally demanding problem of taking the trace of the all to all quark propagator. We first outline the basic algorithm used to compute a quark loop as well as improvements to this method. Then, we motivate and introduce an algorithm based on the synergy between hierarchical probing and singular value deflation. We present results for the chiral condensate using a 2+1-flavor clover ensemble and compare estimates of the nucleon charges with the basic algorithm.
Complexity of the Quantum Adiabatic Algorithm

NASA Technical Reports Server (NTRS)

Hen, Itay

2013-01-01

The Quantum Adiabatic Algorithm (QAA) has been proposed as a mechanism for efficiently solving optimization problems on a quantum computer. Since adiabatic computation is analog in nature and does not require the design and use of quantum gates, it can be thought of as a simpler and perhaps more profound method for performing quantum computations that might also be easier to implement experimentally. While these features have generated substantial research in QAA, to date there is still a lack of solid evidence that the algorithm can outperform classical optimization algorithms.
Unified algorithm of cone optics to compute solar flux on central receiver

NASA Astrophysics Data System (ADS)

Grigoriev, Victor; Corsi, Clotilde

2017-06-01

Analytical algorithms to compute flux distribution on central receiver are considered as a faster alternative to ray tracing. They have quite too many modifications, with HFLCAL and UNIZAR being the most recognized and verified. In this work, a generalized algorithm is presented which is valid for arbitrary sun shape of radial symmetry. Heliostat mirrors can have a nonrectangular profile, and the effects of shading and blocking, strong defocusing and astigmatism can be taken into account. The algorithm is suitable for parallel computing and can benefit from hardware acceleration of polygon texturing.
Analysis of basic clustering algorithms for numerical estimation of statistical averages in biomolecules.

PubMed

Anandakrishnan, Ramu; Onufriev, Alexey

2008-03-01

In statistical mechanics, the equilibrium properties of a physical system of particles can be calculated as the statistical average over accessible microstates of the system. In general, these calculations are computationally intractable since they involve summations over an exponentially large number of microstates. Clustering algorithms are one of the methods used to numerically approximate these sums. The most basic clustering algorithms first sub-divide the system into a set of smaller subsets (clusters). Then, interactions between particles within each cluster are treated exactly, while all interactions between different clusters are ignored. These smaller clusters have far fewer microstates, making the summation over these microstates, tractable. These algorithms have been previously used for biomolecular computations, but remain relatively unexplored in this context. Presented here, is a theoretical analysis of the error and computational complexity for the two most basic clustering algorithms that were previously applied in the context of biomolecular electrostatics. We derive a tight, computationally inexpensive, error bound for the equilibrium state of a particle computed via these clustering algorithms. For some practical applications, it is the root mean square error, which can be significantly lower than the error bound, that may be more important. We how that there is a strong empirical relationship between error bound and root mean square error, suggesting that the error bound could be used as a computationally inexpensive metric for predicting the accuracy of clustering algorithms for practical applications. An example of error analysis for such an application-computation of average charge of ionizable amino-acids in proteins-is given, demonstrating that the clustering algorithm can be accurate enough for practical purposes.
Duality quantum algorithm efficiently simulates open quantum systems

PubMed Central

Wei, Shi-Jie; Ruan, Dong; Long, Gui-Lu

2016-01-01

Because of inevitable coupling with the environment, nearly all practical quantum systems are open system, where the evolution is not necessarily unitary. In this paper, we propose a duality quantum algorithm for simulating Hamiltonian evolution of an open quantum system. In contrast to unitary evolution in a usual quantum computer, the evolution operator in a duality quantum computer is a linear combination of unitary operators. In this duality quantum algorithm, the time evolution of the open quantum system is realized by using Kraus operators which is naturally implemented in duality quantum computer. This duality quantum algorithm has two distinct advantages compared to existing quantum simulation algorithms with unitary evolution operations. Firstly, the query complexity of the algorithm is O(d3) in contrast to O(d4) in existing unitary simulation algorithm, where d is the dimension of the open quantum system. Secondly, By using a truncated Taylor series of the evolution operators, this duality quantum algorithm provides an exponential improvement in precision compared with previous unitary simulation algorithm. PMID:27464855
Study on the algorithm of computational ghost imaging based on discrete fourier transform measurement matrix

NASA Astrophysics Data System (ADS)

Zhang, Leihong; Liang, Dong; Li, Bei; Kang, Yi; Pan, Zilan; Zhang, Dawei; Gao, Xiumin; Ma, Xiuhua

2016-07-01

On the basis of analyzing the cosine light field with determined analytic expression and the pseudo-inverse method, the object is illuminated by a presetting light field with a determined discrete Fourier transform measurement matrix, and the object image is reconstructed by the pseudo-inverse method. The analytic expression of the algorithm of computational ghost imaging based on discrete Fourier transform measurement matrix is deduced theoretically, and compared with the algorithm of compressive computational ghost imaging based on random measurement matrix. The reconstruction process and the reconstruction error are analyzed. On this basis, the simulation is done to verify the theoretical analysis. When the sampling measurement number is similar to the number of object pixel, the rank of discrete Fourier transform matrix is the same as the one of the random measurement matrix, the PSNR of the reconstruction image of FGI algorithm and PGI algorithm are similar, the reconstruction error of the traditional CGI algorithm is lower than that of reconstruction image based on FGI algorithm and PGI algorithm. As the decreasing of the number of sampling measurement, the PSNR of reconstruction image based on FGI algorithm decreases slowly, and the PSNR of reconstruction image based on PGI algorithm and CGI algorithm decreases sharply. The reconstruction time of FGI algorithm is lower than that of other algorithms and is not affected by the number of sampling measurement. The FGI algorithm can effectively filter out the random white noise through a low-pass filter and realize the reconstruction denoising which has a higher denoising capability than that of the CGI algorithm. The FGI algorithm can improve the reconstruction accuracy and the reconstruction speed of computational ghost imaging.
Robust tuning of robot control systems

NASA Technical Reports Server (NTRS)

Minis, I.; Uebel, M.

1992-01-01

The computed torque control problem is examined for a robot arm with flexible, geared, joint drive systems which are typical in many industrial robots. The standard computed torque algorithm is not directly applicable to this class of manipulators because of the dynamics introduced by the joint drive system. The proposed approach to computed torque control combines a computed torque algorithm with torque controller at each joint. Three such control schemes are proposed. The first scheme uses the joint torque control system currently implemented on the robot arm and a novel form of the computed torque algorithm. The other two use the standard computed torque algorithm and a novel model following torque control system based on model following techniques. Standard tasks and performance indices are used to evaluate the performance of the controllers. Both numerical simulations and experiments are used in evaluation. The study shows that all three proposed systems lead to improved tracking performance over a conventional PD controller.
Computational Workbench for Multibody Dynamics

NASA Technical Reports Server (NTRS)

Edmonds, Karina

2007-01-01

PyCraft is a computer program that provides an interactive, workbenchlike computing environment for developing and testing algorithms for multibody dynamics. Examples of multibody dynamic systems amenable to analysis with the help of PyCraft include land vehicles, spacecraft, robots, and molecular models. PyCraft is based on the Spatial-Operator- Algebra (SOA) formulation for multibody dynamics. The SOA operators enable construction of simple and compact representations of complex multibody dynamical equations. Within the Py-Craft computational workbench, users can, essentially, use the high-level SOA operator notation to represent the variety of dynamical quantities and algorithms and to perform computations interactively. PyCraft provides a Python-language interface to underlying C++ code. Working with SOA concepts, a user can create and manipulate Python-level operator classes in order to implement and evaluate new dynamical quantities and algorithms. During use of PyCraft, virtually all SOA-based algorithms are available for computational experiments.
Multipole Algorithms for Molecular Dynamics Simulation on High Performance Computers.

NASA Astrophysics Data System (ADS)

Elliott, William Dewey

1995-01-01

A fundamental problem in modeling large molecular systems with molecular dynamics (MD) simulations is the underlying N-body problem of computing the interactions between all pairs of N atoms. The simplest algorithm to compute pair-wise atomic interactions scales in runtime {cal O}(N^2), making it impractical for interesting biomolecular systems, which can contain millions of atoms. Recently, several algorithms have become available that solve the N-body problem by computing the effects of all pair-wise interactions while scaling in runtime less than {cal O}(N^2). One algorithm, which scales {cal O}(N) for a uniform distribution of particles, is called the Greengard-Rokhlin Fast Multipole Algorithm (FMA). This work describes an FMA-like algorithm called the Molecular Dynamics Multipole Algorithm (MDMA). The algorithm contains several features that are new to N-body algorithms. MDMA uses new, efficient series expansion equations to compute general 1/r^{n } potentials to arbitrary accuracy. In particular, the 1/r Coulomb potential and the 1/r^6 portion of the Lennard-Jones potential are implemented. The new equations are based on multivariate Taylor series expansions. In addition, MDMA uses a cell-to-cell interaction region of cells that is closely tied to worst case error bounds. The worst case error bounds for MDMA are derived in this work also. These bounds apply to other multipole algorithms as well. Several implementation enhancements are described which apply to MDMA as well as other N-body algorithms such as FMA and tree codes. The mathematics of the cell -to-cell interactions are converted to the Fourier domain for reduced operation count and faster computation. A relative indexing scheme was devised to locate cells in the interaction region which allows efficient pre-computation of redundant information and prestorage of much of the cell-to-cell interaction. Also, MDMA was integrated into the MD program SIgMA to demonstrate the performance of the program over several simulation timesteps. One MD application described here highlights the utility of including long range contributions to Lennard-Jones potential in constant pressure simulations. Another application shows the time dependence of long range forces in a multiple time step MD simulation.
A new fast algorithm for computing a complex number: Theoretic transforms

NASA Technical Reports Server (NTRS)

Reed, I. S.; Liu, K. Y.; Truong, T. K.

1977-01-01

A high-radix fast Fourier transformation (FFT) algorithm for computing transforms over GF(sq q), where q is a Mersenne prime, is developed to implement fast circular convolutions. This new algorithm requires substantially fewer multiplications than the conventional FFT.
Three-dimensional photoacoustic tomography based on graphics-processing-unit-accelerated finite element method.

PubMed

Peng, Kuan; He, Ling; Zhu, Ziqiang; Tang, Jingtian; Xiao, Jiaying

2013-12-01

Compared with commonly used analytical reconstruction methods, the frequency-domain finite element method (FEM) based approach has proven to be an accurate and flexible algorithm for photoacoustic tomography. However, the FEM-based algorithm is computationally demanding, especially for three-dimensional cases. To enhance the algorithm's efficiency, in this work a parallel computational strategy is implemented in the framework of the FEM-based reconstruction algorithm using a graphic-processing-unit parallel frame named the "compute unified device architecture." A series of simulation experiments is carried out to test the accuracy and accelerating effect of the improved method. The results obtained indicate that the parallel calculation does not change the accuracy of the reconstruction algorithm, while its computational cost is significantly reduced by a factor of 38.9 with a GTX 580 graphics card using the improved method.
Fast parallel algorithms that compute transitive closure of a fuzzy relation

NASA Technical Reports Server (NTRS)

Kreinovich, Vladik YA.

1993-01-01

The notion of a transitive closure of a fuzzy relation is very useful for clustering in pattern recognition, for fuzzy databases, etc. The original algorithm proposed by L. Zadeh (1971) requires the computation time O(n(sup 4)), where n is the number of elements in the relation. In 1974, J. C. Dunn proposed a O(n(sup 2)) algorithm. Since we must compute n(n-1)/2 different values s(a, b) (a not equal to b) that represent the fuzzy relation, and we need at least one computational step to compute each of these values, we cannot compute all of them in less than O(n(sup 2)) steps. So, Dunn's algorithm is in this sense optimal. For small n, it is ok. However, for big n (e.g., for big databases), it is still a lot, so it would be desirable to decrease the computation time (this problem was formulated by J. Bezdek). Since this decrease cannot be done on a sequential computer, the only way to do it is to use a computer with several processors working in parallel. We show that on a parallel computer, transitive closure can be computed in time O((log(sub 2)(n))2).
PACCE: Perl Algorithm to Compute Continuum and Equivalent Widths

NASA Astrophysics Data System (ADS)

Riffel, Rogério; Borges Vale, Tibério

2011-05-01

PACCE (Perl Algorithm to Compute continuum and Equivalent Widths) computes continuum and equivalent widths. PACCE is able to determine mean continuum and continuum at line center values, which are helpful in stellar population studies, and is also able to compute the uncertainties in the equivalent widths using photon statistics.
Constraint treatment techniques and parallel algorithms for multibody dynamic analysis. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Chiou, Jin-Chern

1990-01-01

Computational procedures for kinematic and dynamic analysis of three-dimensional multibody dynamic (MBD) systems are developed from the differential-algebraic equations (DAE's) viewpoint. Constraint violations during the time integration process are minimized and penalty constraint stabilization techniques and partitioning schemes are developed. The governing equations of motion, a two-stage staggered explicit-implicit numerical algorithm, are treated which takes advantage of a partitioned solution procedure. A robust and parallelizable integration algorithm is developed. This algorithm uses a two-stage staggered central difference algorithm to integrate the translational coordinates and the angular velocities. The angular orientations of bodies in MBD systems are then obtained by using an implicit algorithm via the kinematic relationship between Euler parameters and angular velocities. It is shown that the combination of the present solution procedures yields a computationally more accurate solution. To speed up the computational procedures, parallel implementation of the present constraint treatment techniques, the two-stage staggered explicit-implicit numerical algorithm was efficiently carried out. The DAE's and the constraint treatment techniques were transformed into arrowhead matrices to which Schur complement form was derived. By fully exploiting the sparse matrix structural analysis techniques, a parallel preconditioned conjugate gradient numerical algorithm is used to solve the systems equations written in Schur complement form. A software testbed was designed and implemented in both sequential and parallel computers. This testbed was used to demonstrate the robustness and efficiency of the constraint treatment techniques, the accuracy of the two-stage staggered explicit-implicit numerical algorithm, and the speed up of the Schur-complement-based parallel preconditioned conjugate gradient algorithm on a parallel computer.
A new augmentation based algorithm for extracting maximal chordal subgraphs

DOE PAGES

Bhowmick, Sanjukta; Chen, Tzu-Yi; Halappanavar, Mahantesh

2014-10-18

If every cycle of a graph is chordal length greater than three then it contains an edge between non-adjacent vertices. Chordal graphs are of interest both theoretically, since they admit polynomial time solutions to a range of NP-hard graph problems, and practically, since they arise in many applications including sparse linear algebra, computer vision, and computational biology. A maximal chordal subgraph is a chordal subgraph that is not a proper subgraph of any other chordal subgraph. Existing algorithms for computing maximal chordal subgraphs depend on dynamically ordering the vertices, which is an inherently sequential process and therefore limits the algorithms’more » parallelizability. In our paper we explore techniques to develop a scalable parallel algorithm for extracting a maximal chordal subgraph. We demonstrate that an earlier attempt at developing a parallel algorithm may induce a non-optimal vertex ordering and is therefore not guaranteed to terminate with a maximal chordal subgraph. We then give a new algorithm that first computes and then repeatedly augments a spanning chordal subgraph. After proving that the algorithm terminates with a maximal chordal subgraph, we then demonstrate that this algorithm is more amenable to parallelization and that the parallel version also terminates with a maximal chordal subgraph. That said, the complexity of the new algorithm is higher than that of the previous parallel algorithm, although the earlier algorithm computes a chordal subgraph which is not guaranteed to be maximal. Finally, we experimented with our augmentation-based algorithm on both synthetic and real-world graphs. We provide scalability results and also explore the effect of different choices for the initial spanning chordal subgraph on both the running time and on the number of edges in the maximal chordal subgraph.« less
Low-complexity R-peak detection in ECG signals: a preliminary step towards ambulatory fetal monitoring.

PubMed

Rooijakkers, Michiel; Rabotti, Chiara; Bennebroek, Martijn; van Meerbergen, Jef; Mischi, Massimo

2011-01-01

Non-invasive fetal health monitoring during pregnancy has become increasingly important. Recent advances in signal processing technology have enabled fetal monitoring during pregnancy, using abdominal ECG recordings. Ubiquitous ambulatory monitoring for continuous fetal health measurement is however still unfeasible due to the computational complexity of noise robust solutions. In this paper an ECG R-peak detection algorithm for ambulatory R-peak detection is proposed, as part of a fetal ECG detection algorithm. The proposed algorithm is optimized to reduce computational complexity, while increasing the R-peak detection quality compared to existing R-peak detection schemes. Validation of the algorithm is performed on two manually annotated datasets, the MIT/BIH Arrhythmia database and an in-house abdominal database. Both R-peak detection quality and computational complexity are compared to state-of-the-art algorithms as described in the literature. With a detection error rate of 0.22% and 0.12% on the MIT/BIH Arrhythmia and in-house databases, respectively, the quality of the proposed algorithm is comparable to the best state-of-the-art algorithms, at a reduced computational complexity.

Fast adaptive diamond search algorithm for block-matching motion estimation using spatial correlation

NASA Astrophysics Data System (ADS)

Park, Sang-Gon; Jeong, Dong-Seok

2000-12-01

In this paper, we propose a fast adaptive diamond search algorithm (FADS) for block matching motion estimation. Many fast motion estimation algorithms reduce the computational complexity by the UESA (Unimodal Error Surface Assumption) where the matching error monotonically increases as the search moves away from the global minimum point. Recently, many fast BMAs (Block Matching Algorithms) make use of the fact that global minimum points in real world video sequences are centered at the position of zero motion. But these BMAs, especially in large motion, are easily trapped into the local minima and result in poor matching accuracy. So, we propose a new motion estimation algorithm using the spatial correlation among the neighboring blocks. We move the search origin according to the motion vectors of the spatially neighboring blocks and their MAEs (Mean Absolute Errors). The computer simulation shows that the proposed algorithm has almost the same computational complexity with DS (Diamond Search), but enhances PSNR. Moreover, the proposed algorithm gives almost the same PSNR as that of FS (Full Search), even for the large motion with half the computational load.
Randomized Dynamic Mode Decomposition

NASA Astrophysics Data System (ADS)

Erichson, N. Benjamin; Brunton, Steven L.; Kutz, J. Nathan

2017-11-01

The dynamic mode decomposition (DMD) is an equation-free, data-driven matrix decomposition that is capable of providing accurate reconstructions of spatio-temporal coherent structures arising in dynamical systems. We present randomized algorithms to compute the near-optimal low-rank dynamic mode decomposition for massive datasets. Randomized algorithms are simple, accurate and able to ease the computational challenges arising with `big data'. Moreover, randomized algorithms are amenable to modern parallel and distributed computing. The idea is to derive a smaller matrix from the high-dimensional input data matrix using randomness as a computational strategy. Then, the dynamic modes and eigenvalues are accurately learned from this smaller representation of the data, whereby the approximation quality can be controlled via oversampling and power iterations. Here, we present randomized DMD algorithms that are categorized by how many passes the algorithm takes through the data. Specifically, the single-pass randomized DMD does not require data to be stored for subsequent passes. Thus, it is possible to approximately decompose massive fluid flows (stored out of core memory, or not stored at all) using single-pass algorithms, which is infeasible with traditional DMD algorithms.
Symmetric log-domain diffeomorphic Registration: a demons-based approach.

PubMed

Vercauteren, Tom; Pennec, Xavier; Perchant, Aymeric; Ayache, Nicholas

2008-01-01

Modern morphometric studies use non-linear image registration to compare anatomies and perform group analysis. Recently, log-Euclidean approaches have contributed to promote the use of such computational anatomy tools by permitting simple computations of statistics on a rather large class of invertible spatial transformations. In this work, we propose a non-linear registration algorithm perfectly fit for log-Euclidean statistics on diffeomorphisms. Our algorithm works completely in the log-domain, i.e. it uses a stationary velocity field. This implies that we guarantee the invertibility of the deformation and have access to the true inverse transformation. This also means that our output can be directly used for log-Euclidean statistics without relying on the heavy computation of the log of the spatial transformation. As it is often desirable, our algorithm is symmetric with respect to the order of the input images. Furthermore, we use an alternate optimization approach related to Thirion's demons algorithm to provide a fast non-linear registration algorithm. First results show that our algorithm outperforms both the demons algorithm and the recently proposed diffeomorphic demons algorithm in terms of accuracy of the transformation while remaining computationally efficient.
Measuring exertion time, duty cycle and hand activity level for industrial tasks using computer vision.

PubMed

Akkas, Oguz; Lee, Cheng Hsien; Hu, Yu Hen; Harris Adamson, Carisa; Rempel, David; Radwin, Robert G

2017-12-01

Two computer vision algorithms were developed to automatically estimate exertion time, duty cycle (DC) and hand activity level (HAL) from videos of workers performing 50 industrial tasks. The average DC difference between manual frame-by-frame analysis and the computer vision DC was -5.8% for the Decision Tree (DT) algorithm, and 1.4% for the Feature Vector Training (FVT) algorithm. The average HAL difference was 0.5 for the DT algorithm and 0.3 for the FVT algorithm. A sensitivity analysis, conducted to examine the influence that deviations in DC have on HAL, found it remained unaffected when DC error was less than 5%. Thus, a DC error less than 10% will impact HAL less than 0.5 HAL, which is negligible. Automatic computer vision HAL estimates were therefore comparable to manual frame-by-frame estimates. Practitioner Summary: Computer vision was used to automatically estimate exertion time, duty cycle and hand activity level from videos of workers performing industrial tasks.
Parallelization of Nullspace Algorithm for the computation of metabolic pathways

PubMed Central

Jevremović, Dimitrije; Trinh, Cong T.; Srienc, Friedrich; Sosa, Carlos P.; Boley, Daniel

2011-01-01

Elementary mode analysis is a useful metabolic pathway analysis tool in understanding and analyzing cellular metabolism, since elementary modes can represent metabolic pathways with unique and minimal sets of enzyme-catalyzed reactions of a metabolic network under steady state conditions. However, computation of the elementary modes of a genome- scale metabolic network with 100–1000 reactions is very expensive and sometimes not feasible with the commonly used serial Nullspace Algorithm. In this work, we develop a distributed memory parallelization of the Nullspace Algorithm to handle efficiently the computation of the elementary modes of a large metabolic network. We give an implementation in C++ language with the support of MPI library functions for the parallel communication. Our proposed algorithm is accompanied with an analysis of the complexity and identification of major bottlenecks during computation of all possible pathways of a large metabolic network. The algorithm includes methods to achieve load balancing among the compute-nodes and specific communication patterns to reduce the communication overhead and improve efficiency. PMID:22058581
GPU-accelerated computing for Lagrangian coherent structures of multi-body gravitational regimes

NASA Astrophysics Data System (ADS)

Lin, Mingpei; Xu, Ming; Fu, Xiaoyu

2017-04-01

Based on a well-established theoretical foundation, Lagrangian Coherent Structures (LCSs) have elicited widespread research on the intrinsic structures of dynamical systems in many fields, including the field of astrodynamics. Although the application of LCSs in dynamical problems seems straightforward theoretically, its associated computational cost is prohibitive. We propose a block decomposition algorithm developed on Compute Unified Device Architecture (CUDA) platform for the computation of the LCSs of multi-body gravitational regimes. In order to take advantage of GPU's outstanding computing properties, such as Shared Memory, Constant Memory, and Zero-Copy, the algorithm utilizes a block decomposition strategy to facilitate computation of finite-time Lyapunov exponent (FTLE) fields of arbitrary size and timespan. Simulation results demonstrate that this GPU-based algorithm can satisfy double-precision accuracy requirements and greatly decrease the time needed to calculate final results, increasing speed by approximately 13 times. Additionally, this algorithm can be generalized to various large-scale computing problems, such as particle filters, constellation design, and Monte-Carlo simulation.
Computer-Based Algorithmic Determination of Muscle Movement Onset Using M-Mode Ultrasonography

DTIC Science & Technology

2017-05-01

contraction images were analyzed visually and with three different classes of algorithms: pixel standard deviation (SD), high-pass filter and Teager Kaiser...Linear relationships and agreements between computed and visual muscle onset were calculated. The top algorithms were high-pass filtered with a 30 Hz...suggest that computer automated determination using high-pass filtering is a potential objective alternative to visual determination in human
Computer architecture for efficient algorithmic executions in real-time systems: New technology for avionics systems and advanced space vehicles

NASA Technical Reports Server (NTRS)

Carroll, Chester C.; Youngblood, John N.; Saha, Aindam

1987-01-01

Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.
Subspace algorithms for identifying separable-in-denominator 2D systems with deterministic-stochastic inputs

NASA Astrophysics Data System (ADS)

Ramos, José A.; Mercère, Guillaume

2016-12-01

In this paper, we present an algorithm for identifying two-dimensional (2D) causal, recursive and separable-in-denominator (CRSD) state-space models in the Roesser form with deterministic-stochastic inputs. The algorithm implements the N4SID, PO-MOESP and CCA methods, which are well known in the literature on 1D system identification, but here we do so for the 2D CRSD Roesser model. The algorithm solves the 2D system identification problem by maintaining the constraint structure imposed by the problem (i.e. Toeplitz and Hankel) and computes the horizontal and vertical system orders, system parameter matrices and covariance matrices of a 2D CRSD Roesser model. From a computational point of view, the algorithm has been presented in a unified framework, where the user can select which of the three methods to use. Furthermore, the identification task is divided into three main parts: (1) computing the deterministic horizontal model parameters, (2) computing the deterministic vertical model parameters and (3) computing the stochastic components. Specific attention has been paid to the computation of a stabilised Kalman gain matrix and a positive real solution when required. The efficiency and robustness of the unified algorithm have been demonstrated via a thorough simulation example.
Computer architecture for efficient algorithmic executions in real-time systems: new technology for avionics systems and advanced space vehicles

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carroll, C.C.; Youngblood, J.N.; Saha, A.

1987-12-01

Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processingmore » elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.« less
Using modified fruit fly optimisation algorithm to perform the function test and case studies

NASA Astrophysics Data System (ADS)

Pan, Wen-Tsao

2013-06-01

Evolutionary computation is a computing mode established by practically simulating natural evolutionary processes based on the concept of Darwinian Theory, and it is a common research method. The main contribution of this paper was to reinforce the function of searching for the optimised solution using the fruit fly optimization algorithm (FOA), in order to avoid the acquisition of local extremum solutions. The evolutionary computation has grown to include the concepts of animal foraging behaviour and group behaviour. This study discussed three common evolutionary computation methods and compared them with the modified fruit fly optimization algorithm (MFOA). It further investigated the ability of the three mathematical functions in computing extreme values, as well as the algorithm execution speed and the forecast ability of the forecasting model built using the optimised general regression neural network (GRNN) parameters. The findings indicated that there was no obvious difference between particle swarm optimization and the MFOA in regards to the ability to compute extreme values; however, they were both better than the artificial fish swarm algorithm and FOA. In addition, the MFOA performed better than the particle swarm optimization in regards to the algorithm execution speed, and the forecast ability of the forecasting model built using the MFOA's GRNN parameters was better than that of the other three forecasting models.
NIMBUS-7 ERB MATGEN Science Document

NASA Technical Reports Server (NTRS)

Soule, H. V.

1983-01-01

The ERB algorithms and computer software data flow used to convert sensor data into equivalent radiometric data are described in detail. The NIMBUS satellite location, orientation and sensor orientation algorithms are given. The computer housekeeping and data flow and sensor/data status algorithms are also given.
Novel techniques for data decomposition and load balancing for parallel processing of vision systems: Implementation and evaluation using a motion estimation system

NASA Technical Reports Server (NTRS)

Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.

1989-01-01

Computer vision systems employ a sequence of vision algorithms in which the output of an algorithm is the input of the next algorithm in the sequence. Algorithms that constitute such systems exhibit vastly different computational characteristics, and therefore, require different data decomposition techniques and efficient load balancing techniques for parallel implementation. However, since the input data for a task is produced as the output data of the previous task, this information can be exploited to perform knowledge based data decomposition and load balancing. Presented here are algorithms for a motion estimation system. The motion estimation is based on the point correspondence between the involved images which are a sequence of stereo image pairs. Researchers propose algorithms to obtain point correspondences by matching feature points among stereo image pairs at any two consecutive time instants. Furthermore, the proposed algorithms employ non-iterative procedures, which results in saving considerable amounts of computation time. The system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from consecutive time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters.
Deadbeat Predictive Controllers

NASA Technical Reports Server (NTRS)

Juang, Jer-Nan; Phan, Minh

1997-01-01

Several new computational algorithms are presented to compute the deadbeat predictive control law. The first algorithm makes use of a multi-step-ahead output prediction to compute the control law without explicitly calculating the controllability matrix. The system identification must be performed first and then the predictive control law is designed. The second algorithm uses the input and output data directly to compute the feedback law. It combines the system identification and the predictive control law into one formulation. The third algorithm uses an observable-canonical form realization to design the predictive controller. The relationship between all three algorithms is established through the use of the state-space representation. All algorithms are applicable to multi-input, multi-output systems with disturbance inputs. In addition to the feedback terms, feed forward terms may also be added for disturbance inputs if they are measurable. Although the feedforward terms do not influence the stability of the closed-loop feedback law, they enhance the performance of the controlled system.
Rapid execution of fan beam image reconstruction algorithms using efficient computational techniques and special-purpose processors

NASA Astrophysics Data System (ADS)

Gilbert, B. K.; Robb, R. A.; Chu, A.; Kenue, S. K.; Lent, A. H.; Swartzlander, E. E., Jr.

1981-02-01

Rapid advances during the past ten years of several forms of computer-assisted tomography (CT) have resulted in the development of numerous algorithms to convert raw projection data into cross-sectional images. These reconstruction algorithms are either 'iterative,' in which a large matrix algebraic equation is solved by successive approximation techniques; or 'closed form'. Continuing evolution of the closed form algorithms has allowed the newest versions to produce excellent reconstructed images in most applications. This paper will review several computer software and special-purpose digital hardware implementations of closed form algorithms, either proposed during the past several years by a number of workers or actually implemented in commercial or research CT scanners. The discussion will also cover a number of recently investigated algorithmic modifications which reduce the amount of computation required to execute the reconstruction process, as well as several new special-purpose digital hardware implementations under development in laboratories at the Mayo Clinic.
Quantum speedup of Monte Carlo methods.

PubMed

Montanaro, Ashley

2015-09-08

Monte Carlo methods use random sampling to estimate numerical quantities which are hard to compute deterministically. One important example is the use in statistical physics of rapidly mixing Markov chains to approximately compute partition functions. In this work, we describe a quantum algorithm which can accelerate Monte Carlo methods in a very general setting. The algorithm estimates the expected output value of an arbitrary randomized or quantum subroutine with bounded variance, achieving a near-quadratic speedup over the best possible classical algorithm. Combining the algorithm with the use of quantum walks gives a quantum speedup of the fastest known classical algorithms with rigorous performance bounds for computing partition functions, which use multiple-stage Markov chain Monte Carlo techniques. The quantum algorithm can also be used to estimate the total variation distance between probability distributions efficiently.
A Benders based rolling horizon algorithm for a dynamic facility location problem

DOE PAGES

Marufuzzaman,, Mohammad; Gedik, Ridvan; Roni, Mohammad S.

2016-06-28

This study presents a well-known capacitated dynamic facility location problem (DFLP) that satisfies the customer demand at a minimum cost by determining the time period for opening, closing, or retaining an existing facility in a given location. To solve this challenging NP-hard problem, this paper develops a unique hybrid solution algorithm that combines a rolling horizon algorithm with an accelerated Benders decomposition algorithm. Extensive computational experiments are performed on benchmark test instances to evaluate the hybrid algorithm’s efficiency and robustness in solving the DFLP problem. Computational results indicate that the hybrid Benders based rolling horizon algorithm consistently offers high qualitymore » feasible solutions in a much shorter computational time period than the standalone rolling horizon and accelerated Benders decomposition algorithms in the experimental range.« less
Quantum speedup of Monte Carlo methods

PubMed Central

Montanaro, Ashley

2015-01-01

Monte Carlo methods use random sampling to estimate numerical quantities which are hard to compute deterministically. One important example is the use in statistical physics of rapidly mixing Markov chains to approximately compute partition functions. In this work, we describe a quantum algorithm which can accelerate Monte Carlo methods in a very general setting. The algorithm estimates the expected output value of an arbitrary randomized or quantum subroutine with bounded variance, achieving a near-quadratic speedup over the best possible classical algorithm. Combining the algorithm with the use of quantum walks gives a quantum speedup of the fastest known classical algorithms with rigorous performance bounds for computing partition functions, which use multiple-stage Markov chain Monte Carlo techniques. The quantum algorithm can also be used to estimate the total variation distance between probability distributions efficiently. PMID:26528079
Demonstration of essentiality of entanglement in a Deutsch-like quantum algorithm

NASA Astrophysics Data System (ADS)

Huang, He-Liang; Goswami, Ashutosh K.; Bao, Wan-Su; Panigrahi, Prasanta K.

2018-06-01

Quantum algorithms can be used to efficiently solve certain classically intractable problems by exploiting quantum parallelism. However, the effectiveness of quantum entanglement in quantum computing remains a question of debate. This study presents a new quantum algorithm that shows entanglement could provide advantages over both classical algorithms and quantum algo- rithms without entanglement. Experiments are implemented to demonstrate the proposed algorithm using superconducting qubits. Results show the viability of the algorithm and suggest that entanglement is essential in obtaining quantum speedup for certain problems in quantum computing. The study provides reliable and clear guidance for developing useful quantum algorithms.
Performance comparison of attitude determination, attitude estimation, and nonlinear observers algorithms

NASA Astrophysics Data System (ADS)

MOHAMMED, M. A. SI; BOUSSADIA, H.; BELLAR, A.; ADNANE, A.

2017-01-01

This paper presents a brief synthesis and useful performance analysis of different attitude filtering algorithms (attitude determination algorithms, attitude estimation algorithms, and nonlinear observers) applied to Low Earth Orbit Satellite in terms of accuracy, convergence time, amount of memory, and computation time. This latter is calculated in two ways, using a personal computer and also using On-board computer 750 (OBC 750) that is being used in many SSTL Earth observation missions. The use of this comparative study could be an aided design tool to the designer to choose from an attitude determination or attitude estimation or attitude observer algorithms. The simulation results clearly indicate that the nonlinear Observer is the more logical choice.

Hybrid massively parallel fast sweeping method for static Hamilton-Jacobi equations

NASA Astrophysics Data System (ADS)

Detrixhe, Miles; Gibou, Frédéric

2016-10-01

The fast sweeping method is a popular algorithm for solving a variety of static Hamilton-Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling, and show state-of-the-art speedup values for the fast sweeping method.
A genetic algorithm for solving supply chain network design model

NASA Astrophysics Data System (ADS)

Firoozi, Z.; Ismail, N.; Ariafar, S. H.; Tang, S. H.; Ariffin, M. K. M. A.

2013-09-01

Network design is by nature costly and optimization models play significant role in reducing the unnecessary cost components of a distribution network. This study proposes a genetic algorithm to solve a distribution network design model. The structure of the chromosome in the proposed algorithm is defined in a novel way that in addition to producing feasible solutions, it also reduces the computational complexity of the algorithm. Computational results are presented to show the algorithm performance.
Parallelizing flow-accumulation calculations on graphics processing units—From iterative DEM preprocessing algorithm to recursive multiple-flow-direction algorithm

NASA Astrophysics Data System (ADS)

Qin, Cheng-Zhi; Zhan, Lijun

2012-06-01

As one of the important tasks in digital terrain analysis, the calculation of flow accumulations from gridded digital elevation models (DEMs) usually involves two steps in a real application: (1) using an iterative DEM preprocessing algorithm to remove the depressions and flat areas commonly contained in real DEMs, and (2) using a recursive flow-direction algorithm to calculate the flow accumulation for every cell in the DEM. Because both algorithms are computationally intensive, quick calculation of the flow accumulations from a DEM (especially for a large area) presents a practical challenge to personal computer (PC) users. In recent years, rapid increases in hardware capacity of the graphics processing units (GPUs) provided in modern PCs have made it possible to meet this challenge in a PC environment. Parallel computing on GPUs using a compute-unified-device-architecture (CUDA) programming model has been explored to speed up the execution of the single-flow-direction algorithm (SFD). However, the parallel implementation on a GPU of the multiple-flow-direction (MFD) algorithm, which generally performs better than the SFD algorithm, has not been reported. Moreover, GPU-based parallelization of the DEM preprocessing step in the flow-accumulation calculations has not been addressed. This paper proposes a parallel approach to calculate flow accumulations (including both iterative DEM preprocessing and a recursive MFD algorithm) on a CUDA-compatible GPU. For the parallelization of an MFD algorithm (MFD-md), two different parallelization strategies using a GPU are explored. The first parallelization strategy, which has been used in the existing parallel SFD algorithm on GPU, has the problem of computing redundancy. Therefore, we designed a parallelization strategy based on graph theory. The application results show that the proposed parallel approach to calculate flow accumulations on a GPU performs much faster than either sequential algorithms or other parallel GPU-based algorithms based on existing parallelization strategies.
A High-Performance Genetic Algorithm: Using Traveling Salesman Problem as a Case

PubMed Central

Tsai, Chun-Wei; Tseng, Shih-Pang; Yang, Chu-Sing

2014-01-01

This paper presents a simple but efficient algorithm for reducing the computation time of genetic algorithm (GA) and its variants. The proposed algorithm is motivated by the observation that genes common to all the individuals of a GA have a high probability of surviving the evolution and ending up being part of the final solution; as such, they can be saved away to eliminate the redundant computations at the later generations of a GA. To evaluate the performance of the proposed algorithm, we use it not only to solve the traveling salesman problem but also to provide an extensive analysis on the impact it may have on the quality of the end result. Our experimental results indicate that the proposed algorithm can significantly reduce the computation time of GA and GA-based algorithms while limiting the degradation of the quality of the end result to a very small percentage compared to traditional GA. PMID:24892038
A high-performance genetic algorithm: using traveling salesman problem as a case.

PubMed

Tsai, Chun-Wei; Tseng, Shih-Pang; Chiang, Ming-Chao; Yang, Chu-Sing; Hong, Tzung-Pei

2014-01-01

This paper presents a simple but efficient algorithm for reducing the computation time of genetic algorithm (GA) and its variants. The proposed algorithm is motivated by the observation that genes common to all the individuals of a GA have a high probability of surviving the evolution and ending up being part of the final solution; as such, they can be saved away to eliminate the redundant computations at the later generations of a GA. To evaluate the performance of the proposed algorithm, we use it not only to solve the traveling salesman problem but also to provide an extensive analysis on the impact it may have on the quality of the end result. Our experimental results indicate that the proposed algorithm can significantly reduce the computation time of GA and GA-based algorithms while limiting the degradation of the quality of the end result to a very small percentage compared to traditional GA.
Accelerating artificial intelligence with reconfigurable computing

NASA Astrophysics Data System (ADS)

Cieszewski, Radoslaw

Reconfigurable computing is emerging as an important area of research in computer architectures and software systems. Many algorithms can be greatly accelerated by placing the computationally intense portions of an algorithm into reconfigurable hardware. Reconfigurable computing combines many benefits of both software and ASIC implementations. Like software, the mapped circuit is flexible, and can be changed over the lifetime of the system. Similar to an ASIC, reconfigurable systems provide a method to map circuits into hardware. Reconfigurable systems therefore have the potential to achieve far greater performance than software as a result of bypassing the fetch-decode-execute operations of traditional processors, and possibly exploiting a greater level of parallelism. Such a field, where there is many different algorithms which can be accelerated, is an artificial intelligence. This paper presents example hardware implementations of Artificial Neural Networks, Genetic Algorithms and Expert Systems.
Algorithm Calculates Cumulative Poisson Distribution

NASA Technical Reports Server (NTRS)

Bowerman, Paul N.; Nolty, Robert C.; Scheuer, Ernest M.

1992-01-01

Algorithm calculates accurate values of cumulative Poisson distribution under conditions where other algorithms fail because numbers are so small (underflow) or so large (overflow) that computer cannot process them. Factors inserted temporarily to prevent underflow and overflow. Implemented in CUMPOIS computer program described in "Cumulative Poisson Distribution Program" (NPO-17714).
Computational Algorithmization: Limitations in Problem Solving Skills in Computational Sciences Majors at University of Oriente

ERIC Educational Resources Information Center

Castillo, Antonio S.; Berenguer, Isabel A.; Sánchez, Alexander G.; Álvarez, Tomás R. R.

2017-01-01

This paper analyzes the results of a diagnostic study carried out with second year students of the computational sciences majors at University of Oriente, Cuba, to determine the limitations that they present in computational algorithmization. An exploratory research was developed using quantitative and qualitative methods. The results allowed…
Partitioning sparse matrices with eigenvectors of graphs

NASA Technical Reports Server (NTRS)

Pothen, Alex; Simon, Horst D.; Liou, Kang-Pu

1990-01-01

The problem of computing a small vertex separator in a graph arises in the context of computing a good ordering for the parallel factorization of sparse, symmetric matrices. An algebraic approach for computing vertex separators is considered in this paper. It is shown that lower bounds on separator sizes can be obtained in terms of the eigenvalues of the Laplacian matrix associated with a graph. The Laplacian eigenvectors of grid graphs can be computed from Kronecker products involving the eigenvectors of path graphs, and these eigenvectors can be used to compute good separators in grid graphs. A heuristic algorithm is designed to compute a vertex separator in a general graph by first computing an edge separator in the graph from an eigenvector of the Laplacian matrix, and then using a maximum matching in a subgraph to compute the vertex separator. Results on the quality of the separators computed by the spectral algorithm are presented, and these are compared with separators obtained from other algorithms for computing separators. Finally, the time required to compute the Laplacian eigenvector is reported, and the accuracy with which the eigenvector must be computed to obtain good separators is considered. The spectral algorithm has the advantage that it can be implemented on a medium-size multiprocessor in a straightforward manner.
QRS detection based ECG quality assessment.

PubMed

Hayn, Dieter; Jammerbund, Bernhard; Schreier, Günter

2012-09-01

Although immediate feedback concerning ECG signal quality during recording is useful, up to now not much literature describing quality measures is available. We have implemented and evaluated four ECG quality measures. Empty lead criterion (A), spike detection criterion (B) and lead crossing point criterion (C) were calculated from basic signal properties. Measure D quantified the robustness of QRS detection when applied to the signal. An advanced Matlab-based algorithm combining all four measures and a simplified algorithm for Android platforms, excluding measure D, were developed. Both algorithms were evaluated by taking part in the Computing in Cardiology Challenge 2011. Each measure's accuracy and computing time was evaluated separately. During the challenge, the advanced algorithm correctly classified 93.3% of the ECGs in the training-set and 91.6 % in the test-set. Scores for the simplified algorithm were 0.834 in event 2 and 0.873 in event 3. Computing time for measure D was almost five times higher than for other measures. Required accuracy levels depend on the application and are related to computing time. While our simplified algorithm may be accurate for real-time feedback during ECG self-recordings, QRS detection based measures can further increase the performance if sufficient computing power is available.
Biclustering Protein Complex Interactions with a Biclique FindingAlgorithm

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ding, Chris; Zhang, Anne Ya; Holbrook, Stephen

2006-12-01

Biclustering has many applications in text mining, web clickstream mining, and bioinformatics. When data entries are binary, the tightest biclusters become bicliques. We propose a flexible and highly efficient algorithm to compute bicliques. We first generalize the Motzkin-Straus formalism for computing the maximal clique from L{sub 1} constraint to L{sub p} constraint, which enables us to provide a generalized Motzkin-Straus formalism for computing maximal-edge bicliques. By adjusting parameters, the algorithm can favor biclusters with more rows less columns, or vice verse, thus increasing the flexibility of the targeted biclusters. We then propose an algorithm to solve the generalized Motzkin-Straus optimizationmore » problem. The algorithm is provably convergent and has a computational complexity of O(|E|) where |E| is the number of edges. It relies on a matrix vector multiplication and runs efficiently on most current computer architectures. Using this algorithm, we bicluster the yeast protein complex interaction network. We find that biclustering protein complexes at the protein level does not clearly reflect the functional linkage among protein complexes in many cases, while biclustering at protein domain level can reveal many underlying linkages. We show several new biologically significant results.« less
Characterization of robotics parallel algorithms and mapping onto a reconfigurable SIMD machine

NASA Technical Reports Server (NTRS)

Lee, C. S. G.; Lin, C. T.

1989-01-01

The kinematics, dynamics, Jacobian, and their corresponding inverse computations are six essential problems in the control of robot manipulators. Efficient parallel algorithms for these computations are discussed and analyzed. Their characteristics are identified and a scheme on the mapping of these algorithms to a reconfigurable parallel architecture is presented. Based on the characteristics including type of parallelism, degree of parallelism, uniformity of the operations, fundamental operations, data dependencies, and communication requirement, it is shown that most of the algorithms for robotic computations possess highly regular properties and some common structures, especially the linear recursive structure. Moreover, they are well-suited to be implemented on a single-instruction-stream multiple-data-stream (SIMD) computer with reconfigurable interconnection network. The model of a reconfigurable dual network SIMD machine with internal direct feedback is introduced. A systematic procedure internal direct feedback is introduced. A systematic procedure to map these computations to the proposed machine is presented. A new scheduling problem for SIMD machines is investigated and a heuristic algorithm, called neighborhood scheduling, that reorders the processing sequence of subtasks to reduce the communication time is described. Mapping results of a benchmark algorithm are illustrated and discussed.
A parallel variable metric optimization algorithm

NASA Technical Reports Server (NTRS)

Straeter, T. A.

1973-01-01

An algorithm, designed to exploit the parallel computing or vector streaming (pipeline) capabilities of computers is presented. When p is the degree of parallelism, then one cycle of the parallel variable metric algorithm is defined as follows: first, the function and its gradient are computed in parallel at p different values of the independent variable; then the metric is modified by p rank-one corrections; and finally, a single univariant minimization is carried out in the Newton-like direction. Several properties of this algorithm are established. The convergence of the iterates to the solution is proved for a quadratic functional on a real separable Hilbert space. For a finite-dimensional space the convergence is in one cycle when p equals the dimension of the space. Results of numerical experiments indicate that the new algorithm will exploit parallel or pipeline computing capabilities to effect faster convergence than serial techniques.
QRev—Software for computation and quality assurance of acoustic doppler current profiler moving-boat streamflow measurements—Technical manual for version 2.8

USGS Publications Warehouse

Mueller, David S.

2016-06-21

The software program, QRev applies common and consistent computational algorithms combined with automated filtering and quality assessment of the data to improve the quality and efficiency of streamflow measurements and helps ensure that U.S. Geological Survey streamflow measurements are consistent, accurate, and independent of the manufacturer of the instrument used to make the measurement. Software from different manufacturers uses different algorithms for various aspects of the data processing and discharge computation. The algorithms used by QRev to filter data, interpolate data, and compute discharge are documented and compared to the algorithms used in the manufacturers’ software. QRev applies consistent algorithms and creates a data structure that is independent of the data source. QRev saves an extensible markup language (XML) file that can be imported into databases or electronic field notes software. This report is the technical manual for version 2.8 of QRev.
Ndarts

NASA Technical Reports Server (NTRS)

Jain, Abhinandan

2011-01-01

Ndarts software provides algorithms for computing quantities associated with the dynamics of articulated, rigid-link, multibody systems. It is designed as a general-purpose dynamics library that can be used for the modeling of robotic platforms, space vehicles, molecular dynamics, and other such applications. The architecture and algorithms in Ndarts are based on the Spatial Operator Algebra (SOA) theory for computational multibody and robot dynamics developed at JPL. It uses minimal, internal coordinate models. The algorithms are low-order, recursive scatter/ gather algorithms. In comparison with the earlier Darts++ software, this version has a more general and cleaner design needed to support a larger class of computational dynamics needs. It includes a frames infrastructure, allows algorithms to operate on subgraphs of the system, and implements lazy and deferred computation for better efficiency. Dynamics modeling modules such as Ndarts are core building blocks of control and simulation software for space, robotic, mechanism, bio-molecular, and material systems modeling.
CAD system for footwear design based on whole real 3D data of last surface

NASA Astrophysics Data System (ADS)

Song, Wanzhong; Su, Xianyu

2000-10-01

Two major parts of application of CAD in footwear design are studied: the development of last surface; computer-aided design of planar shoe-template. A new quasi-experiential development algorithm of last surface based on triangulation approximation is presented. This development algorithm consumes less time and does not need any interactive operation for precisely development compared with other development algorithm of last surface. Based on this algorithm, a software, SHOEMAKERTM, which contains computer aided automatic measurement, automatic development of last surface and computer aide design of shoe-template has been developed.
Algorithm implementation on the Navier-Stokes computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Krist, S.E.; Zang, T.A.

1987-03-01

The Navier-Stokes Computer is a multi-purpose parallel-processing supercomputer which is currently under development at Princeton University. It consists of multiple local memory parallel processors, called Nodes, which are interconnected in a hypercube network. Details of the procedures involved in implementing an algorithm on the Navier-Stokes computer are presented. The particular finite difference algorithm considered in this analysis was developed for simulation of laminar-turbulent transition in wall bounded shear flows. Projected timing results for implementing this algorithm indicate that operation rates in excess of 42 GFLOPS are feasible on a 128 Node machine.
Algorithm implementation on the Navier-Stokes computer

NASA Technical Reports Server (NTRS)

Krist, Steven E.; Zang, Thomas A.

1987-01-01

The Navier-Stokes Computer is a multi-purpose parallel-processing supercomputer which is currently under development at Princeton University. It consists of multiple local memory parallel processors, called Nodes, which are interconnected in a hypercube network. Details of the procedures involved in implementing an algorithm on the Navier-Stokes computer are presented. The particular finite difference algorithm considered in this analysis was developed for simulation of laminar-turbulent transition in wall bounded shear flows. Projected timing results for implementing this algorithm indicate that operation rates in excess of 42 GFLOPS are feasible on a 128 Node machine.
An efficient parallel algorithm for the solution of a tridiagonal linear system of equations

NASA Technical Reports Server (NTRS)

Stone, H. S.

1971-01-01

Tridiagonal linear systems of equations are solved on conventional serial machines in a time proportional to N, where N is the number of equations. The conventional algorithms do not lend themselves directly to parallel computations on computers of the ILLIAC IV class, in the sense that they appear to be inherently serial. An efficient parallel algorithm is presented in which computation time grows as log sub 2 N. The algorithm is based on recursive doubling solutions of linear recurrence relations, and can be used to solve recurrence relations of all orders.
Embedded assessment algorithms within home-based cognitive computer game exercises for elders.

PubMed

Jimison, Holly; Pavel, Misha

2006-01-01

With the recent consumer interest in computer-based activities designed to improve cognitive performance, there is a growing need for scientific assessment algorithms to validate the potential contributions of cognitive exercises. In this paper, we present a novel methodology for incorporating dynamic cognitive assessment algorithms within computer games designed to enhance cognitive performance. We describe how this approach works for variety of computer applications and describe cognitive monitoring results for one of the computer game exercises. The real-time cognitive assessments also provide a control signal for adapting the difficulty of the game exercises and providing tailored help for elders of varying abilities.

Star adaptation for two-algorithms used on serial computers

NASA Technical Reports Server (NTRS)

Howser, L. M.; Lambiotte, J. J., Jr.

1974-01-01

Two representative algorithms used on a serial computer and presently executed on the Control Data Corporation 6000 computer were adapted to execute efficiently on the Control Data STAR-100 computer. Gaussian elimination for the solution of simultaneous linear equations and the Gauss-Legendre quadrature formula for the approximation of an integral are the two algorithms discussed. A description is given of how the programs were adapted for STAR and why these adaptations were necessary to obtain an efficient STAR program. Some points to consider when adapting an algorithm for STAR are discussed. Program listings of the 6000 version coded in 6000 FORTRAN, the adapted STAR version coded in 6000 FORTRAN, and the STAR version coded in STAR FORTRAN are presented in the appendices.
Efficient image compression algorithm for computer-animated images

NASA Astrophysics Data System (ADS)

Yfantis, Evangelos A.; Au, Matthew Y.; Miel, G.

1992-10-01

An image compression algorithm is described. The algorithm is an extension of the run-length image compression algorithm and its implementation is relatively easy. This algorithm was implemented and compared with other existing popular compression algorithms and with the Lempel-Ziv (LZ) coding. The Lempel-Ziv algorithm is available as a utility in the UNIX operating system and is also referred to as the UNIX uncompress. Sometimes our algorithm is best in terms of saving memory space, and sometimes one of the competing algorithms is best. The algorithm is lossless, and the intent is for the algorithm to be used in computer graphics animated images. Comparisons made with the LZ algorithm indicate that the decompression time using our algorithm is faster than that using the LZ algorithm. Once the data are in memory, a relatively simple and fast transformation is applied to uncompress the file.
Parallel Algorithm Solves Coupled Differential Equations

NASA Technical Reports Server (NTRS)

Hayashi, A.

1987-01-01

Numerical methods adapted to concurrent processing. Algorithm solves set of coupled partial differential equations by numerical integration. Adapted to run on hypercube computer, algorithm separates problem into smaller problems solved concurrently. Increase in computing speed with concurrent processing over that achievable with conventional sequential processing appreciable, especially for large problems.
Universal algorithms and programs for calculating the motion parameters in the two-body problem

NASA Technical Reports Server (NTRS)

Bakhshiyan, B. T.; Sukhanov, A. A.

1979-01-01

The algorithms and FORTRAN programs for computing positions and velocities, orbital elements and first and second partial derivatives in the two-body problem are presented. The algorithms are applicable for any value of eccentricity and are convenient for computing various navigation parameters.
REACTT: an algorithm for solving spatial equilibrium problems.

Treesearch

D.J. Brooks; J. Kincaid

1987-01-01

The problem of determining equilibrium prices and quantities in spatially separated markets is reviewed. Algorithms that compute spatial equilibria are discussed. A computer program using the reactive programming algorithm for solving spatial equilibrium problems that involve multiple commodities is presented, along with detailed documentation. A sample data set,...
Computer algorithm for coding gain

NASA Technical Reports Server (NTRS)

Dodd, E. E.

1974-01-01

Development of a computer algorithm for coding gain for use in an automated communications link design system. Using an empirical formula which defines coding gain as used in space communications engineering, an algorithm is constructed on the basis of available performance data for nonsystematic convolutional encoding with soft-decision (eight-level) Viterbi decoding.
A fuzzy clustering algorithm to detect planar and quadric shapes

NASA Technical Reports Server (NTRS)

Krishnapuram, Raghu; Frigui, Hichem; Nasraoui, Olfa

1992-01-01

In this paper, we introduce a new fuzzy clustering algorithm to detect an unknown number of planar and quadric shapes in noisy data. The proposed algorithm is computationally and implementationally simple, and it overcomes many of the drawbacks of the existing algorithms that have been proposed for similar tasks. Since the clustering is performed in the original image space, and since no features need to be computed, this approach is particularly suited for sparse data. The algorithm may also be used in pattern recognition applications.
Desiderata for computable representations of electronic health records-driven phenotype algorithms

PubMed Central

Mo, Huan; Thompson, William K; Rasmussen, Luke V; Pacheco, Jennifer A; Jiang, Guoqian; Kiefer, Richard; Zhu, Qian; Xu, Jie; Montague, Enid; Carrell, David S; Lingren, Todd; Mentch, Frank D; Ni, Yizhao; Wehbe, Firas H; Peissig, Peggy L; Tromp, Gerard; Larson, Eric B; Chute, Christopher G; Pathak, Jyotishman; Speltz, Peter; Kho, Abel N; Jarvik, Gail P; Bejan, Cosmin A; Williams, Marc S; Borthwick, Kenneth; Kitchner, Terrie E; Roden, Dan M; Harris, Paul A

2015-01-01

Background Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms. Currently, phenotype algorithms are most commonly represented as noncomputable descriptive documents and knowledge artifacts that detail the protocols for querying diagnoses, symptoms, procedures, medications, and/or text-driven medical concepts, and are primarily meant for human comprehension. We present desiderata for developing a computable phenotype representation model (PheRM). Methods A team of clinicians and informaticians reviewed common features for multisite phenotype algorithms published in PheKB.org and existing phenotype representation platforms. We also evaluated well-known diagnostic criteria and clinical decision-making guidelines to encompass a broader category of algorithms. Results We propose 10 desired characteristics for a flexible, computable PheRM: (1) structure clinical data into queryable forms; (2) recommend use of a common data model, but also support customization for the variability and availability of EHR data among sites; (3) support both human-readable and computable representations of phenotype algorithms; (4) implement set operations and relational algebra for modeling phenotype algorithms; (5) represent phenotype criteria with structured rules; (6) support defining temporal relations between events; (7) use standardized terminologies and ontologies, and facilitate reuse of value sets; (8) define representations for text searching and natural language processing; (9) provide interfaces for external software algorithms; and (10) maintain backward compatibility. Conclusion A computable PheRM is needed for true phenotype portability and reliability across different EHR products and healthcare systems. These desiderata are a guide to inform the establishment and evolution of EHR phenotype algorithm authoring platforms and languages. PMID:26342218
An algorithm of discovering signatures from DNA databases on a computer cluster.

PubMed

Lee, Hsiao Ping; Sheu, Tzu-Fang

2014-10-05

Signatures are short sequences that are unique and not similar to any other sequence in a database that can be used as the basis to identify different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entirety of databases to be loaded in the memory, thus restricting the amount of data that they can process. It makes those algorithms unable to process databases with large amounts of data. Also, those algorithms use sequential models and have slower discovery speeds, meaning that the efficiency can be improved. In this research, we are debuting the utilization of a divide-and-conquer strategy in signature discovery and have proposed a parallel signature discovery algorithm on a computer cluster. The algorithm applies the divide-and-conquer strategy to solve the problem posed to the existing algorithms where they are unable to process large databases and uses a parallel computing mechanism to effectively improve the efficiency of signature discovery. Even when run with just the memory of regular personal computers, the algorithm can still process large databases such as the human whole-genome EST database which were previously unable to be processed by the existing algorithms. The algorithm proposed in this research is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large database analysis and processing. The implementation of the proposed algorithm is available at http://www.cs.pu.edu.tw/~fang/DDCSDPrograms/DDCSD.htm.
Parallel Computational Protein Design.

PubMed

Zhou, Yichao; Donald, Bruce R; Zeng, Jianyang

2017-01-01

Computational structure-based protein design (CSPD) is an important problem in computational biology, which aims to design or improve a prescribed protein function based on a protein structure template. It provides a practical tool for real-world protein engineering applications. A popular CSPD method that guarantees to find the global minimum energy solution (GMEC) is to combine both dead-end elimination (DEE) and A* tree search algorithms. However, in this framework, the A* search algorithm can run in exponential time in the worst case, which may become the computation bottleneck of large-scale computational protein design process. To address this issue, we extend and add a new module to the OSPREY program that was previously developed in the Donald lab (Gainza et al., Methods Enzymol 523:87, 2013) to implement a GPU-based massively parallel A* algorithm for improving protein design pipeline. By exploiting the modern GPU computational framework and optimizing the computation of the heuristic function for A* search, our new program, called gOSPREY, can provide up to four orders of magnitude speedups in large protein design cases with a small memory overhead comparing to the traditional A* search algorithm implementation, while still guaranteeing the optimality. In addition, gOSPREY can be configured to run in a bounded-memory mode to tackle the problems in which the conformation space is too large and the global optimal solution cannot be computed previously. Furthermore, the GPU-based A* algorithm implemented in the gOSPREY program can be combined with the state-of-the-art rotamer pruning algorithms such as iMinDEE (Gainza et al., PLoS Comput Biol 8:e1002335, 2012) and DEEPer (Hallen et al., Proteins 81:18-39, 2013) to also consider continuous backbone and side-chain flexibility.
On Parallel Push-Relabel based Algorithms for Bipartite Maximum Matching

DOE Office of Scientific and Technical Information (OSTI.GOV)

Langguth, Johannes; Azad, Md Ariful; Halappanavar, Mahantesh

2014-07-01

We study multithreaded push-relabel based algorithms for computing maximum cardinality matching in bipartite graphs. Matching is a fundamental combinatorial (graph) problem with applications in a wide variety of problems in science and engineering. We are motivated by its use in the context of sparse linear solvers for computing maximum transversal of a matrix. We implement and test our algorithms on several multi-socket multicore systems and compare their performance to state-of-the-art augmenting path-based serial and parallel algorithms using a testset comprised of a wide range of real-world instances. Building on several heuristics for enhancing performance, we demonstrate good scaling for themore » parallel push-relabel algorithm. We show that it is comparable to the best augmenting path-based algorithms for bipartite matching. To the best of our knowledge, this is the first extensive study of multithreaded push-relabel based algorithms. In addition to a direct impact on the applications using matching, the proposed algorithmic techniques can be extended to preflow-push based algorithms for computing maximum flow in graphs.« less
Hybrid massively parallel fast sweeping method for static Hamilton–Jacobi equations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Detrixhe, Miles, E-mail: mdetrixhe@engineering.ucsb.edu; University of California Santa Barbara, Santa Barbara, CA, 93106; Gibou, Frédéric, E-mail: fgibou@engineering.ucsb.edu

The fast sweeping method is a popular algorithm for solving a variety of static Hamilton–Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling,more » and show state-of-the-art speedup values for the fast sweeping method.« less
Data association approaches in bearings-only multi-target tracking

NASA Astrophysics Data System (ADS)

Xu, Benlian; Wang, Zhiquan

2008-03-01

According to requirements of time computation complexity and correctness of data association of the multi-target tracking, two algorithms are suggested in this paper. The proposed Algorithm 1 is developed from the modified version of dual Simplex method, and it has the advantage of direct and explicit form of the optimal solution. The Algorithm 2 is based on the idea of Algorithm 1 and rotational sort method, it combines not only advantages of Algorithm 1, but also reduces the computational burden, whose complexity is only 1/ N times that of Algorithm 1. Finally, numerical analyses are carried out to evaluate the performance of the two data association algorithms.
A novel computer algorithm for modeling and treating mandibular fractures: A pilot study.

PubMed

Rizzi, Christopher J; Ortlip, Timothy; Greywoode, Jewel D; Vakharia, Kavita T; Vakharia, Kalpesh T

2017-02-01

To describe a novel computer algorithm that can model mandibular fracture repair. To evaluate the algorithm as a tool to model mandibular fracture reduction and hardware selection. Retrospective pilot study combined with cross-sectional survey. A computer algorithm utilizing Aquarius Net (TeraRecon, Inc, Foster City, CA) and Adobe Photoshop CS6 (Adobe Systems, Inc, San Jose, CA) was developed to model mandibular fracture repair. Ten different fracture patterns were selected from nine patients who had already undergone mandibular fracture repair. The preoperative computed tomography (CT) images were processed with the computer algorithm to create virtual images that matched the actual postoperative three-dimensional CT images. A survey comparing the true postoperative image with the virtual postoperative images was created and administered to otolaryngology resident and attending physicians. They were asked to rate on a scale from 0 to 10 (0 = completely different; 10 = identical) the similarity between the two images in terms of the fracture reduction and fixation hardware. Ten mandible fracture cases were analyzed and processed. There were 15 survey respondents. The mean score for overall similarity between the images was 8.41 ± 0.91; the mean score for similarity of fracture reduction was 8.61 ± 0.98; and the mean score for hardware appearance was 8.27 ± 0.97. There were no significant differences between attending and resident responses. There were no significant differences based on fracture location. This computer algorithm can accurately model mandibular fracture repair. Images created by the algorithm are highly similar to true postoperative images. The algorithm can potentially assist a surgeon planning mandibular fracture repair. 4. Laryngoscope, 2016 127:331-336, 2017. © 2016 The American Laryngological, Rhinological and Otological Society, Inc.
Hybrid Architectures for Evolutionary Computing Algorithms

DTIC Science & Technology

2008-01-01

other EC algorithms to FPGA Core Burns P1026/MAPLD 200532 Genetic Algorithm Hardware References S. Scott, A. Samal , and S. Seth, “HGA: A Hardware Based...on Parallel and Distributed Processing (IPPS/SPDP 󈨦), pp. 316-320, Proceedings. IEEE Computer Society 1998. [12] Scott, S. D. , Samal , A., and...Algorithm Hardware References S. Scott, A. Samal , and S. Seth, “HGA: A Hardware Based Genetic Algorithm”, Proceedings of the 1995 ACM Third
Calculus: A Computer Oriented Presentation, Part 1 [and] Part 2.

ERIC Educational Resources Information Center

Stenberg, Warren; Walker, Robert J.

Parts one and two of a one-year computer-oriented calculus course (without analytic geometry) are presented. The ideas of calculus are introduced and motivated through computer (i.e., algorithmic) concepts. An introduction to computing via algorithms and a simple flow chart language allows the book to be self-contained, except that material on…
A Decentralized Eigenvalue Computation Method for Spectrum Sensing Based on Average Consensus

NASA Astrophysics Data System (ADS)

Mohammadi, Jafar; Limmer, Steffen; Stańczak, Sławomir

2016-07-01

This paper considers eigenvalue estimation for the decentralized inference problem for spectrum sensing. We propose a decentralized eigenvalue computation algorithm based on the power method, which is referred to as generalized power method GPM; it is capable of estimating the eigenvalues of a given covariance matrix under certain conditions. Furthermore, we have developed a decentralized implementation of GPM by splitting the iterative operations into local and global computation tasks. The global tasks require data exchange to be performed among the nodes. For this task, we apply an average consensus algorithm to efficiently perform the global computations. As a special case, we consider a structured graph that is a tree with clusters of nodes at its leaves. For an accelerated distributed implementation, we propose to use computation over multiple access channel (CoMAC) as a building block of the algorithm. Numerical simulations are provided to illustrate the performance of the two algorithms.
Combinatorial Algorithms to Enable Computational Science and Engineering: Work from the CSCAPES Institute

DOE Office of Scientific and Technical Information (OSTI.GOV)

Boman, Erik G.; Catalyurek, Umit V.; Chevalier, Cedric

2015-01-16

This final progress report summarizes the work accomplished at the Combinatorial Scientific Computing and Petascale Simulations Institute. We developed Zoltan, a parallel mesh partitioning library that made use of accurate hypergraph models to provide load balancing in mesh-based computations. We developed several graph coloring algorithms for computing Jacobian and Hessian matrices and organized them into a software package called ColPack. We developed parallel algorithms for graph coloring and graph matching problems, and also designed multi-scale graph algorithms. Three PhD students graduated, six more are continuing their PhD studies, and four postdoctoral scholars were advised. Six of these students and Fellowsmore » have joined DOE Labs (Sandia, Berkeley), as staff scientists or as postdoctoral scientists. We also organized the SIAM Workshop on Combinatorial Scientific Computing (CSC) in 2007, 2009, and 2011 to continue to foster the CSC community.« less
The Construction of 3-d Neutral Density for Arbitrary Data Sets

NASA Astrophysics Data System (ADS)

Riha, S.; McDougall, T. J.; Barker, P. M.

2014-12-01

The Neutral Density variable allows inference of water pathways from thermodynamic properties in the global ocean, and is therefore an essential component of global ocean circulation analysis. The widely used algorithm for the computation of Neutral Density yields accurate results for data sets which are close to the observed climatological ocean. Long-term numerical climate simulations, however, often generate a significant drift from present-day climate, which renders the existing algorithm inaccurate. To remedy this problem, new algorithms which operate on arbitrary data have been developed, which may potentially be used to compute Neutral Density during runtime of a numerical model.We review existing approaches for the construction of Neutral Density in arbitrary data sets, detail their algorithmic structure, and present an analysis of the computational cost for implementations on a single-CPU computer. We discuss possible strategies for the implementation in state-of-the-art numerical models, with a focus on distributed computing environments.
Concurrent extensions to the FORTRAN language for parallel programming of computational fluid dynamics algorithms

NASA Technical Reports Server (NTRS)

Weeks, Cindy Lou

1986-01-01

Experiments were conducted at NASA Ames Research Center to define multi-tasking software requirements for multiple-instruction, multiple-data stream (MIMD) computer architectures. The focus was on specifying solutions for algorithms in the field of computational fluid dynamics (CFD). The program objectives were to allow researchers to produce usable parallel application software as soon as possible after acquiring MIMD computer equipment, to provide researchers with an easy-to-learn and easy-to-use parallel software language which could be implemented on several different MIMD machines, and to enable researchers to list preferred design specifications for future MIMD computer architectures. Analysis of CFD algorithms indicated that extensions of an existing programming language, adaptable to new computer architectures, provided the best solution to meeting program objectives. The CoFORTRAN Language was written in response to these objectives and to provide researchers a means to experiment with parallel software solutions to CFD algorithms on machines with parallel architectures.

Algorithms for the optimization of RBE-weighted dose in particle therapy.

PubMed

Horcicka, M; Meyer, C; Buschbacher, A; Durante, M; Krämer, M

2013-01-21

We report on various algorithms used for the nonlinear optimization of RBE-weighted dose in particle therapy. Concerning the dose calculation carbon ions are considered and biological effects are calculated by the Local Effect Model. Taking biological effects fully into account requires iterative methods to solve the optimization problem. We implemented several additional algorithms into GSI's treatment planning system TRiP98, like the BFGS-algorithm and the method of conjugated gradients, in order to investigate their computational performance. We modified textbook iteration procedures to improve the convergence speed. The performance of the algorithms is presented by convergence in terms of iterations and computation time. We found that the Fletcher-Reeves variant of the method of conjugated gradients is the algorithm with the best computational performance. With this algorithm we could speed up computation times by a factor of 4 compared to the method of steepest descent, which was used before. With our new methods it is possible to optimize complex treatment plans in a few minutes leading to good dose distributions. At the end we discuss future goals concerning dose optimization issues in particle therapy which might benefit from fast optimization solvers.
Algorithms for the optimization of RBE-weighted dose in particle therapy

NASA Astrophysics Data System (ADS)

Horcicka, M.; Meyer, C.; Buschbacher, A.; Durante, M.; Krämer, M.

2013-01-01

We report on various algorithms used for the nonlinear optimization of RBE-weighted dose in particle therapy. Concerning the dose calculation carbon ions are considered and biological effects are calculated by the Local Effect Model. Taking biological effects fully into account requires iterative methods to solve the optimization problem. We implemented several additional algorithms into GSI's treatment planning system TRiP98, like the BFGS-algorithm and the method of conjugated gradients, in order to investigate their computational performance. We modified textbook iteration procedures to improve the convergence speed. The performance of the algorithms is presented by convergence in terms of iterations and computation time. We found that the Fletcher-Reeves variant of the method of conjugated gradients is the algorithm with the best computational performance. With this algorithm we could speed up computation times by a factor of 4 compared to the method of steepest descent, which was used before. With our new methods it is possible to optimize complex treatment plans in a few minutes leading to good dose distributions. At the end we discuss future goals concerning dose optimization issues in particle therapy which might benefit from fast optimization solvers.
Semiannual Report, April 1, 1989 through September 30, 1989 (Institute for Computer Applications in Science and Engineering)

DTIC Science & Technology

1990-02-01

noise. Tobias B. Orloff Work began on developing a high quality rendering algorithm based on the radiosity method. The algorithm is similar to...previous progressive radiosity algorithms except for the following improvements: 1. At each iteration vertex radiosities are computed using a modified scan...line approach, thus eliminating the quadratic cost associated with a ray tracing computation of vortex radiosities . 2. At each iteration the scene is
A Secure Alignment Algorithm for Mapping Short Reads to Human Genome.

PubMed

Zhao, Yongan; Wang, Xiaofeng; Tang, Haixu

2018-05-09

The elastic and inexpensive computing resources such as clouds have been recognized as a useful solution to analyzing massive human genomic data (e.g., acquired by using next-generation sequencers) in biomedical researches. However, outsourcing human genome computation to public or commercial clouds was hindered due to privacy concerns: even a small number of human genome sequences contain sufficient information for identifying the donor of the genomic data. This issue cannot be directly addressed by existing security and cryptographic techniques (such as homomorphic encryption), because they are too heavyweight to carry out practical genome computation tasks on massive data. In this article, we present a secure algorithm to accomplish the read mapping, one of the most basic tasks in human genomic data analysis based on a hybrid cloud computing model. Comparing with the existing approaches, our algorithm delegates most computation to the public cloud, while only performing encryption and decryption on the private cloud, and thus makes the maximum use of the computing resource of the public cloud. Furthermore, our algorithm reports similar results as the nonsecure read mapping algorithms, including the alignment between reads and the reference genome, which can be directly used in the downstream analysis such as the inference of genomic variations. We implemented the algorithm in C++ and Python on a hybrid cloud system, in which the public cloud uses an Apache Spark system.
Efficient shortest-path-tree computation in network routing based on pulse-coupled neural networks.

PubMed

Qu, Hong; Yi, Zhang; Yang, Simon X

2013-06-01

Shortest path tree (SPT) computation is a critical issue for routers using link-state routing protocols, such as the most commonly used open shortest path first and intermediate system to intermediate system. Each router needs to recompute a new SPT rooted from itself whenever a change happens in the link state. Most commercial routers do this computation by deleting the current SPT and building a new one using static algorithms such as the Dijkstra algorithm at the beginning. Such recomputation of an entire SPT is inefficient, which may consume a considerable amount of CPU time and result in a time delay in the network. Some dynamic updating methods using the information in the updated SPT have been proposed in recent years. However, there are still many limitations in those dynamic algorithms. In this paper, a new modified model of pulse-coupled neural networks (M-PCNNs) is proposed for the SPT computation. It is rigorously proved that the proposed model is capable of solving some optimization problems, such as the SPT. A static algorithm is proposed based on the M-PCNNs to compute the SPT efficiently for large-scale problems. In addition, a dynamic algorithm that makes use of the structure of the previously computed SPT is proposed, which significantly improves the efficiency of the algorithm. Simulation results demonstrate the effective and efficient performance of the proposed approach.
A Flexible Computational Framework Using R and Map-Reduce for Permutation Tests of Massive Genetic Analysis of Complex Traits.

PubMed

Mahjani, Behrang; Toor, Salman; Nettelblad, Carl; Holmgren, Sverker

2017-01-01

In quantitative trait locus (QTL) mapping significance of putative QTL is often determined using permutation testing. The computational needs to calculate the significance level are immense, 10 4 up to 10 8 or even more permutations can be needed. We have previously introduced the PruneDIRECT algorithm for multiple QTL scan with epistatic interactions. This algorithm has specific strengths for permutation testing. Here, we present a flexible, parallel computing framework for identifying multiple interacting QTL using the PruneDIRECT algorithm which uses the map-reduce model as implemented in Hadoop. The framework is implemented in R, a widely used software tool among geneticists. This enables users to rearrange algorithmic steps to adapt genetic models, search algorithms, and parallelization steps to their needs in a flexible way. Our work underlines the maturity of accessing distributed parallel computing for computationally demanding bioinformatics applications through building workflows within existing scientific environments. We investigate the PruneDIRECT algorithm, comparing its performance to exhaustive search and DIRECT algorithm using our framework on a public cloud resource. We find that PruneDIRECT is vastly superior for permutation testing, and perform 2 ×10 5 permutations for a 2D QTL problem in 15 hours, using 100 cloud processes. We show that our framework scales out almost linearly for a 3D QTL search.
Convex optimization problem prototyping for image reconstruction in computed tomography with the Chambolle-Pock algorithm

PubMed Central

Sidky, Emil Y.; Jørgensen, Jakob H.; Pan, Xiaochuan

2012-01-01

The primal-dual optimization algorithm developed in Chambolle and Pock (CP), 2011 is applied to various convex optimization problems of interest in computed tomography (CT) image reconstruction. This algorithm allows for rapid prototyping of optimization problems for the purpose of designing iterative image reconstruction algorithms for CT. The primal-dual algorithm is briefly summarized in the article, and its potential for prototyping is demonstrated by explicitly deriving CP algorithm instances for many optimization problems relevant to CT. An example application modeling breast CT with low-intensity X-ray illumination is presented. PMID:22538474
Stochastic Evolutionary Algorithms for Planning Robot Paths

NASA Technical Reports Server (NTRS)

Fink, Wolfgang; Aghazarian, Hrand; Huntsberger, Terrance; Terrile, Richard

2006-01-01

A computer program implements stochastic evolutionary algorithms for planning and optimizing collision-free paths for robots and their jointed limbs. Stochastic evolutionary algorithms can be made to produce acceptably close approximations to exact, optimal solutions for path-planning problems while often demanding much less computation than do exhaustive-search and deterministic inverse-kinematics algorithms that have been used previously for this purpose. Hence, the present software is better suited for application aboard robots having limited computing capabilities (see figure). The stochastic aspect lies in the use of simulated annealing to (1) prevent trapping of an optimization algorithm in local minima of an energy-like error measure by which the fitness of a trial solution is evaluated while (2) ensuring that the entire multidimensional configuration and parameter space of the path-planning problem is sampled efficiently with respect to both robot joint angles and computation time. Simulated annealing is an established technique for avoiding local minima in multidimensional optimization problems, but has not, until now, been applied to planning collision-free robot paths by use of low-power computers.
Arbitrated Quantum Signature with Hamiltonian Algorithm Based on Blind Quantum Computation

NASA Astrophysics Data System (ADS)

Shi, Ronghua; Ding, Wanting; Shi, Jinjing

2018-03-01

A novel arbitrated quantum signature (AQS) scheme is proposed motivated by the Hamiltonian algorithm (HA) and blind quantum computation (BQC). The generation and verification of signature algorithm is designed based on HA, which enables the scheme to rely less on computational complexity. It is unnecessary to recover original messages when verifying signatures since the blind quantum computation is applied, which can improve the simplicity and operability of our scheme. It is proved that the scheme can be deployed securely, and the extended AQS has some extensive applications in E-payment system, E-government, E-business, etc.
Arbitrated Quantum Signature with Hamiltonian Algorithm Based on Blind Quantum Computation

NASA Astrophysics Data System (ADS)

Shi, Ronghua; Ding, Wanting; Shi, Jinjing

2018-07-01

A novel arbitrated quantum signature (AQS) scheme is proposed motivated by the Hamiltonian algorithm (HA) and blind quantum computation (BQC). The generation and verification of signature algorithm is designed based on HA, which enables the scheme to rely less on computational complexity. It is unnecessary to recover original messages when verifying signatures since the blind quantum computation is applied, which can improve the simplicity and operability of our scheme. It is proved that the scheme can be deployed securely, and the extended AQS has some extensive applications in E-payment system, E-government, E-business, etc.
Algorithms for computing the geopotential using a simple density layer

NASA Technical Reports Server (NTRS)

Morrison, F.

1976-01-01

Several algorithms have been developed for computing the potential and attraction of a simple density layer. These are numerical cubature, Taylor series, and a mixed analytic and numerical integration using a singularity-matching technique. A computer program has been written to combine these techniques for computing the disturbing acceleration on an artificial earth satellite. A total of 1640 equal-area, constant surface density blocks on an oblate spheroid are used. The singularity-matching algorithm is used in the subsatellite region, Taylor series in the surrounding zone, and numerical cubature on the rest of the earth.
Accelerating phylogenetics computing on the desktop: experiments with executing UPGMA in programmable logic.

PubMed

Davis, J P; Akella, S; Waddell, P H

2004-01-01

Having greater computational power on the desktop for processing taxa data sets has been a dream of biologists/statisticians involved in phylogenetics data analysis. Many existing algorithms have been highly optimized-one example being Felsenstein's PHYLIP code, written in C, for UPGMA and neighbor joining algorithms. However, the ability to process more than a few tens of taxa in a reasonable amount of time using conventional computers has not yielded a satisfactory speedup in data processing, making it difficult for phylogenetics practitioners to quickly explore data sets-such as might be done from a laptop computer. We discuss the application of custom computing techniques to phylogenetics. In particular, we apply this technology to speed up UPGMA algorithm execution by a factor of a hundred, against that of PHYLIP code running on the same PC. We report on these experiments and discuss how custom computing techniques can be used to not only accelerate phylogenetics algorithm performance on the desktop, but also on larger, high-performance computing engines, thus enabling the high-speed processing of data sets involving thousands of taxa.
Vectorization of transport and diffusion computations on the CDC Cyber 205

DOE Office of Scientific and Technical Information (OSTI.GOV)

Abu-Shumays, I.K.

1986-01-01

The development and testing of alternative numerical methods and computational algorithms specifically designed for the vectorization of transport and diffusion computations on a Control Data Corporation (CDC) Cyber 205 vector computer are described. Two solution methods for the discrete ordinates approximation to the transport equation are summarized and compared. Factors of 4 to 7 reduction in run times for certain large transport problems were achieved on a Cyber 205 as compared with run times on a CDC-7600. The solution of tridiagonal systems of linear equations, central to several efficient numerical methods for multidimensional diffusion computations and essential for fluid flowmore » and other physics and engineering problems, is also dealt with. Among the methods tested, a combined odd-even cyclic reduction and modified Cholesky factorization algorithm for solving linear symmetric positive definite tridiagonal systems is found to be the most effective for these systems on a Cyber 205. For large tridiagonal systems, computation with this algorithm is an order of magnitude faster on a Cyber 205 than computation with the best algorithm for tridiagonal systems on a CDC-7600.« less
BCM: toolkit for Bayesian analysis of Computational Models using samplers.

PubMed

Thijssen, Bram; Dijkstra, Tjeerd M H; Heskes, Tom; Wessels, Lodewyk F A

2016-10-21

Computational models in biology are characterized by a large degree of uncertainty. This uncertainty can be analyzed with Bayesian statistics, however, the sampling algorithms that are frequently used for calculating Bayesian statistical estimates are computationally demanding, and each algorithm has unique advantages and disadvantages. It is typically unclear, before starting an analysis, which algorithm will perform well on a given computational model. We present BCM, a toolkit for the Bayesian analysis of Computational Models using samplers. It provides efficient, multithreaded implementations of eleven algorithms for sampling from posterior probability distributions and for calculating marginal likelihoods. BCM includes tools to simplify the process of model specification and scripts for visualizing the results. The flexible architecture allows it to be used on diverse types of biological computational models. In an example inference task using a model of the cell cycle based on ordinary differential equations, BCM is significantly more efficient than existing software packages, allowing more challenging inference problems to be solved. BCM represents an efficient one-stop-shop for computational modelers wishing to use sampler-based Bayesian statistics.
GPU implementation of prior image constrained compressed sensing (PICCS)

NASA Astrophysics Data System (ADS)

Nett, Brian E.; Tang, Jie; Chen, Guang-Hong

2010-04-01

The Prior Image Constrained Compressed Sensing (PICCS) algorithm (Med. Phys. 35, pg. 660, 2008) has been applied to several computed tomography applications with both standard CT systems and flat-panel based systems designed for guiding interventional procedures and radiation therapy treatment delivery. The PICCS algorithm typically utilizes a prior image which is reconstructed via the standard Filtered Backprojection (FBP) reconstruction algorithm. The algorithm then iteratively solves for the image volume that matches the measured data, while simultaneously assuring the image is similar to the prior image. The PICCS algorithm has demonstrated utility in several applications including: improved temporal resolution reconstruction, 4D respiratory phase specific reconstructions for radiation therapy, and cardiac reconstruction from data acquired on an interventional C-arm. One disadvantage of the PICCS algorithm, just as other iterative algorithms, is the long computation times typically associated with reconstruction. In order for an algorithm to gain clinical acceptance reconstruction must be achievable in minutes rather than hours. In this work the PICCS algorithm has been implemented on the GPU in order to significantly reduce the reconstruction time of the PICCS algorithm. The Compute Unified Device Architecture (CUDA) was used in this implementation.
Automated Development of Accurate Algorithms and Efficient Codes for Computational Aeroacoustics

NASA Technical Reports Server (NTRS)

Goodrich, John W.; Dyson, Rodger W.

1999-01-01

The simulation of sound generation and propagation in three space dimensions with realistic aircraft components is a very large time dependent computation with fine details. Simulations in open domains with embedded objects require accurate and robust algorithms for propagation, for artificial inflow and outflow boundaries, and for the definition of geometrically complex objects. The development, implementation, and validation of methods for solving these demanding problems is being done to support the NASA pillar goals for reducing aircraft noise levels. Our goal is to provide algorithms which are sufficiently accurate and efficient to produce usable results rapidly enough to allow design engineers to study the effects on sound levels of design changes in propulsion systems, and in the integration of propulsion systems with airframes. There is a lack of design tools for these purposes at this time. Our technical approach to this problem combines the development of new, algorithms with the use of Mathematica and Unix utilities to automate the algorithm development, code implementation, and validation. We use explicit methods to ensure effective implementation by domain decomposition for SPMD parallel computing. There are several orders of magnitude difference in the computational efficiencies of the algorithms which we have considered. We currently have new artificial inflow and outflow boundary conditions that are stable, accurate, and unobtrusive, with implementations that match the accuracy and efficiency of the propagation methods. The artificial numerical boundary treatments have been proven to have solutions which converge to the full open domain problems, so that the error from the boundary treatments can be driven as low as is required. The purpose of this paper is to briefly present a method for developing highly accurate algorithms for computational aeroacoustics, the use of computer automation in this process, and a brief survey of the algorithms that have resulted from this work. A review of computational aeroacoustics has recently been given by Lele.
Development of Online Cognitive and Algorithm Tests as Assessment Tools in Introductory Computer Science Courses

ERIC Educational Resources Information Center

Avancena, Aimee Theresa; Nishihara, Akinori; Vergara, John Paul

2012-01-01

This paper presents the online cognitive and algorithm tests, which were developed in order to determine if certain cognitive factors and fundamental algorithms correlate with the performance of students in their introductory computer science course. The tests were implemented among Management Information Systems majors from the Philippines and…
A simple algorithm for computing positively weighted straight skeletons of monotone polygons☆

PubMed Central

Biedl, Therese; Held, Martin; Huber, Stefan; Kaaser, Dominik; Palfrader, Peter

2015-01-01

We study the characteristics of straight skeletons of monotone polygonal chains and use them to devise an algorithm for computing positively weighted straight skeletons of monotone polygons. Our algorithm runs in O(nlog⁡n) time and O(n) space, where n denotes the number of vertices of the polygon. PMID:25648376
A simple algorithm for computing positively weighted straight skeletons of monotone polygons.

PubMed

Biedl, Therese; Held, Martin; Huber, Stefan; Kaaser, Dominik; Palfrader, Peter

2015-02-01

We study the characteristics of straight skeletons of monotone polygonal chains and use them to devise an algorithm for computing positively weighted straight skeletons of monotone polygons. Our algorithm runs in [Formula: see text] time and [Formula: see text] space, where n denotes the number of vertices of the polygon.
Multirate-based fast parallel algorithms for 2-D DHT-based real-valued discrete Gabor transform.

PubMed

Tao, Liang; Kwan, Hon Keung

2012-07-01

Novel algorithms for the multirate and fast parallel implementation of the 2-D discrete Hartley transform (DHT)-based real-valued discrete Gabor transform (RDGT) and its inverse transform are presented in this paper. A 2-D multirate-based analysis convolver bank is designed for the 2-D RDGT, and a 2-D multirate-based synthesis convolver bank is designed for the 2-D inverse RDGT. The parallel channels in each of the two convolver banks have a unified structure and can apply the 2-D fast DHT algorithm to speed up their computations. The computational complexity of each parallel channel is low and is independent of the Gabor oversampling rate. All the 2-D RDGT coefficients of an image are computed in parallel during the analysis process and can be reconstructed in parallel during the synthesis process. The computational complexity and time of the proposed parallel algorithms are analyzed and compared with those of the existing fastest algorithms for 2-D discrete Gabor transforms. The results indicate that the proposed algorithms are the fastest, which make them attractive for real-time image processing.

Volumetric visualization algorithm development for an FPGA-based custom computing machine

NASA Astrophysics Data System (ADS)

Sallinen, Sami J.; Alakuijala, Jyrki; Helminen, Hannu; Laitinen, Joakim

1998-05-01

Rendering volumetric medical images is a burdensome computational task for contemporary computers due to the large size of the data sets. Custom designed reconfigurable hardware could considerably speed up volume visualization if an algorithm suitable for the platform is used. We present an algorithm and speedup techniques for visualizing volumetric medical CT and MR images with a custom-computing machine based on a Field Programmable Gate Array (FPGA). We also present simulated performance results of the proposed algorithm calculated with a software implementation running on a desktop PC. Our algorithm is capable of generating perspective projection renderings of single and multiple isosurfaces with transparency, simulated X-ray images, and Maximum Intensity Projections (MIP). Although more speedup techniques exist for parallel projection than for perspective projection, we have constrained ourselves to perspective viewing, because of its importance in the field of radiotherapy. The algorithm we have developed is based on ray casting, and the rendering is sped up by three different methods: shading speedup by gradient precalculation, a new generalized version of Ray-Acceleration by Distance Coding (RADC), and background ray elimination by speculative ray selection.
Job-shop scheduling applied to computer vision

NASA Astrophysics Data System (ADS)

Sebastian y Zuniga, Jose M.; Torres-Medina, Fernando; Aracil, Rafael; Reinoso, Oscar; Jimenez, Luis M.; Garcia, David

1997-09-01

This paper presents a method for minimizing the total elapsed time spent by n tasks running on m differents processors working in parallel. The developed algorithm not only minimizes the total elapsed time but also reduces the idle time and waiting time of in-process tasks. This condition is very important in some applications of computer vision in which the time to finish the total process is particularly critical -- quality control in industrial inspection, real- time computer vision, guided robots. The scheduling algorithm is based on the use of two matrices, obtained from the precedence relationships between tasks, and the data obtained from the two matrices. The developed scheduling algorithm has been tested in one application of quality control using computer vision. The results obtained have been satisfactory in the application of different image processing algorithms.
Parallel grid generation algorithm for distributed memory computers

NASA Technical Reports Server (NTRS)

Moitra, Stuti; Moitra, Anutosh

1994-01-01

A parallel grid-generation algorithm and its implementation on the Intel iPSC/860 computer are described. The grid-generation scheme is based on an algebraic formulation of homotopic relations. Methods for utilizing the inherent parallelism of the grid-generation scheme are described, and implementation of multiple levELs of parallelism on multiple instruction multiple data machines are indicated. The algorithm is capable of providing near orthogonality and spacing control at solid boundaries while requiring minimal interprocessor communications. Results obtained on the Intel hypercube for a blended wing-body configuration are used to demonstrate the effectiveness of the algorithm. Fortran implementations bAsed on the native programming model of the iPSC/860 computer and the Express system of software tools are reported. Computational gains in execution time speed-up ratios are given.
Total variation-based neutron computed tomography

NASA Astrophysics Data System (ADS)

Barnard, Richard C.; Bilheux, Hassina; Toops, Todd; Nafziger, Eric; Finney, Charles; Splitter, Derek; Archibald, Rick

2018-05-01

We perform the neutron computed tomography reconstruction problem via an inverse problem formulation with a total variation penalty. In the case of highly under-resolved angular measurements, the total variation penalty suppresses high-frequency artifacts which appear in filtered back projections. In order to efficiently compute solutions for this problem, we implement a variation of the split Bregman algorithm; due to the error-forgetting nature of the algorithm, the computational cost of updating can be significantly reduced via very inexact approximate linear solvers. We present the effectiveness of the algorithm in the significantly low-angular sampling case using synthetic test problems as well as data obtained from a high flux neutron source. The algorithm removes artifacts and can even roughly capture small features when an extremely low number of angles are used.
NASA Workshop on Computational Structural Mechanics 1987, part 3

NASA Technical Reports Server (NTRS)

Sykes, Nancy P. (Editor)

1989-01-01

Computational Structural Mechanics (CSM) topics are explored. Algorithms and software for nonlinear structural dynamics, concurrent algorithms for transient finite element analysis, computational methods and software systems for dynamics and control of large space structures, and the use of multi-grid for structural analysis are discussed.
Computing Gröbner Bases within Linear Algebra

NASA Astrophysics Data System (ADS)

Suzuki, Akira

In this paper, we present an alternative algorithm to compute Gröbner bases, which is based on computations on sparse linear algebra. Both of S-polynomial computations and monomial reductions are computed in linear algebra simultaneously in this algorithm. So it can be implemented to any computational system which can handle linear algebra. For a given ideal in a polynomial ring, it calculates a Gröbner basis along with the corresponding term order appropriately.
Computational mechanics analysis tools for parallel-vector supercomputers

NASA Technical Reports Server (NTRS)

Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.; Qin, J.

1993-01-01

Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigen-solution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization algorithm and domain decomposition. The source code for many of these algorithms is available from NASA Langley.
A Non-Intrusive Algorithm for Sensitivity Analysis of Chaotic Flow Simulations

NASA Technical Reports Server (NTRS)

Blonigan, Patrick J.; Wang, Qiqi; Nielsen, Eric J.; Diskin, Boris

2017-01-01

We demonstrate a novel algorithm for computing the sensitivity of statistics in chaotic flow simulations to parameter perturbations. The algorithm is non-intrusive but requires exposing an interface. Based on the principle of shadowing in dynamical systems, this algorithm is designed to reduce the effect of the sampling error in computing sensitivity of statistics in chaotic simulations. We compare the effectiveness of this method to that of the conventional finite difference method.
Parallel Implementation of the Wideband DOA Algorithm on the IBM Cell BE Processor

DTIC Science & Technology

2010-05-01

Abstract—The Multiple Signal Classification ( MUSIC ) algorithm is a powerful technique for determining the Direction of Arrival (DOA) of signals...Broadband Engine Processor (Cell BE). The process of adapting the serial based MUSIC algorithm to the Cell BE will be analyzed in terms of parallelism and...using Multiple Signal Classification MUSIC algorithm [4] • Computation of Focus matrix • Computation of number of sources • Separation of Signal
Parabolized Navier-Stokes Code for Computing Magneto-Hydrodynamic Flowfields

NASA Technical Reports Server (NTRS)

Mehta, Unmeel B. (Technical Monitor); Tannehill, J. C.

2003-01-01

This report consists of two published papers, 'Computation of Magnetohydrodynamic Flows Using an Iterative PNS Algorithm' and 'Numerical Simulation of Turbulent MHD Flows Using an Iterative PNS Algorithm'.
Design of automata theory of cubical complexes with applications to diagnosis and algorithmic description

NASA Technical Reports Server (NTRS)

Roth, J. P.

1972-01-01

The following problems are considered: (1) methods for development of logic design together with algorithms, so that it is possible to compute a test for any failure in the logic design, if such a test exists, and developing algorithms and heuristics for the purpose of minimizing the computation for tests; and (2) a method of design of logic for ultra LSI (large scale integration). It was discovered that the so-called quantum calculus can be extended to render it possible: (1) to describe the functional behavior of a mechanism component by component, and (2) to compute tests for failures, in the mechanism, using the diagnosis algorithm. The development of an algorithm for the multioutput two-level minimization problem is presented and the program MIN 360 was written for this algorithm. The program has options of mode (exact minimum or various approximations), cost function, cost bound, etc., providing flexibility.
Modeling node bandwidth limits and their effects on vector combining algorithms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Littlefield, R.J.

Each node in a message-passing multicomputer typically has several communication links. However, the maximum aggregate communication speed of a node is often less than the sum of its individual link speeds. Such computers are called node bandwidth limited (NBL). The NBL constraint is important when choosing algorithms because it can change the relative performance of different algorithms that accomplish the same task. This paper introduces a model of communication performance for NBL computers and uses the model to analyze the overall performance of three algorithms for vector combining (global sum) on the Intel Touchstone DELTA computer. Each of the threemore » algorithms is found to be at least 33% faster than the other two for some combinations of machine size and vector length. The NBL constraint is shown to significantly affect the conditions under which each algorithm is fastest.« less
Color constancy by characterization of illumination chromaticity

NASA Astrophysics Data System (ADS)

Nikkanen, Jarno T.

2011-05-01

Computational color constancy algorithms play a key role in achieving desired color reproduction in digital cameras. Failure to estimate illumination chromaticity correctly will result in invalid overall colour cast in the image that will be easily detected by human observers. A new algorithm is presented for computational color constancy. Low computational complexity and low memory requirement make the algorithm suitable for resource-limited camera devices, such as consumer digital cameras and camera phones. Operation of the algorithm relies on characterization of the range of possible illumination chromaticities in terms of camera sensor response. The fact that only illumination chromaticity is characterized instead of the full color gamut, for example, increases robustness against variations in sensor characteristics and against failure of diagonal model of illumination change. Multiple databases are used in order to demonstrate the good performance of the algorithm in comparison to the state-of-the-art color constancy algorithms.
A parallel algorithm for the two-dimensional time fractional diffusion equation with implicit difference method.

PubMed

Gong, Chunye; Bao, Weimin; Tang, Guojian; Jiang, Yuewen; Liu, Jie

2014-01-01

It is very time consuming to solve fractional differential equations. The computational complexity of two-dimensional fractional differential equation (2D-TFDE) with iterative implicit finite difference method is O(M(x)M(y)N(2)). In this paper, we present a parallel algorithm for 2D-TFDE and give an in-depth discussion about this algorithm. A task distribution model and data layout with virtual boundary are designed for this parallel algorithm. The experimental results show that the parallel algorithm compares well with the exact solution. The parallel algorithm on single Intel Xeon X5540 CPU runs 3.16-4.17 times faster than the serial algorithm on single CPU core. The parallel efficiency of 81 processes is up to 88.24% compared with 9 processes on a distributed memory cluster system. We do think that the parallel computing technology will become a very basic method for the computational intensive fractional applications in the near future.
Visual saliency-based fast intracoding algorithm for high efficiency video coding

NASA Astrophysics Data System (ADS)

Zhou, Xin; Shi, Guangming; Zhou, Wei; Duan, Zhemin

2017-01-01

Intraprediction has been significantly improved in high efficiency video coding over H.264/AVC with quad-tree-based coding unit (CU) structure from size 64×64 to 8×8 and more prediction modes. However, these techniques cause a dramatic increase in computational complexity. An intracoding algorithm is proposed that consists of perceptual fast CU size decision algorithm and fast intraprediction mode decision algorithm. First, based on the visual saliency detection, an adaptive and fast CU size decision method is proposed to alleviate intraencoding complexity. Furthermore, a fast intraprediction mode decision algorithm with step halving rough mode decision method and early modes pruning algorithm is presented to selectively check the potential modes and effectively reduce the complexity of computation. Experimental results show that our proposed fast method reduces the computational complexity of the current HM to about 57% in encoding time with only 0.37% increases in BD rate. Meanwhile, the proposed fast algorithm has reasonable peak signal-to-noise ratio losses and nearly the same subjective perceptual quality.
A parallel simulated annealing algorithm for standard cell placement on a hypercube computer

NASA Technical Reports Server (NTRS)

Jones, Mark Howard

1987-01-01

A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.
Performance of fusion algorithms for computer-aided detection and classification of mines in very shallow water obtained from testing in navy Fleet Battle Exercise-Hotel 2000

NASA Astrophysics Data System (ADS)

Ciany, Charles M.; Zurawski, William; Kerfoot, Ian

2001-10-01

The performance of Computer Aided Detection/Computer Aided Classification (CAD/CAC) Fusion algorithms on side-scan sonar images was evaluated using data taken at the Navy's's Fleet Battle Exercise-Hotel held in Panama City, Florida, in August 2000. A 2-of-3 binary fusion algorithm is shown to provide robust performance. The algorithm accepts the classification decisions and associated contact locations form three different CAD/CAC algorithms, clusters the contacts based on Euclidian distance, and then declares a valid target when a clustered contact is declared by at least 2 of the 3 individual algorithms. This simple binary fusion provided a 96 percent probability of correct classification at a false alarm rate of 0.14 false alarms per image per side. The performance represented a 3.8:1 reduction in false alarms over the best performing single CAD/CAC algorithm, with no loss in probability of correct classification.
Initialization and Restart in Stochastic Local Search: Computing a Most Probable Explanation in Bayesian Networks

NASA Technical Reports Server (NTRS)

Mengshoel, Ole J.; Wilkins, David C.; Roth, Dan

2010-01-01

For hard computational problems, stochastic local search has proven to be a competitive approach to finding optimal or approximately optimal problem solutions. Two key research questions for stochastic local search algorithms are: Which algorithms are effective for initialization? When should the search process be restarted? In the present work we investigate these research questions in the context of approximate computation of most probable explanations (MPEs) in Bayesian networks (BNs). We introduce a novel approach, based on the Viterbi algorithm, to explanation initialization in BNs. While the Viterbi algorithm works on sequences and trees, our approach works on BNs with arbitrary topologies. We also give a novel formalization of stochastic local search, with focus on initialization and restart, using probability theory and mixture models. Experimentally, we apply our methods to the problem of MPE computation, using a stochastic local search algorithm known as Stochastic Greedy Search. By carefully optimizing both initialization and restart, we reduce the MPE search time for application BNs by several orders of magnitude compared to using uniform at random initialization without restart. On several BNs from applications, the performance of Stochastic Greedy Search is competitive with clique tree clustering, a state-of-the-art exact algorithm used for MPE computation in BNs.
Dynamic displacement measurement of large-scale structures based on the Lucas-Kanade template tracking algorithm

NASA Astrophysics Data System (ADS)

Guo, Jie; Zhu, Chang`an

2016-01-01

The development of optics and computer technologies enables the application of the vision-based technique that uses digital cameras to the displacement measurement of large-scale structures. Compared with traditional contact measurements, vision-based technique allows for remote measurement, has a non-intrusive characteristic, and does not necessitate mass introduction. In this study, a high-speed camera system is developed to complete the displacement measurement in real time. The system consists of a high-speed camera and a notebook computer. The high-speed camera can capture images at a speed of hundreds of frames per second. To process the captured images in computer, the Lucas-Kanade template tracking algorithm in the field of computer vision is introduced. Additionally, a modified inverse compositional algorithm is proposed to reduce the computing time of the original algorithm and improve the efficiency further. The modified algorithm can rapidly accomplish one displacement extraction within 1 ms without having to install any pre-designed target panel onto the structures in advance. The accuracy and the efficiency of the system in the remote measurement of dynamic displacement are demonstrated in the experiments on motion platform and sound barrier on suspension viaduct. Experimental results show that the proposed algorithm can extract accurate displacement signal and accomplish the vibration measurement of large-scale structures.
SOI layout decomposition for double patterning lithography on high-performance computer platforms

NASA Astrophysics Data System (ADS)

Verstov, Vladimir; Zinchenko, Lyudmila; Makarchuk, Vladimir

2014-12-01

In the paper silicon on insulator layout decomposition algorithms for the double patterning lithography on high performance computing platforms are discussed. Our approach is based on the use of a contradiction graph and a modified concurrent breadth-first search algorithm. We evaluate our technique on 45 nm Nangate Open Cell Library including non-Manhattan geometry. Experimental results show that our soft computing algorithms decompose layout successfully and a minimal distance between polygons in layout is increased.

A novel iris localization algorithm using correlation filtering

NASA Astrophysics Data System (ADS)

Pohit, Mausumi; Sharma, Jitu

2015-06-01

Fast and efficient segmentation of iris from the eye images is a primary requirement for robust database independent iris recognition. In this paper we have presented a new algorithm for computing the inner and outer boundaries of the iris and locating the pupil centre. Pupil-iris boundary computation is based on correlation filtering approach, whereas iris-sclera boundary is determined through one dimensional intensity mapping. The proposed approach is computationally less extensive when compared with the existing algorithms like Hough transform.
Desiderata for computable representations of electronic health records-driven phenotype algorithms.

PubMed

Mo, Huan; Thompson, William K; Rasmussen, Luke V; Pacheco, Jennifer A; Jiang, Guoqian; Kiefer, Richard; Zhu, Qian; Xu, Jie; Montague, Enid; Carrell, David S; Lingren, Todd; Mentch, Frank D; Ni, Yizhao; Wehbe, Firas H; Peissig, Peggy L; Tromp, Gerard; Larson, Eric B; Chute, Christopher G; Pathak, Jyotishman; Denny, Joshua C; Speltz, Peter; Kho, Abel N; Jarvik, Gail P; Bejan, Cosmin A; Williams, Marc S; Borthwick, Kenneth; Kitchner, Terrie E; Roden, Dan M; Harris, Paul A

2015-11-01

Electronic health records (EHRs) are increasingly used for clinical and translational research through the creation of phenotype algorithms. Currently, phenotype algorithms are most commonly represented as noncomputable descriptive documents and knowledge artifacts that detail the protocols for querying diagnoses, symptoms, procedures, medications, and/or text-driven medical concepts, and are primarily meant for human comprehension. We present desiderata for developing a computable phenotype representation model (PheRM). A team of clinicians and informaticians reviewed common features for multisite phenotype algorithms published in PheKB.org and existing phenotype representation platforms. We also evaluated well-known diagnostic criteria and clinical decision-making guidelines to encompass a broader category of algorithms. We propose 10 desired characteristics for a flexible, computable PheRM: (1) structure clinical data into queryable forms; (2) recommend use of a common data model, but also support customization for the variability and availability of EHR data among sites; (3) support both human-readable and computable representations of phenotype algorithms; (4) implement set operations and relational algebra for modeling phenotype algorithms; (5) represent phenotype criteria with structured rules; (6) support defining temporal relations between events; (7) use standardized terminologies and ontologies, and facilitate reuse of value sets; (8) define representations for text searching and natural language processing; (9) provide interfaces for external software algorithms; and (10) maintain backward compatibility. A computable PheRM is needed for true phenotype portability and reliability across different EHR products and healthcare systems. These desiderata are a guide to inform the establishment and evolution of EHR phenotype algorithm authoring platforms and languages. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association.
High performance transcription factor-DNA docking with GPU computing

PubMed Central

2012-01-01

Background Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications in a number of applications, such as structure-based prediction of transcription factor binding sites and rational drug design. Protein-DNA docking is very computational demanding due to the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that the docking quality depends on the coverage of the conformational sampling space. It is therefore desirable to accelerate the computation of the docking algorithm, not only to reduce computing time, but also to improve docking quality. Methods In an attempt to accelerate the sampling process and to improve the docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte-Carlo simulation and a simulated annealing method to search through the conformational space. Algorithmic techniques were developed to improve the computation efficiency and scalability on GPU-based high performance computing systems. Results The effectiveness of our approach is tested on a non-redundant set of 75 TF-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process and thereby improving the chance of finding near-native TF-DNA complex structures. This study also suggests that further improvement in protein-DNA docking research would require efforts from two integral aspects: improvement in computation efficiency and energy function design. Conclusions We present a high performance computing approach for improving the prediction accuracy of protein-DNA docking. The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first ad hoc effort of applying GPU or GPU clusters to the protein-DNA docking problem. PMID:22759575
Quantum Statistical Mechanics on a Quantum Computer

NASA Astrophysics Data System (ADS)

Raedt, H. D.; Hams, A. H.; Michielsen, K.; Miyashita, S.; Saito, K.

We describe a quantum algorithm to compute the density of states and thermal equilibrium properties of quantum many-body systems. We present results obtained by running this algorithm on a software implementation of a 21-qubit quantum computer for the case of an antiferromagnetic Heisenberg model on triangular lattices of different size.
Simulated parallel annealing within a neighborhood for optimization of biomechanical systems.

PubMed

Higginson, J S; Neptune, R R; Anderson, F C

2005-09-01

Optimization problems for biomechanical systems have become extremely complex. Simulated annealing (SA) algorithms have performed well in a variety of test problems and biomechanical applications; however, despite advances in computer speed, convergence to optimal solutions for systems of even moderate complexity has remained prohibitive. The objective of this study was to develop a portable parallel version of a SA algorithm for solving optimization problems in biomechanics. The algorithm for simulated parallel annealing within a neighborhood (SPAN) was designed to minimize interprocessor communication time and closely retain the heuristics of the serial SA algorithm. The computational speed of the SPAN algorithm scaled linearly with the number of processors on different computer platforms for a simple quadratic test problem and for a more complex forward dynamic simulation of human pedaling.
A Spectral Algorithm for Envelope Reduction of Sparse Matrices

NASA Technical Reports Server (NTRS)

Barnard, Stephen T.; Pothen, Alex; Simon, Horst D.

1993-01-01

The problem of reordering a sparse symmetric matrix to reduce its envelope size is considered. A new spectral algorithm for computing an envelope-reducing reordering is obtained by associating a Laplacian matrix with the given matrix and then sorting the components of a specified eigenvector of the Laplacian. This Laplacian eigenvector solves a continuous relaxation of a discrete problem related to envelope minimization called the minimum 2-sum problem. The permutation vector computed by the spectral algorithm is a closest permutation vector to the specified Laplacian eigenvector. Numerical results show that the new reordering algorithm usually computes smaller envelope sizes than those obtained from the current standard algorithms such as Gibbs-Poole-Stockmeyer (GPS) or SPARSPAK reverse Cuthill-McKee (RCM), in some cases reducing the envelope by more than a factor of two.
Algorithms Bridging Quantum Computation and Chemistry

NASA Astrophysics Data System (ADS)

McClean, Jarrod Ryan

The design of new materials and chemicals derived entirely from computation has long been a goal of computational chemistry, and the governing equation whose solution would permit this dream is known. Unfortunately, the exact solution to this equation has been far too expensive and clever approximations fail in critical situations. Quantum computers offer a novel solution to this problem. In this work, we develop not only new algorithms to use quantum computers to study hard problems in chemistry, but also explore how such algorithms can help us to better understand and improve our traditional approaches. In particular, we first introduce a new method, the variational quantum eigensolver, which is designed to maximally utilize the quantum resources available in a device to solve chemical problems. We apply this method in a real quantum photonic device in the lab to study the dissociation of the helium hydride (HeH+) molecule. We also enhance this methodology with architecture specific optimizations on ion trap computers and show how linear-scaling techniques from traditional quantum chemistry can be used to improve the outlook of similar algorithms on quantum computers. We then show how studying quantum algorithms such as these can be used to understand and enhance the development of classical algorithms. In particular we use a tool from adiabatic quantum computation, Feynman's Clock, to develop a new discrete time variational principle and further establish a connection between real-time quantum dynamics and ground state eigenvalue problems. We use these tools to develop two novel parallel-in-time quantum algorithms that outperform competitive algorithms as well as offer new insights into the connection between the fermion sign problem of ground states and the dynamical sign problem of quantum dynamics. Finally we use insights gained in the study of quantum circuits to explore a general notion of sparsity in many-body quantum systems. In particular we use developments from the field of compressed sensing to find compact representations of ground states. As an application we study electronic systems and find solutions dramatically more compact than traditional configuration interaction expansions, offering hope to extend this methodology to challenging systems in chemical and material design.
Quantum computation for solving linear systems

NASA Astrophysics Data System (ADS)

Cao, Yudong

Quantum computation is a subject born out of the combination between physics and computer science. It studies how the laws of quantum mechanics can be exploited to perform computations much more efficiently than current computers (termed classical computers as oppose to quantum computers). The thesis starts by introducing ideas from quantum physics and theoretical computer science and based on these ideas, introducing the basic concepts in quantum computing. These introductory discussions are intended for non-specialists to obtain the essential knowledge needed for understanding the new results presented in the subsequent chapters. After introducing the basics of quantum computing, we focus on the recently proposed quantum algorithm for linear systems. The new results include i) special instances of quantum circuits that can be implemented using current experimental resources; ii) detailed quantum algorithms that are suitable for a broader class of linear systems. We show that for some particular problems the quantum algorithm is able to achieve exponential speedup over their classical counterparts.
Lanczos eigensolution method for high-performance computers

NASA Technical Reports Server (NTRS)

Bostic, Susan W.

1991-01-01

The theory, computational analysis, and applications are presented of a Lanczos algorithm on high performance computers. The computationally intensive steps of the algorithm are identified as: the matrix factorization, the forward/backward equation solution, and the matrix vector multiples. These computational steps are optimized to exploit the vector and parallel capabilities of high performance computers. The savings in computational time from applying optimization techniques such as: variable band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large scale structural analysis applications are described: the buckling of a composite blade stiffened panel with a cutout, and the vibration analysis of a high speed civil transport. The sequential computational time for the panel problem executed on a CONVEX computer of 181.6 seconds was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time of 23 seconds for the transport problem with 17,000 degs of freedom was on the the Cray-YMP using an average of 3.63 processors.
Dynamic graph cuts for efficient inference in Markov Random Fields.

PubMed

Kohli, Pushmeet; Torr, Philip H S

2007-12-01

Abstract-In this paper we present a fast new fully dynamic algorithm for the st-mincut/max-flow problem. We show how this algorithm can be used to efficiently compute MAP solutions for certain dynamically changing MRF models in computer vision such as image segmentation. Specifically, given the solution of the max-flow problem on a graph, the dynamic algorithm efficiently computes the maximum flow in a modified version of the graph. The time taken by it is roughly proportional to the total amount of change in the edge weights of the graph. Our experiments show that, when the number of changes in the graph is small, the dynamic algorithm is significantly faster than the best known static graph cut algorithm. We test the performance of our algorithm on one particular problem: the object-background segmentation problem for video. It should be noted that the application of our algorithm is not limited to the above problem, the algorithm is generic and can be used to yield similar improvements in many other cases that involve dynamic change.
An Accelerated Recursive Doubling Algorithm for Block Tridiagonal Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Seal, Sudip K

2014-01-01

Block tridiagonal systems of linear equations arise in a wide variety of scientific and engineering applications. Recursive doubling algorithm is a well-known prefix computation-based numerical algorithm that requires O(M^3(N/P + log P)) work to compute the solution of a block tridiagonal system with N block rows and block size M on P processors. In real-world applications, solutions of tridiagonal systems are most often sought with multiple, often hundreds and thousands, of different right hand sides but with the same tridiagonal matrix. Here, we show that a recursive doubling algorithm is sub-optimal when computing solutions of block tridiagonal systems with multiplemore » right hand sides and present a novel algorithm, called the accelerated recursive doubling algorithm, that delivers O(R) improvement when solving block tridiagonal systems with R distinct right hand sides. Since R is typically about 100 1000, this improvement translates to very significant speedups in practice. Detailed complexity analyses of the new algorithm with empirical confirmation of runtime improvements are presented. To the best of our knowledge, this algorithm has not been reported before in the literature.« less
One-way quantum computing in superconducting circuits

NASA Astrophysics Data System (ADS)

Albarrán-Arriagada, F.; Alvarado Barrios, G.; Sanz, M.; Romero, G.; Lamata, L.; Retamal, J. C.; Solano, E.

2018-03-01

We propose a method for the implementation of one-way quantum computing in superconducting circuits. Measurement-based quantum computing is a universal quantum computation paradigm in which an initial cluster state provides the quantum resource, while the iteration of sequential measurements and local rotations encodes the quantum algorithm. Up to now, technical constraints have limited a scalable approach to this quantum computing alternative. The initial cluster state can be generated with available controlled-phase gates, while the quantum algorithm makes use of high-fidelity readout and coherent feedforward. With current technology, we estimate that quantum algorithms with above 20 qubits may be implemented in the path toward quantum supremacy. Moreover, we propose an alternative initial state with properties of maximal persistence and maximal connectedness, reducing the required resources of one-way quantum computing protocols.
Results of the 2016 International Skin Imaging Collaboration International Symposium on Biomedical Imaging challenge: Comparison of the accuracy of computer algorithms to dermatologists for the diagnosis of melanoma from dermoscopic images.

PubMed

Marchetti, Michael A; Codella, Noel C F; Dusza, Stephen W; Gutman, David A; Helba, Brian; Kalloo, Aadi; Mishra, Nabin; Carrera, Cristina; Celebi, M Emre; DeFazio, Jennifer L; Jaimes, Natalia; Marghoob, Ashfaq A; Quigley, Elizabeth; Scope, Alon; Yélamos, Oriol; Halpern, Allan C

2018-02-01

Computer vision may aid in melanoma detection. We sought to compare melanoma diagnostic accuracy of computer algorithms to dermatologists using dermoscopic images. We conducted a cross-sectional study using 100 randomly selected dermoscopic images (50 melanomas, 44 nevi, and 6 lentigines) from an international computer vision melanoma challenge dataset (n = 379), along with individual algorithm results from 25 teams. We used 5 methods (nonlearned and machine learning) to combine individual automated predictions into "fusion" algorithms. In a companion study, 8 dermatologists classified the lesions in the 100 images as either benign or malignant. The average sensitivity and specificity of dermatologists in classification was 82% and 59%. At 82% sensitivity, dermatologist specificity was similar to the top challenge algorithm (59% vs. 62%, P = .68) but lower than the best-performing fusion algorithm (59% vs. 76%, P = .02). Receiver operating characteristic area of the top fusion algorithm was greater than the mean receiver operating characteristic area of dermatologists (0.86 vs. 0.71, P = .001). The dataset lacked the full spectrum of skin lesions encountered in clinical practice, particularly banal lesions. Readers and algorithms were not provided clinical data (eg, age or lesion history/symptoms). Results obtained using our study design cannot be extrapolated to clinical practice. Deep learning computer vision systems classified melanoma dermoscopy images with accuracy that exceeded some but not all dermatologists. Copyright © 2017 American Academy of Dermatology, Inc. Published by Elsevier Inc. All rights reserved.
Fast Katz and Commuters: Efficient Estimation of Social Relatedness in Large Networks

NASA Astrophysics Data System (ADS)

Esfandiar, Pooya; Bonchi, Francesco; Gleich, David F.; Greif, Chen; Lakshmanan, Laks V. S.; On, Byung-Won

Motivated by social network data mining problems such as link prediction and collaborative filtering, significant research effort has been devoted to computing topological measures including the Katz score and the commute time. Existing approaches typically approximate all pairwise relationships simultaneously. In this paper, we are interested in computing: the score for a single pair of nodes, and the top-k nodes with the best scores from a given source node. For the pairwise problem, we apply an iterative algorithm that computes upper and lower bounds for the measures we seek. This algorithm exploits a relationship between the Lanczos process and a quadrature rule. For the top-k problem, we propose an algorithm that only accesses a small portion of the graph and is related to techniques used in personalized PageRank computing. To test the scalability and accuracy of our algorithms we experiment with three real-world networks and find that these algorithms run in milliseconds to seconds without any preprocessing.
Fast katz and commuters : efficient estimation of social relatedness in large networks.

DOE Office of Scientific and Technical Information (OSTI.GOV)

On, Byung-Won; Lakshmanan, Laks V. S.; Greif, Chen

Motivated by social network data mining problems such as link prediction and collaborative filtering, significant research effort has been devoted to computing topological measures including the Katz score and the commute time. Existing approaches typically approximate all pairwise relationships simultaneously. In this paper, we are interested in computing: the score for a single pair of nodes, and the top-k nodes with the best scores from a given source node. For the pairwise problem, we apply an iterative algorithm that computes upper and lower bounds for the measures we seek. This algorithm exploits a relationship between the Lanczos process and amore » quadrature rule. For the top-k problem, we propose an algorithm that only accesses a small portion of the graph and is related to techniques used in personalized PageRank computing. To test the scalability and accuracy of our algorithms we experiment with three real-world networks and find that these algorithms run in milliseconds to seconds without any preprocessing.« less
Conjugate-Gradient Algorithms For Dynamics Of Manipulators

NASA Technical Reports Server (NTRS)

Fijany, Amir; Scheid, Robert E.

1993-01-01

Algorithms for serial and parallel computation of forward dynamics of multiple-link robotic manipulators by conjugate-gradient method developed. Parallel algorithms have potential for speedup of computations on multiple linked, specialized processors implemented in very-large-scale integrated circuits. Such processors used to stimulate dynamics, possibly faster than in real time, for purposes of planning and control.
Computing Maximum Likelihood Estimates of Loglinear Models from Marginal Sums with Special Attention to Loglinear Item Response Theory.

ERIC Educational Resources Information Center

Kelderman, Henk

1992-01-01

Describes algorithms used in the computer program LOGIMO for obtaining maximum likelihood estimates of the parameters in loglinear models. These algorithms are also useful for the analysis of loglinear item-response theory models. Presents modified versions of the iterative proportional fitting and Newton-Raphson algorithms. Simulated data…
On computation of Gröbner bases for linear difference systems

NASA Astrophysics Data System (ADS)

Gerdt, Vladimir P.

2006-04-01

In this paper, we present an algorithm for computing Gröbner bases of linear ideals in a difference polynomial ring over a ground difference field. The input difference polynomials generating the ideal are also assumed to be linear. The algorithm is an adaptation to difference ideals of our polynomial algorithm based on Janet-like reductions.
Computer Music

NASA Astrophysics Data System (ADS)

Cook, Perry

This chapter covers algorithms, technologies, computer languages, and systems for computer music. Computer music involves the application of computers and other digital/electronic technologies to music composition, performance, theory, history, and perception. The field combines digital signal processing, computational algorithms, computer languages, hardware and software systems, acoustics, psychoacoustics (low-level perception of sounds from the raw acoustic signal), and music cognition (higher-level perception of musical style, form, emotion, etc.). Although most people would think that analog synthesizers and electronic music substantially predate the use of computers in music, many experiments and complete computer music systems were being constructed and used as early as the 1950s.
Computational Approaches to Simulation and Optimization of Global Aircraft Trajectories

NASA Technical Reports Server (NTRS)

Ng, Hok Kwan; Sridhar, Banavar

2016-01-01

This study examines three possible approaches to improving the speed in generating wind-optimal routes for air traffic at the national or global level. They are: (a) using the resources of a supercomputer, (b) running the computations on multiple commercially available computers and (c) implementing those same algorithms into NASAs Future ATM Concepts Evaluation Tool (FACET) and compares those to a standard implementation run on a single CPU. Wind-optimal aircraft trajectories are computed using global air traffic schedules. The run time and wait time on the supercomputer for trajectory optimization using various numbers of CPUs ranging from 80 to 10,240 units are compared with the total computational time for running the same computation on a single desktop computer and on multiple commercially available computers for potential computational enhancement through parallel processing on the computer clusters. This study also re-implements the trajectory optimization algorithm for further reduction of computational time through algorithm modifications and integrates that with FACET to facilitate the use of the new features which calculate time-optimal routes between worldwide airport pairs in a wind field for use with existing FACET applications. The implementations of trajectory optimization algorithms use MATLAB, Python, and Java programming languages. The performance evaluations are done by comparing their computational efficiencies and based on the potential application of optimized trajectories. The paper shows that in the absence of special privileges on a supercomputer, a cluster of commercially available computers provides a feasible approach for national and global air traffic system studies.

Benchmark for Peak Detection Algorithms in Fiber Bragg Grating Interrogation and a New Neural Network for its Performance Improvement

PubMed Central

Negri, Lucas; Nied, Ademir; Kalinowski, Hypolito; Paterno, Aleksander

2011-01-01

This paper presents a benchmark for peak detection algorithms employed in fiber Bragg grating spectrometric interrogation systems. The accuracy, precision, and computational performance of currently used algorithms and those of a new proposed artificial neural network algorithm are compared. Centroid and gaussian fitting algorithms are shown to have the highest precision but produce systematic errors that depend on the FBG refractive index modulation profile. The proposed neural network displays relatively good precision with reduced systematic errors and improved computational performance when compared to other networks. Additionally, suitable algorithms may be chosen with the general guidelines presented. PMID:22163806
A Comparative Analysis of Community Detection Algorithms on Artificial Networks

PubMed Central

Yang, Zhao; Algesheimer, René; Tessone, Claudio J.

2016-01-01

Many community detection algorithms have been developed to uncover the mesoscopic properties of complex networks. However how good an algorithm is, in terms of accuracy and computing time, remains still open. Testing algorithms on real-world network has certain restrictions which made their insights potentially biased: the networks are usually small, and the underlying communities are not defined objectively. In this study, we employ the Lancichinetti-Fortunato-Radicchi benchmark graph to test eight state-of-the-art algorithms. We quantify the accuracy using complementary measures and algorithms’ computing time. Based on simple network properties and the aforementioned results, we provide guidelines that help to choose the most adequate community detection algorithm for a given network. Moreover, these rules allow uncovering limitations in the use of specific algorithms given macroscopic network properties. Our contribution is threefold: firstly, we provide actual techniques to determine which is the most suited algorithm in most circumstances based on observable properties of the network under consideration. Secondly, we use the mixing parameter as an easily measurable indicator of finding the ranges of reliability of the different algorithms. Finally, we study the dependency with network size focusing on both the algorithm’s predicting power and the effective computing time. PMID:27476470
Approximate Algorithms for Computing Spatial Distance Histograms with Accuracy Guarantees

PubMed Central

Grupcev, Vladimir; Yuan, Yongke; Tu, Yi-Cheng; Huang, Jin; Chen, Shaoping; Pandit, Sagar; Weng, Michael

2014-01-01

Particle simulation has become an important research tool in many scientific and engineering fields. Data generated by such simulations impose great challenges to database storage and query processing. One of the queries against particle simulation data, the spatial distance histogram (SDH) query, is the building block of many high-level analytics, and requires quadratic time to compute using a straightforward algorithm. Previous work has developed efficient algorithms that compute exact SDHs. While beating the naive solution, such algorithms are still not practical in processing SDH queries against large-scale simulation data. In this paper, we take a different path to tackle this problem by focusing on approximate algorithms with provable error bounds. We first present a solution derived from the aforementioned exact SDH algorithm, and this solution has running time that is unrelated to the system size N. We also develop a mathematical model to analyze the mechanism that leads to errors in the basic approximate algorithm. Our model provides insights on how the algorithm can be improved to achieve higher accuracy and efficiency. Such insights give rise to a new approximate algorithm with improved time/accuracy tradeoff. Experimental results confirm our analysis. PMID:24693210
Algorithms for the explicit computation of Penrose diagrams

NASA Astrophysics Data System (ADS)

Schindler, J. C.; Aguirre, A.

2018-05-01

An algorithm is given for explicitly computing Penrose diagrams for spacetimes of the form . The resulting diagram coordinates are shown to extend the metric continuously and nondegenerately across an arbitrary number of horizons. The method is extended to include piecewise approximations to dynamically evolving spacetimes using a standard hypersurface junction procedure. Examples generated by an implementation of the algorithm are shown for standard and new cases. In the appendix, this algorithm is compared to existing methods.
Computations involving differential operators and their actions on functions

NASA Technical Reports Server (NTRS)

Crouch, Peter E.; Grossman, Robert; Larson, Richard

1991-01-01

The algorithms derived by Grossmann and Larson (1989) are further developed for rewriting expressions involving differential operators. The differential operators involved arise in the local analysis of nonlinear dynamical systems. These algorithms are extended in two different directions: the algorithms are generalized so that they apply to differential operators on groups and the data structures and algorithms are developed to compute symbolically the action of differential operators on functions. Both of these generalizations are needed for applications.
The Even-Rho and Even-Epsilon Algorithms for Accelerating Convergence of a Numerical Sequence

DTIC Science & Technology

1981-12-01

equal, leading to zero or very small divisors. Computer programs implementing these algorithms are given along with sample output. An appreciable amount...calculation of the array of Shank’s transforms or, -A equivalently, of the related Padd Table. The :other, the even-rho algorithm, is closely related...leading to zero or very small divisors. Computer pro- grams implementing these algorithms are given along with sample output. An appreciable amount or
An exact computational method for performance analysis of sequential test algorithms for detecting network intrusions

NASA Astrophysics Data System (ADS)

Chen, Xinjia; Lacy, Fred; Carriere, Patrick

2015-05-01

Sequential test algorithms are playing increasingly important roles for quick detecting network intrusions such as portscanners. In view of the fact that such algorithms are usually analyzed based on intuitive approximation or asymptotic analysis, we develop an exact computational method for the performance analysis of such algorithms. Our method can be used to calculate the probability of false alarm and average detection time up to arbitrarily pre-specified accuracy.
Surpassing Humans and Computers with JellyBean: Crowd-Vision-Hybrid Counting Algorithms.

PubMed

Sarma, Akash Das; Jain, Ayush; Nandi, Arnab; Parameswaran, Aditya; Widom, Jennifer

2015-11-01

Counting objects is a fundamental image processisng primitive, and has many scientific, health, surveillance, security, and military applications. Existing supervised computer vision techniques typically require large quantities of labeled training data, and even with that, fail to return accurate results in all but the most stylized settings. Using vanilla crowd-sourcing, on the other hand, can lead to significant errors, especially on images with many objects. In this paper, we present our JellyBean suite of algorithms, that combines the best of crowds and computer vision to count objects in images, and uses judicious decomposition of images to greatly improve accuracy at low cost. Our algorithms have several desirable properties: (i) they are theoretically optimal or near-optimal , in that they ask as few questions as possible to humans (under certain intuitively reasonable assumptions that we justify in our paper experimentally); (ii) they operate under stand-alone or hybrid modes, in that they can either work independent of computer vision algorithms, or work in concert with them, depending on whether the computer vision techniques are available or useful for the given setting; (iii) they perform very well in practice, returning accurate counts on images that no individual worker or computer vision algorithm can count correctly, while not incurring a high cost.
A micro-hydrology computation ordering algorithm

NASA Astrophysics Data System (ADS)

Croley, Thomas E.

1980-11-01

Discrete-distributed-parameter models are essential for watershed modelling where practical consideration of spatial variations in watershed properties and inputs is desired. Such modelling is necessary for analysis of detailed hydrologic impacts from management strategies and land-use effects. Trade-offs between model validity and model complexity exist in resolution of the watershed. Once these are determined, the watershed is then broken into sub-areas which each have essentially spatially-uniform properties. Lumped-parameter (micro-hydrology) models are applied to these sub-areas and their outputs are combined through the use of a computation ordering technique, as illustrated by many discrete-distributed-parameter hydrology models. Manual ordering of these computations requires fore-thought, and is tedious, error prone, sometimes storage intensive and least adaptable to changes in watershed resolution. A programmable algorithm for ordering micro-hydrology computations is presented that enables automatic ordering of computations within the computer via an easily understood and easily implemented "node" definition, numbering and coding scheme. This scheme and the algorithm are detailed in logic flow-charts and an example application is presented. Extensions and modifications of the algorithm are easily made for complex geometries or differing microhydrology models. The algorithm is shown to be superior to manual ordering techniques and has potential use in high-resolution studies.
Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications

NASA Technical Reports Server (NTRS)

Sun, Xian-He

1997-01-01

Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as Intel Paragon, IBM SP2, and Cray Origin2OO, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is 1) developing highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, 3) incorporate newly developed algorithms into actual simulation packages. The work plan has well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) Adopting a mathematical geometry which has a better capacity to describe the fluid, (2) Using compact scheme to gain high order accuracy in numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm and Reduced Parallel Diagonal Dominant (RPDD) algorithm have been carefully studied on different parallel platforms for different applications, and a NASA simulation code developed by Man M. Rai and his colleagues has been parallelized and implemented based on data dependency analysis. These achievements are addressed in detail in the paper.
Gradient gravitational search: An efficient metaheuristic algorithm for global optimization.

PubMed

Dash, Tirtharaj; Sahu, Prabhat K

2015-05-30

The adaptation of novel techniques developed in the field of computational chemistry to solve the concerned problems for large and flexible molecules is taking the center stage with regard to efficient algorithm, computational cost and accuracy. In this article, the gradient-based gravitational search (GGS) algorithm, using analytical gradients for a fast minimization to the next local minimum has been reported. Its efficiency as metaheuristic approach has also been compared with Gradient Tabu Search and others like: Gravitational Search, Cuckoo Search, and Back Tracking Search algorithms for global optimization. Moreover, the GGS approach has also been applied to computational chemistry problems for finding the minimal value potential energy of two-dimensional and three-dimensional off-lattice protein models. The simulation results reveal the relative stability and physical accuracy of protein models with efficient computational cost. © 2015 Wiley Periodicals, Inc.
Computationally efficient algorithm for Gaussian Process regression in case of structured samples

NASA Astrophysics Data System (ADS)

Belyaev, M.; Burnaev, E.; Kapushev, Y.

2016-04-01

Surrogate modeling is widely used in many engineering problems. Data sets often have Cartesian product structure (for instance factorial design of experiments with missing points). In such case the size of the data set can be very large. Therefore, one of the most popular algorithms for approximation-Gaussian Process regression-can be hardly applied due to its computational complexity. In this paper a computationally efficient approach for constructing Gaussian Process regression in case of data sets with Cartesian product structure is presented. Efficiency is achieved by using a special structure of the data set and operations with tensors. Proposed algorithm has low computational as well as memory complexity compared to existing algorithms. In this work we also introduce a regularization procedure allowing to take into account anisotropy of the data set and avoid degeneracy of regression model.
Computing the Envelope for Stepwise Constant Resource Allocations

NASA Technical Reports Server (NTRS)

Muscettola, Nicola; Clancy, Daniel (Technical Monitor)

2001-01-01

Estimating tight resource level is a fundamental problem in the construction of flexible plans with resource utilization. In this paper we describe an efficient algorithm that builds a resource envelope, the tightest possible such bound. The algorithm is based on transforming the temporal network of resource consuming and producing events into a flow network with noises equal to the events and edges equal to the necessary predecessor links between events. The incremental solution of a staged maximum flow problem on the network is then used to compute the time of occurrence and the height of each step of the resource envelope profile. The staged algorithm has the same computational complexity of solving a maximum flow problem on the entire flow network. This makes this method computationally feasible for use in the inner loop of search-based scheduling algorithms.
SubspaceEM: A Fast Maximum-a-posteriori Algorithm for Cryo-EM Single Particle Reconstruction

PubMed Central

Dvornek, Nicha C.; Sigworth, Fred J.; Tagare, Hemant D.

2015-01-01

Single particle reconstruction methods based on the maximum-likelihood principle and the expectation-maximization (E–M) algorithm are popular because of their ability to produce high resolution structures. However, these algorithms are computationally very expensive, requiring a network of computational servers. To overcome this computational bottleneck, we propose a new mathematical framework for accelerating maximum-likelihood reconstructions. The speedup is by orders of magnitude and the proposed algorithm produces similar quality reconstructions compared to the standard maximum-likelihood formulation. Our approach uses subspace approximations of the cryo-electron microscopy (cryo-EM) data and projection images, greatly reducing the number of image transformations and comparisons that are computed. Experiments using simulated and actual cryo-EM data show that speedup in overall execution time compared to traditional maximum-likelihood reconstruction reaches factors of over 300. PMID:25839831
Efficient convolutional sparse coding

DOEpatents

Wohlberg, Brendt

2017-06-20

Computationally efficient algorithms may be applied for fast dictionary learning solving the convolutional sparse coding problem in the Fourier domain. More specifically, efficient convolutional sparse coding may be derived within an alternating direction method of multipliers (ADMM) framework that utilizes fast Fourier transforms (FFT) to solve the main linear system in the frequency domain. Such algorithms may enable a significant reduction in computational cost over conventional approaches by implementing a linear solver for the most critical and computationally expensive component of the conventional iterative algorithm. The theoretical computational cost of the algorithm may be reduced from O(M.sup.3N) to O(MN log N), where N is the dimensionality of the data and M is the number of elements in the dictionary. This significant improvement in efficiency may greatly increase the range of problems that can practically be addressed via convolutional sparse representations.
General Quantum Meet-in-the-Middle Search Algorithm Based on Target Solution of Fixed Weight

NASA Astrophysics Data System (ADS)

Fu, Xiang-Qun; Bao, Wan-Su; Wang, Xiang; Shi, Jian-Hong

2016-10-01

Similar to the classical meet-in-the-middle algorithm, the storage and computation complexity are the key factors that decide the efficiency of the quantum meet-in-the-middle algorithm. Aiming at the target vector of fixed weight, based on the quantum meet-in-the-middle algorithm, the algorithm for searching all n-product vectors with the same weight is presented, whose complexity is better than the exhaustive search algorithm. And the algorithm can reduce the storage complexity of the quantum meet-in-the-middle search algorithm. Then based on the algorithm and the knapsack vector of the Chor-Rivest public-key crypto of fixed weight d, we present a general quantum meet-in-the-middle search algorithm based on the target solution of fixed weight, whose computational complexity is \\sumj = 0d {(O(\\sqrt {Cn - k + 1d - j }) + O(C_kj log C_k^j))} with Σd i =0 Ck i memory cost. And the optimal value of k is given. Compared to the quantum meet-in-the-middle search algorithm for knapsack problem and the quantum algorithm for searching a target solution of fixed weight, the computational complexity of the algorithm is lower. And its storage complexity is smaller than the quantum meet-in-the-middle-algorithm. Supported by the National Basic Research Program of China under Grant No. 2013CB338002 and the National Natural Science Foundation of China under Grant No. 61502526
Parallel fuzzy connected image segmentation on GPU

PubMed Central

Zhuge, Ying; Cao, Yong; Udupa, Jayaram K.; Miller, Robert W.

2011-01-01

Purpose: Image segmentation techniques using fuzzy connectedness (FC) principles have shown their effectiveness in segmenting a variety of objects in several large applications. However, one challenge in these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides a highly parallel computing environment. In this paper, the authors present a parallel fuzzy connected image segmentation algorithm implementation on NVIDIA’s compute unified device Architecture (cuda) platform for segmenting medical image data sets. Methods: In the FC algorithm, there are two major computational tasks: (i) computing the fuzzy affinity relations and (ii) computing the fuzzy connectedness relations. These two tasks are implemented as cuda kernels and executed on GPU. A dramatic improvement in speed for both tasks is achieved as a result. Results: Our experiments based on three data sets of small, medium, and large data size demonstrate the efficiency of the parallel algorithm, which achieves a speed-up factor of 24.4x, 18.1x, and 10.3x, correspondingly, for the three data sets on the NVIDIA Tesla C1060 over the implementation of the algorithm on CPU, and takes 0.25, 0.72, and 15.04 s, correspondingly, for the three data sets. Conclusions: The authors developed a parallel algorithm of the widely used fuzzy connected image segmentation method on the NVIDIA GPUs, which are far more cost- and speed-effective than both cluster of workstations and multiprocessing systems. A near-interactive speed of segmentation has been achieved, even for the large data set. PMID:21859037
Parallel fuzzy connected image segmentation on GPU.

PubMed

Zhuge, Ying; Cao, Yong; Udupa, Jayaram K; Miller, Robert W

2011-07-01

Image segmentation techniques using fuzzy connectedness (FC) principles have shown their effectiveness in segmenting a variety of objects in several large applications. However, one challenge in these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays, commodity graphics hardware provides a highly parallel computing environment. In this paper, the authors present a parallel fuzzy connected image segmentation algorithm implementation on NVIDIA's compute unified device Architecture (CUDA) platform for segmenting medical image data sets. In the FC algorithm, there are two major computational tasks: (i) computing the fuzzy affinity relations and (ii) computing the fuzzy connectedness relations. These two tasks are implemented as CUDA kernels and executed on GPU. A dramatic improvement in speed for both tasks is achieved as a result. Our experiments based on three data sets of small, medium, and large data size demonstrate the efficiency of the parallel algorithm, which achieves a speed-up factor of 24.4x, 18.1x, and 10.3x, correspondingly, for the three data sets on the NVIDIA Tesla C1060 over the implementation of the algorithm on CPU, and takes 0.25, 0.72, and 15.04 s, correspondingly, for the three data sets. The authors developed a parallel algorithm of the widely used fuzzy connected image segmentation method on the NVIDIA GPUs, which are far more cost- and speed-effective than both cluster of workstations and multiprocessing systems. A near-interactive speed of segmentation has been achieved, even for the large data set.
A comparison between physicians and computer algorithms for form CMS-2728 data reporting.

PubMed

Malas, Mohammed Said; Wish, Jay; Moorthi, Ranjani; Grannis, Shaun; Dexter, Paul; Duke, Jon; Moe, Sharon

2017-01-01

CMS-2728 form (Medical Evidence Report) assesses 23 comorbidities chosen to reflect poor outcomes and increased mortality risk. Previous studies questioned the validity of physician reporting on forms CMS-2728. We hypothesize that reporting of comorbidities by computer algorithms identifies more comorbidities than physician completion, and, therefore, is more reflective of underlying disease burden. We collected data from CMS-2728 forms for all 296 patients who had incident ESRD diagnosis and received chronic dialysis from 2005 through 2014 at Indiana University outpatient dialysis centers. We analyzed patients' data from electronic medical records systems that collated information from multiple health care sources. Previously utilized algorithms or natural language processing was used to extract data on 10 comorbidities for a period of up to 10 years prior to ESRD incidence. These algorithms incorporate billing codes, prescriptions, and other relevant elements. We compared the presence or unchecked status of these comorbidities on the forms to the presence or absence according to the algorithms. Computer algorithms had higher reporting of comorbidities compared to forms completion by physicians. This remained true when decreasing data span to one year and using only a single health center source. The algorithms determination was well accepted by a physician panel. Importantly, algorithms use significantly increased the expected deaths and lowered the standardized mortality ratios. Using computer algorithms showed superior identification of comorbidities for form CMS-2728 and altered standardized mortality ratios. Adapting similar algorithms in available EMR systems may offer more thorough evaluation of comorbidities and improve quality reporting. © 2016 International Society for Hemodialysis.
GPU-based parallel algorithm for blind image restoration using midfrequency-based methods

NASA Astrophysics Data System (ADS)

Xie, Lang; Luo, Yi-han; Bao, Qi-liang

2013-08-01

GPU-based general-purpose computing is a new branch of modern parallel computing, so the study of parallel algorithms specially designed for GPU hardware architecture is of great significance. In order to solve the problem of high computational complexity and poor real-time performance in blind image restoration, the midfrequency-based algorithm for blind image restoration was analyzed and improved in this paper. Furthermore, a midfrequency-based filtering method is also used to restore the image hardly with any recursion or iteration. Combining the algorithm with data intensiveness, data parallel computing and GPU execution model of single instruction and multiple threads, a new parallel midfrequency-based algorithm for blind image restoration is proposed in this paper, which is suitable for stream computing of GPU. In this algorithm, the GPU is utilized to accelerate the estimation of class-G point spread functions and midfrequency-based filtering. Aiming at better management of the GPU threads, the threads in a grid are scheduled according to the decomposition of the filtering data in frequency domain after the optimization of data access and the communication between the host and the device. The kernel parallelism structure is determined by the decomposition of the filtering data to ensure the transmission rate to get around the memory bandwidth limitation. The results show that, with the new algorithm, the operational speed is significantly increased and the real-time performance of image restoration is effectively improved, especially for high-resolution images.

Method for concurrent execution of primitive operations by dynamically assigning operations based upon computational marked graph and availability of data

NASA Technical Reports Server (NTRS)

Mielke, Roland V. (Inventor); Stoughton, John W. (Inventor)

1990-01-01

Computationally complex primitive operations of an algorithm are executed concurrently in a plurality of functional units under the control of an assignment manager. The algorithm is preferably defined as a computationally marked graph contianing data status edges (paths) corresponding to each of the data flow edges. The assignment manager assigns primitive operations to the functional units and monitors completion of the primitive operations to determine data availability using the computational marked graph of the algorithm. All data accessing of the primitive operations is performed by the functional units independently of the assignment manager.
An Accurate and Efficient Algorithm for Detection of Radio Bursts with an Unknown Dispersion Measure, for Single-dish Telescopes and Interferometers

NASA Astrophysics Data System (ADS)

Zackay, Barak; Ofek, Eran O.

2017-01-01

Astronomical radio signals are subjected to phase dispersion while traveling through the interstellar medium. To optimally detect a short-duration signal within a frequency band, we have to precisely compensate for the unknown pulse dispersion, which is a computationally demanding task. We present the “fast dispersion measure transform” algorithm for optimal detection of such signals. Our algorithm has a low theoretical complexity of 2{N}f{N}t+{N}t{N}{{Δ }}{{log}}2({N}f), where Nf, Nt, and NΔ are the numbers of frequency bins, time bins, and dispersion measure bins, respectively. Unlike previously suggested fast algorithms, our algorithm conserves the sensitivity of brute-force dedispersion. Our tests indicate that this algorithm, running on a standard desktop computer and implemented in a high-level programming language, is already faster than the state-of-the-art dedispersion codes running on graphical processing units (GPUs). We also present a variant of the algorithm that can be efficiently implemented on GPUs. The latter algorithm’s computation and data-transport requirements are similar to those of a two-dimensional fast Fourier transform, indicating that incoherent dedispersion can now be considered a nonissue while planning future surveys. We further present a fast algorithm for sensitive detection of pulses shorter than the dispersive smearing limits of incoherent dedispersion. In typical cases, this algorithm is orders of magnitude faster than enumerating dispersion measures and coherently dedispersing by convolution. We analyze the computational complexity of pulsed signal searches by radio interferometers. We conclude that, using our suggested algorithms, maximally sensitive blind searches for dispersed pulses are feasible using existing facilities. We provide an implementation of these algorithms in Python and MATLAB.
An Improved Clustering Algorithm of Tunnel Monitoring Data for Cloud Computing

PubMed Central

Zhong, Luo; Tang, KunHao; Li, Lin; Yang, Guang; Ye, JingJing

2014-01-01

With the rapid development of urban construction, the number of urban tunnels is increasing and the data they produce become more and more complex. It results in the fact that the traditional clustering algorithm cannot handle the mass data of the tunnel. To solve this problem, an improved parallel clustering algorithm based on k-means has been proposed. It is a clustering algorithm using the MapReduce within cloud computing that deals with data. It not only has the advantage of being used to deal with mass data but also is more efficient. Moreover, it is able to compute the average dissimilarity degree of each cluster in order to clean the abnormal data. PMID:24982971
The challenge of computer mathematics.

PubMed

Barendregt, Henk; Wiedijk, Freek

2005-10-15

Progress in the foundations of mathematics has made it possible to formulate all thinkable mathematical concepts, algorithms and proofs in one language and in an impeccable way. This is not in spite of, but partially based on the famous results of Gödel and Turing. In this way statements are about mathematical objects and algorithms, proofs show the correctness of statements and computations, and computations are dealing with objects and proofs. Interactive computer systems for a full integration of defining, computing and proving are based on this. The human defines concepts, constructs algorithms and provides proofs, while the machine checks that the definitions are well formed and the proofs and computations are correct. Results formalized so far demonstrate the feasibility of this 'computer mathematics'. Also there are very good applications. The challenge is to make the systems more mathematician-friendly, by building libraries and tools. The eventual goal is to help humans to learn, develop, communicate, referee and apply mathematics.
A generalized Condat's algorithm of 1D total variation regularization

NASA Astrophysics Data System (ADS)

Makovetskii, Artyom; Voronin, Sergei; Kober, Vitaly

2017-09-01

A common way for solving the denosing problem is to utilize the total variation (TV) regularization. Many efficient numerical algorithms have been developed for solving the TV regularization problem. Condat described a fast direct algorithm to compute the processed 1D signal. Also there exists a direct algorithm with a linear time for 1D TV denoising referred to as the taut string algorithm. The Condat's algorithm is based on a dual problem to the 1D TV regularization. In this paper, we propose a variant of the Condat's algorithm based on the direct 1D TV regularization problem. The usage of the Condat's algorithm with the taut string approach leads to a clear geometric description of the extremal function. Computer simulation results are provided to illustrate the performance of the proposed algorithm for restoration of degraded signals.
Range image registration based on hash map and moth-flame optimization

NASA Astrophysics Data System (ADS)

Zou, Li; Ge, Baozhen; Chen, Lei

2018-03-01

Over the past decade, evolutionary algorithms (EAs) have been introduced to solve range image registration problems because of their robustness and high precision. However, EA-based range image registration algorithms are time-consuming. To reduce the computational time, an EA-based range image registration algorithm using hash map and moth-flame optimization is proposed. In this registration algorithm, a hash map is used to avoid over-exploitation in registration process. Additionally, we present a search equation that is better at exploration and a restart mechanism to avoid being trapped in local minima. We compare the proposed registration algorithm with the registration algorithms using moth-flame optimization and several state-of-the-art EA-based registration algorithms. The experimental results show that the proposed algorithm has a lower computational cost than other algorithms and achieves similar registration precision.
Design of a Performance-Responsive Drill and Practice Algorithm for Computer-Based Training.

ERIC Educational Resources Information Center

Vazquez-Abad, Jesus; LaFleur, Marc

1990-01-01

Reviews criticisms of the use of drill and practice programs in educational computing and describes potentials for its use in instruction. Topics discussed include guidelines for developing computer-based drill and practice; scripted training courseware; item format design; item bank design; and a performance-responsive algorithm for item…
Parallel computer vision

DOE Office of Scientific and Technical Information (OSTI.GOV)

Uhr, L.

1987-01-01

This book is written by research scientists involved in the development of massively parallel, but hierarchically structured, algorithms, architectures, and programs for image processing, pattern recognition, and computer vision. The book gives an integrated picture of the programs and algorithms that are being developed, and also of the multi-computer hardware architectures for which these systems are designed.
Assignment Of Finite Elements To Parallel Processors

NASA Technical Reports Server (NTRS)

Salama, Moktar A.; Flower, Jon W.; Otto, Steve W.

1990-01-01

Elements assigned approximately optimally to subdomains. Mapping algorithm based on simulated-annealing concept used to minimize approximate time required to perform finite-element computation on hypercube computer or other network of parallel data processors. Mapping algorithm needed when shape of domain complicated or otherwise not obvious what allocation of elements to subdomains minimizes cost of computation.
ASSURED CLOUD COMPUTING UNIVERSITY CENTER OFEXCELLENCE (ACC UCOE)

DTIC Science & Technology

2018-01-18

average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed...infrastructure security -Design of algorithms and techniques for real- time assuredness in cloud computing -Map-reduce task assignment with data locality...46 DESIGN OF ALGORITHMS AND TECHNIQUES FOR REAL- TIME ASSUREDNESS IN CLOUD COMPUTING
Parallel algorithms for mapping pipelined and parallel computations

NASA Technical Reports Server (NTRS)

Nicol, David M.

1988-01-01

Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm sup 3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm sup 2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
Optimizing Approximate Weighted Matching on Nvidia Kepler K40

DOE Office of Scientific and Technical Information (OSTI.GOV)

Naim, Md; Manne, Fredrik; Halappanavar, Mahantesh

Matching is a fundamental graph problem with numerous applications in science and engineering. While algorithms for computing optimal matchings are difficult to parallelize, approximation algorithms on the other hand generally compute high quality solutions and are amenable to parallelization. In this paper, we present efficient implementations of the current best algorithm for half-approximate weighted matching, the Suitor algorithm, on Nvidia Kepler K-40 platform. We develop four variants of the algorithm that exploit hardware features to address key challenges for a GPU implementation. We also experiment with different combinations of work assigned to a warp. Using an exhaustive set ofmore » $269$ inputs, we demonstrate that the new implementation outperforms the previous best GPU algorithm by $10$ to $$100\\times$$ for over $100$ instances, and from $100$ to $$1000\\times$$ for $15$ instances. We also demonstrate up to $$20\\times$$ speedup relative to $2$ threads, and up to $$5\\times$$ relative to $16$ threads on Intel Xeon platform with $16$ cores for the same algorithm. The new algorithms and implementations provided in this paper will have a direct impact on several applications that repeatedly use matching as a key compute kernel. Further, algorithm designs and insights provided in this paper will benefit other researchers implementing graph algorithms on modern GPU architectures.« less
A real time microcomputer implementation of sensor failure detection for turbofan engines

NASA Technical Reports Server (NTRS)

Delaat, John C.; Merrill, Walter C.

1989-01-01

An algorithm was developed which detects, isolates, and accommodates sensor failures using analytical redundancy. The performance of this algorithm was demonstrated on a full-scale F100 turbofan engine. The algorithm was implemented in real-time on a microprocessor-based controls computer which includes parallel processing and high order language programming. Parallel processing was used to achieve the required computational power for the real-time implementation. High order language programming was used in order to reduce the programming and maintenance costs of the algorithm implementation software. The sensor failure algorithm was combined with an existing multivariable control algorithm to give a complete control implementation with sensor analytical redundancy. The real-time microprocessor implementation of the algorithm which resulted in the successful completion of the algorithm engine demonstration, is described.
A joint swarm intelligence algorithm for multi-user detection in MIMO-OFDM system

NASA Astrophysics Data System (ADS)

Hu, Fengye; Du, Dakun; Zhang, Peng; Wang, Zhijun

2014-11-01

In the multi-input multi-output orthogonal frequency division multiplexing (MIMO-OFDM) system, traditional multi-user detection (MUD) algorithms that usually used to suppress multiple access interference are difficult to balance system detection performance and the complexity of the algorithm. To solve this problem, this paper proposes a joint swarm intelligence algorithm called Ant Colony and Particle Swarm Optimisation (AC-PSO) by integrating particle swarm optimisation (PSO) and ant colony optimisation (ACO) algorithms. According to simulation results, it has been shown that, with low computational complexity, the MUD for the MIMO-OFDM system based on AC-PSO algorithm gains comparable MUD performance with maximum likelihood algorithm. Thus, the proposed AC-PSO algorithm provides a satisfactory trade-off between computational complexity and detection performance.
DANoC: An Efficient Algorithm and Hardware Codesign of Deep Neural Networks on Chip.

PubMed

Zhou, Xichuan; Li, Shengli; Tang, Fang; Hu, Shengdong; Lin, Zhi; Zhang, Lei

2017-07-18

Deep neural networks (NNs) are the state-of-the-art models for understanding the content of images and videos. However, implementing deep NNs in embedded systems is a challenging task, e.g., a typical deep belief network could exhaust gigabytes of memory and result in bandwidth and computational bottlenecks. To address this challenge, this paper presents an algorithm and hardware codesign for efficient deep neural computation. A hardware-oriented deep learning algorithm, named the deep adaptive network, is proposed to explore the sparsity of neural connections. By adaptively removing the majority of neural connections and robustly representing the reserved connections using binary integers, the proposed algorithm could save up to 99.9% memory utility and computational resources without undermining classification accuracy. An efficient sparse-mapping-memory-based hardware architecture is proposed to fully take advantage of the algorithmic optimization. Different from traditional Von Neumann architecture, the deep-adaptive network on chip (DANoC) brings communication and computation in close proximity to avoid power-hungry parameter transfers between on-board memory and on-chip computational units. Experiments over different image classification benchmarks show that the DANoC system achieves competitively high accuracy and efficiency comparing with the state-of-the-art approaches.
Parallel Computing Strategies for Irregular Algorithms

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Oliker, Leonid; Shan, Hongzhang; Biegel, Bryan (Technical Monitor)

2002-01-01

Parallel computing promises several orders of magnitude increase in our ability to solve realistic computationally-intensive problems, but relies on their efficient mapping and execution on large-scale multiprocessor architectures. Unfortunately, many important applications are irregular and dynamic in nature, making their effective parallel implementation a daunting task. Moreover, with the proliferation of parallel architectures and programming paradigms, the typical scientist is faced with a plethora of questions that must be answered in order to obtain an acceptable parallel implementation of the solution algorithm. In this paper, we consider three representative irregular applications: unstructured remeshing, sparse matrix computations, and N-body problems, and parallelize them using various popular programming paradigms on a wide spectrum of computer platforms ranging from state-of-the-art supercomputers to PC clusters. We present the underlying problems, the solution algorithms, and the parallel implementation strategies. Smart load-balancing, partitioning, and ordering techniques are used to enhance parallel performance. Overall results demonstrate the complexity of efficiently parallelizing irregular algorithms.
Practical Algorithms for the Longest Common Extension Problem

NASA Astrophysics Data System (ADS)

Ilie, Lucian; Tinta, Liviu

The Longest Common Extension problem considers a string s and computes, for each of a number of pairs (i,j), the longest substring of s that starts at both i and j. It appears as a subproblem in many fundamental string problems and can be solved by linear-time preprocessing of the string that allows (worst-case) constant-time computation for each pair. The two known approaches use powerful algorithms: either constant-time computation of the Lowest Common Ancestor in trees or constant-time computation of Range Minimum Queries (RMQ) in arrays. We show here that, from practical point of view, such complicated approaches are not needed. We give two very simple algorithms for this problem that require no preprocessing. The first needs only the string and is significantly faster than all previous algorithms on the average. The second combines the first with a direct RMQ computation on the Longest Common Prefix array. It takes advantage of the superior speed of the cache memory and is the fastest on virtually all inputs.
Cloud computing for comparative genomics

PubMed Central

2010-01-01

Background Large comparative genomics studies and tools are becoming increasingly more compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Computing Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. Results We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high capacity compute nodes using the Amazon Web Service Elastic Map Reduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. Conclusions The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provides a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems. PMID:20482786
Cloud computing for comparative genomics.

PubMed

Wall, Dennis P; Kudtarkar, Parul; Fusaro, Vincent A; Pivovarov, Rimma; Patil, Prasad; Tonellato, Peter J

2010-05-18

Large comparative genomics studies and tools are becoming increasingly more compute-expensive as the number of available genome sequences continues to rise. The capacity and cost of local computing infrastructures are likely to become prohibitive with the increase, especially as the breadth of questions continues to rise. Alternative computing architectures, in particular cloud computing environments, may help alleviate this increasing pressure and enable fast, large-scale, and cost-effective comparative genomics strategies going forward. To test this, we redesigned a typical comparative genomics algorithm, the reciprocal smallest distance algorithm (RSD), to run within Amazon's Elastic Computing Cloud (EC2). We then employed the RSD-cloud for ortholog calculations across a wide selection of fully sequenced genomes. We ran more than 300,000 RSD-cloud processes within the EC2. These jobs were farmed simultaneously to 100 high capacity compute nodes using the Amazon Web Service Elastic Map Reduce and included a wide mix of large and small genomes. The total computation time took just under 70 hours and cost a total of $6,302 USD. The effort to transform existing comparative genomics algorithms from local compute infrastructures is not trivial. However, the speed and flexibility of cloud computing environments provides a substantial boost with manageable cost. The procedure designed to transform the RSD algorithm into a cloud-ready application is readily adaptable to similar comparative genomics problems.
An Implicit Upwind Algorithm for Computing Turbulent Flows on Unstructured Grids

NASA Technical Reports Server (NTRS)

Anerson, W. Kyle; Bonhaus, Daryl L.

1994-01-01

An implicit, Navier-Stokes solution algorithm is presented for the computation of turbulent flow on unstructured grids. The inviscid fluxes are computed using an upwind algorithm and the solution is advanced in time using a backward-Euler time-stepping scheme. At each time step, the linear system of equations is approximately solved with a point-implicit relaxation scheme. This methodology provides a viable and robust algorithm for computing turbulent flows on unstructured meshes. Results are shown for subsonic flow over a NACA 0012 airfoil and for transonic flow over a RAE 2822 airfoil exhibiting a strong upper-surface shock. In addition, results are shown for 3 element and 4 element airfoil configurations. For the calculations, two one equation turbulence models are utilized. For the NACA 0012 airfoil, a pressure distribution and force data are compared with other computational results as well as with experiment. Comparisons of computed pressure distributions and velocity profiles with experimental data are shown for the RAE airfoil and for the 3 element configuration. For the 4 element case, comparisons of surface pressure distributions with experiment are made. In general, the agreement between the computations and the experiment is good.

Dynamic Load-Balancing for Distributed Heterogeneous Computing of Parallel CFD Problems

NASA Technical Reports Server (NTRS)

Ecer, A.; Chien, Y. P.; Boenisch, T.; Akay, H. U.

2000-01-01

The developed methodology is aimed at improving the efficiency of executing block-structured algorithms on parallel, distributed, heterogeneous computers. The basic approach of these algorithms is to divide the flow domain into many sub- domains called blocks, and solve the governing equations over these blocks. Dynamic load balancing problem is defined as the efficient distribution of the blocks among the available processors over a period of several hours of computations. In environments with computers of different architecture, operating systems, CPU speed, memory size, load, and network speed, balancing the loads and managing the communication between processors becomes crucial. Load balancing software tools for mutually dependent parallel processes have been created to efficiently utilize an advanced computation environment and algorithms. These tools are dynamic in nature because of the chances in the computer environment during execution time. More recently, these tools were extended to a second operating system: NT. In this paper, the problems associated with this application will be discussed. Also, the developed algorithms were combined with the load sharing capability of LSF to efficiently utilize workstation clusters for parallel computing. Finally, results will be presented on running a NASA based code ADPAC to demonstrate the developed tools for dynamic load balancing.
A novel strategy for load balancing of distributed medical applications.

PubMed

Logeswaran, Rajasvaran; Chen, Li-Choo

2012-04-01

Current trends in medicine, specifically in the electronic handling of medical applications, ranging from digital imaging, paperless hospital administration and electronic medical records, telemedicine, to computer-aided diagnosis, creates a burden on the network. Distributed Service Architectures, such as Intelligent Network (IN), Telecommunication Information Networking Architecture (TINA) and Open Service Access (OSA), are able to meet this new challenge. Distribution enables computational tasks to be spread among multiple processors; hence, performance is an important issue. This paper proposes a novel approach in load balancing, the Random Sender Initiated Algorithm, for distribution of tasks among several nodes sharing the same computational object (CO) instances in Distributed Service Architectures. Simulations illustrate that the proposed algorithm produces better network performance than the benchmark load balancing algorithms-the Random Node Selection Algorithm and the Shortest Queue Algorithm, especially under medium and heavily loaded conditions.
Supercomputing resources empowering superstack with interactive and integrated systems

NASA Astrophysics Data System (ADS)

Rückemann, Claus-Peter

2012-09-01

This paper presents the results from the development and implementation of Superstack algorithms to be dynamically used with integrated systems and supercomputing resources. Processing of geophysical data, thus named geoprocessing, is an essential part of the analysis of geoscientific data. The theory of Superstack algorithms and the practical application on modern computing architectures was inspired by developments introduced with processing of seismic data on mainframes and within the last years leading to high end scientific computing applications. There are several stacking algorithms known but with low signal to noise ratio in seismic data the use of iterative algorithms like the Superstack can support analysis and interpretation. The new Superstack algorithms are in use with wave theory and optical phenomena on highly performant computing resources for huge data sets as well as for sophisticated application scenarios in geosciences and archaeology.
Design and Implementation of Hybrid CORDIC Algorithm Based on Phase Rotation Estimation for NCO

PubMed Central

Zhang, Chaozhu; Han, Jinan; Li, Ke

2014-01-01

The numerical controlled oscillator has wide application in radar, digital receiver, and software radio system. Firstly, this paper introduces the traditional CORDIC algorithm. Then in order to improve computing speed and save resources, this paper proposes a kind of hybrid CORDIC algorithm based on phase rotation estimation applied in numerical controlled oscillator (NCO). Through estimating the direction of part phase rotation, the algorithm reduces part phase rotation and add-subtract unit, so that it decreases delay. Furthermore, the paper simulates and implements the numerical controlled oscillator by Quartus II software and Modelsim software. Finally, simulation results indicate that the improvement over traditional CORDIC algorithm is achieved in terms of ease of computation, resource utilization, and computing speed/delay while maintaining the precision. It is suitable for high speed and precision digital modulation and demodulation. PMID:25110750
Hardware architecture design of image restoration based on time-frequency domain computation

NASA Astrophysics Data System (ADS)

Wen, Bo; Zhang, Jing; Jiao, Zipeng

2013-10-01

The image restoration algorithms based on time-frequency domain computation is high maturity and applied widely in engineering. To solve the high-speed implementation of these algorithms, the TFDC hardware architecture is proposed. Firstly, the main module is designed, by analyzing the common processing and numerical calculation. Then, to improve the commonality, the iteration control module is planed for iterative algorithms. In addition, to reduce the computational cost and memory requirements, the necessary optimizations are suggested for the time-consuming module, which include two-dimensional FFT/IFFT and the plural calculation. Eventually, the TFDC hardware architecture is adopted for hardware design of real-time image restoration system. The result proves that, the TFDC hardware architecture and its optimizations can be applied to image restoration algorithms based on TFDC, with good algorithm commonality, hardware realizability and high efficiency.
Is It Ethical for Patents to Be Issued for the Computer Algorithms that Affect Course Management Systems for Distance Learning?

ERIC Educational Resources Information Center

Moreau, Nancy

2008-01-01

This article discusses the impact of patents for computer algorithms in course management systems. Referring to historical documents and court cases, the positive and negative aspects of software patents are presented. The key argument is the accessibility to algorithms comprising a course management software program such as Blackboard. The…
Minimum time acceleration of aircraft turbofan engines by using an algorithm based on nonlinear programming

NASA Technical Reports Server (NTRS)

Teren, F.

1977-01-01

Minimum time accelerations of aircraft turbofan engines are presented. The calculation of these accelerations was made by using a piecewise linear engine model, and an algorithm based on nonlinear programming. Use of this model and algorithm allows such trajectories to be readily calculated on a digital computer with a minimal expenditure of computer time.
LAWS simulation: Sampling strategies and wind computation algorithms

NASA Technical Reports Server (NTRS)

Emmitt, G. D. A.; Wood, S. A.; Houston, S. H.

1989-01-01

In general, work has continued on developing and evaluating algorithms designed to manage the Laser Atmospheric Wind Sounder (LAWS) lidar pulses and to compute the horizontal wind vectors from the line-of-sight (LOS) measurements. These efforts fall into three categories: Improvements to the shot management and multi-pair algorithms (SMA/MPA); observing system simulation experiments; and ground-based simulations of LAWS.
Algorithms for computing solvents of unilateral second-order matrix polynomials over prime finite fields using lambda-matrices

NASA Astrophysics Data System (ADS)

Burtyka, Filipp

2018-01-01

The paper considers algorithms for finding diagonalizable and non-diagonalizable roots (so called solvents) of monic arbitrary unilateral second-order matrix polynomial over prime finite field. These algorithms are based on polynomial matrices (lambda-matrices). This is an extension of existing general methods for computing solvents of matrix polynomials over field of complex numbers. We analyze how techniques for complex numbers can be adapted for finite field and estimate asymptotic complexity of the obtained algorithms.
Quantum factorization of 143 on a dipolar-coupling nuclear magnetic resonance system.

PubMed

Xu, Nanyang; Zhu, Jing; Lu, Dawei; Zhou, Xianyi; Peng, Xinhua; Du, Jiangfeng

2012-03-30

Quantum algorithms could be much faster than classical ones in solving the factoring problem. Adiabatic quantum computation for this is an alternative approach other than Shor's algorithm. Here we report an improved adiabatic factoring algorithm and its experimental realization to factor the number 143 on a liquid-crystal NMR quantum processor with dipole-dipole couplings. We believe this to be the largest number factored in quantum-computation realizations, which shows the practical importance of adiabatic quantum algorithms.
Parallel Architectures and Parallel Algorithms for Integrated Vision Systems. Ph.D. Thesis

NASA Technical Reports Server (NTRS)

Choudhary, Alok Nidhi

1989-01-01

Computer vision is regarded as one of the most complex and computationally intensive problems. An integrated vision system (IVS) is a system that uses vision algorithms from all levels of processing to perform for a high level application (e.g., object recognition). An IVS normally involves algorithms from low level, intermediate level, and high level vision. Designing parallel architectures for vision systems is of tremendous interest to researchers. Several issues are addressed in parallel architectures and parallel algorithms for integrated vision systems.
A Fast Algorithm for the Convolution of Functions with Compact Support Using Fourier Extensions

DOE PAGES

Xu, Kuan; Austin, Anthony P.; Wei, Ke

2017-12-21

In this paper, we present a new algorithm for computing the convolution of two compactly supported functions. The algorithm approximates the functions to be convolved using Fourier extensions and then uses the fast Fourier transform to efficiently compute Fourier extension approximations to the pieces of the result. Finally, the complexity of the algorithm is O(N(log N) 2), where N is the number of degrees of freedom used in each of the Fourier extensions.
A Fast Algorithm for the Convolution of Functions with Compact Support Using Fourier Extensions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xu, Kuan; Austin, Anthony P.; Wei, Ke

In this paper, we present a new algorithm for computing the convolution of two compactly supported functions. The algorithm approximates the functions to be convolved using Fourier extensions and then uses the fast Fourier transform to efficiently compute Fourier extension approximations to the pieces of the result. Finally, the complexity of the algorithm is O(N(log N) 2), where N is the number of degrees of freedom used in each of the Fourier extensions.
Parallel O(log n) algorithms for open- and closed-chain rigid multibody systems based on a new mass matrix factorization technique

NASA Technical Reports Server (NTRS)

Fijany, Amir

1993-01-01

In this paper, parallel O(log n) algorithms for computation of rigid multibody dynamics are developed. These parallel algorithms are derived by parallelization of new O(n) algorithms for the problem. The underlying feature of these O(n) algorithms is a drastically different strategy for decomposition of interbody force which leads to a new factorization of the mass matrix (M). Specifically, it is shown that a factorization of the inverse of the mass matrix in the form of the Schur Complement is derived as M(exp -1) = C - B(exp *)A(exp -1)B, wherein matrices C, A, and B are block tridiagonal matrices. The new O(n) algorithm is then derived as a recursive implementation of this factorization of M(exp -1). For the closed-chain systems, similar factorizations and O(n) algorithms for computation of Operational Space Mass Matrix lambda and its inverse lambda(exp -1) are also derived. It is shown that these O(n) algorithms are strictly parallel, that is, they are less efficient than other algorithms for serial computation of the problem. But, to our knowledge, they are the only known algorithms that can be parallelized and that lead to both time- and processor-optimal parallel algorithms for the problem, i.e., parallel O(log n) algorithms with O(n) processors. The developed parallel algorithms, in addition to their theoretical significance, are also practical from an implementation point of view due to their simple architectural requirements.
A study of real-time computer graphic display technology for aeronautical applications

NASA Technical Reports Server (NTRS)

Rajala, S. A.

1981-01-01

The development, simulation, and testing of an algorithm for anti-aliasing vector drawings is discussed. The pseudo anti-aliasing line drawing algorithm is an extension to Bresenham's algorithm for computer control of a digital plotter. The algorithm produces a series of overlapping line segments where the display intensity shifts from one segment to the other in this overlap (transition region). In this algorithm the length of the overlap and the intensity shift are essentially constants because the transition region is an aid to the eye in integrating the segments into a single smooth line.
Design requirements and development of an airborne descent path definition algorithm for time navigation

NASA Technical Reports Server (NTRS)

Izumi, K. H.; Thompson, J. L.; Groce, J. L.; Schwab, R. W.

1986-01-01

The design requirements for a 4D path definition algorithm are described. These requirements were developed for the NASA ATOPS as an extension of the Local Flow Management/Profile Descent algorithm. They specify the processing flow, functional and data architectures, and system input requirements, and recommended the addition of a broad path revision (reinitialization) function capability. The document also summarizes algorithm design enhancements and the implementation status of the algorithm on an in-house PDP-11/70 computer. Finally, the requirements for the pilot-computer interfaces, the lateral path processor, and guidance and steering function are described.
Efficient FFT Algorithm for Psychoacoustic Model of the MPEG-4 AAC

NASA Astrophysics Data System (ADS)

Lee, Jae-Seong; Lee, Chang-Joon; Park, Young-Cheol; Youn, Dae-Hee

This paper proposes an efficient FFT algorithm for the Psycho-Acoustic Model (PAM) of MPEG-4 AAC. The proposed algorithm synthesizes FFT coefficients using MDCT and MDST coefficients through circular convolution. The complexity of the MDCT and MDST coefficients is approximately half of the original FFT. We also design a new PAM based on the proposed FFT algorithm, which has 15% lower computational complexity than the original PAM without degradation of sound quality. Subjective as well as objective test results are presented to confirm the efficiency of the proposed FFT computation algorithm and the PAM.
Efficient Parallel Algorithm For Direct Numerical Simulation of Turbulent Flows

NASA Technical Reports Server (NTRS)

Moitra, Stuti; Gatski, Thomas B.

1997-01-01

A distributed algorithm for a high-order-accurate finite-difference approach to the direct numerical simulation (DNS) of transition and turbulence in compressible flows is described. This work has two major objectives. The first objective is to demonstrate that parallel and distributed-memory machines can be successfully and efficiently used to solve computationally intensive and input/output intensive algorithms of the DNS class. The second objective is to show that the computational complexity involved in solving the tridiagonal systems inherent in the DNS algorithm can be reduced by algorithm innovations that obviate the need to use a parallelized tridiagonal solver.
Low-cost autonomous perceptron neural network inspired by quantum computation

NASA Astrophysics Data System (ADS)

Zidan, Mohammed; Abdel-Aty, Abdel-Haleem; El-Sadek, Alaa; Zanaty, E. A.; Abdel-Aty, Mahmoud

2017-11-01

Achieving low cost learning with reliable accuracy is one of the important goals to achieve intelligent machines to save time, energy and perform learning process over limited computational resources machines. In this paper, we propose an efficient algorithm for a perceptron neural network inspired by quantum computing composite from a single neuron to classify inspirable linear applications after a single training iteration O(1). The algorithm is applied over a real world data set and the results are outer performs the other state-of-the art algorithms.
Computing Bounds on Resource Levels for Flexible Plans

NASA Technical Reports Server (NTRS)

Muscvettola, Nicola; Rijsman, David

2009-01-01

A new algorithm efficiently computes the tightest exact bound on the levels of resources induced by a flexible activity plan (see figure). Tightness of bounds is extremely important for computations involved in planning because tight bounds can save potentially exponential amounts of search (through early backtracking and detection of solutions), relative to looser bounds. The bound computed by the new algorithm, denoted the resource-level envelope, constitutes the measure of maximum and minimum consumption of resources at any time for all fixed-time schedules in the flexible plan. At each time, the envelope guarantees that there are two fixed-time instantiations one that produces the minimum level and one that produces the maximum level. Therefore, the resource-level envelope is the tightest possible resource-level bound for a flexible plan because any tighter bound would exclude the contribution of at least one fixed-time schedule. If the resource- level envelope can be computed efficiently, one could substitute looser bounds that are currently used in the inner cores of constraint-posting scheduling algorithms, with the potential for great improvements in performance. What is needed to reduce the cost of computation is an algorithm, the measure of complexity of which is no greater than a low-degree polynomial in N (where N is the number of activities). The new algorithm satisfies this need. In this algorithm, the computation of resource-level envelopes is based on a novel combination of (1) the theory of shortest paths in the temporal-constraint network for the flexible plan and (2) the theory of maximum flows for a flow network derived from the temporal and resource constraints. The measure of asymptotic complexity of the algorithm is O(N O(maxflow(N)), where O(x) denotes an amount of computing time or a number of arithmetic operations proportional to a number of the order of x and O(maxflow(N)) is the measure of complexity (and thus of cost) of a maximumflow algorithm applied to an auxiliary flow network of 2N nodes. The algorithm is believed to be efficient in practice; experimental analysis shows the practical cost of maxflow to be as low as O(N1.5). The algorithm could be enhanced following at least two approaches. In the first approach, incremental subalgorithms for the computation of the envelope could be developed. By use of temporal scanning of the events in the temporal network, it may be possible to significantly reduce the size of the networks on which it is necessary to run the maximum-flow subalgorithm, thereby significantly reducing the time required for envelope calculation. In the second approach, the practical effectiveness of resource envelopes in the inner loops of search algorithms could be tested for multi-capacity resource scheduling. This testing would include inner-loop backtracking and termination tests and variable and value-ordering heuristics that exploit the properties of resource envelopes more directly.

Recommendations for dose calculations of lung cancer treatment plans treated with stereotactic ablative body radiotherapy (SABR)

NASA Astrophysics Data System (ADS)

Devpura, S.; Siddiqui, M. S.; Chen, D.; Liu, D.; Li, H.; Kumar, S.; Gordon, J.; Ajlouni, M.; Movsas, B.; Chetty, I. J.

2014-03-01

The purpose of this study was to systematically evaluate dose distributions computed with 5 different dose algorithms for patients with lung cancers treated using stereotactic ablative body radiotherapy (SABR). Treatment plans for 133 lung cancer patients, initially computed with a 1D-pencil beam (equivalent-path-length, EPL-1D) algorithm, were recalculated with 4 other algorithms commissioned for treatment planning, including 3-D pencil-beam (EPL-3D), anisotropic analytical algorithm (AAA), collapsed cone convolution superposition (CCC), and Monte Carlo (MC). The plan prescription dose was 48 Gy in 4 fractions normalized to the 95% isodose line. Tumors were classified according to location: peripheral tumors surrounded by lung (lung-island, N=39), peripheral tumors attached to the rib-cage or chest wall (lung-wall, N=44), and centrally-located tumors (lung-central, N=50). Relative to the EPL-1D algorithm, PTV D95 and mean dose values computed with the other 4 algorithms were lowest for "lung-island" tumors with smallest field sizes (3-5 cm). On the other hand, the smallest differences were noted for lung-central tumors treated with largest field widths (7-10 cm). Amongst all locations, dose distribution differences were most strongly correlated with tumor size for lung-island tumors. For most cases, convolution/superposition and MC algorithms were in good agreement. Mean lung dose (MLD) values computed with the EPL-1D algorithm were highly correlated with that of the other algorithms (correlation coefficient =0.99). The MLD values were found to be ~10% lower for small lung-island tumors with the model-based (conv/superposition and MC) vs. the correction-based (pencil-beam) algorithms with the model-based algorithms predicting greater low dose spread within the lungs. This study suggests that pencil beam algorithms should be avoided for lung SABR planning. For the most challenging cases, small tumors surrounded entirely by lung tissue (lung-island type), a Monte-Carlo-based algorithm may be warranted.
Computational complexity of ecological and evolutionary spatial dynamics

PubMed Central

Ibsen-Jensen, Rasmus; Chatterjee, Krishnendu; Nowak, Martin A.

2015-01-01

There are deep, yet largely unexplored, connections between computer science and biology. Both disciplines examine how information proliferates in time and space. Central results in computer science describe the complexity of algorithms that solve certain classes of problems. An algorithm is deemed efficient if it can solve a problem in polynomial time, which means the running time of the algorithm is a polynomial function of the length of the input. There are classes of harder problems for which the fastest possible algorithm requires exponential time. Another criterion is the space requirement of the algorithm. There is a crucial distinction between algorithms that can find a solution, verify a solution, or list several distinct solutions in given time and space. The complexity hierarchy that is generated in this way is the foundation of theoretical computer science. Precise complexity results can be notoriously difficult. The famous question whether polynomial time equals nondeterministic polynomial time (i.e., P = NP) is one of the hardest open problems in computer science and all of mathematics. Here, we consider simple processes of ecological and evolutionary spatial dynamics. The basic question is: What is the probability that a new invader (or a new mutant) will take over a resident population? We derive precise complexity results for a variety of scenarios. We therefore show that some fundamental questions in this area cannot be answered by simple equations (assuming that P is not equal to NP). PMID:26644569
Real-time implementation of optimized maximum noise fraction transform for feature extraction of hyperspectral images

NASA Astrophysics Data System (ADS)

Wu, Yuanfeng; Gao, Lianru; Zhang, Bing; Zhao, Haina; Li, Jun

2014-01-01

We present a parallel implementation of the optimized maximum noise fraction (G-OMNF) transform algorithm for feature extraction of hyperspectral images on commodity graphics processing units (GPUs). The proposed approach explored the algorithm data-level concurrency and optimized the computing flow. We first defined a three-dimensional grid, in which each thread calculates a sub-block data to easily facilitate the spatial and spectral neighborhood data searches in noise estimation, which is one of the most important steps involved in OMNF. Then, we optimized the processing flow and computed the noise covariance matrix before computing the image covariance matrix to reduce the original hyperspectral image data transmission. These optimization strategies can greatly improve the computing efficiency and can be applied to other feature extraction algorithms. The proposed parallel feature extraction algorithm was implemented on an Nvidia Tesla GPU using the compute unified device architecture and basic linear algebra subroutines library. Through the experiments on several real hyperspectral images, our GPU parallel implementation provides a significant speedup of the algorithm compared with the CPU implementation, especially for highly data parallelizable and arithmetically intensive algorithm parts, such as noise estimation. In order to further evaluate the effectiveness of G-OMNF, we used two different applications: spectral unmixing and classification for evaluation. Considering the sensor scanning rate and the data acquisition time, the proposed parallel implementation met the on-board real-time feature extraction.
A Scalable O(N) Algorithm for Large-Scale Parallel First-Principles Molecular Dynamics Simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Osei-Kuffuor, Daniel; Fattebert, Jean-Luc

2014-01-01

Traditional algorithms for first-principles molecular dynamics (FPMD) simulations only gain a modest capability increase from current petascale computers, due to their O(N 3) complexity and their heavy use of global communications. To address this issue, we are developing a truly scalable O(N) complexity FPMD algorithm, based on density functional theory (DFT), which avoids global communications. The computational model uses a general nonorthogonal orbital formulation for the DFT energy functional, which requires knowledge of selected elements of the inverse of the associated overlap matrix. We present a scalable algorithm for approximately computing selected entries of the inverse of the overlap matrix,more » based on an approximate inverse technique, by inverting local blocks corresponding to principal submatrices of the global overlap matrix. The new FPMD algorithm exploits sparsity and uses nearest neighbor communication to provide a computational scheme capable of extreme scalability. Accuracy is controlled by the mesh spacing of the finite difference discretization, the size of the localization regions in which the electronic orbitals are confined, and a cutoff beyond which the entries of the overlap matrix can be omitted when computing selected entries of its inverse. We demonstrate the algorithm's excellent parallel scaling for up to O(100K) atoms on O(100K) processors, with a wall-clock time of O(1) minute per molecular dynamics time step.« less
Interactive collision detection for deformable models using streaming AABBs.

PubMed

Zhang, Xinyu; Kim, Young J

2007-01-01

We present an interactive and accurate collision detection algorithm for deformable, polygonal objects based on the streaming computational model. Our algorithm can detect all possible pairwise primitive-level intersections between two severely deforming models at highly interactive rates. In our streaming computational model, we consider a set of axis aligned bounding boxes (AABBs) that bound each of the given deformable objects as an input stream and perform massively-parallel pairwise, overlapping tests onto the incoming streams. As a result, we are able to prevent performance stalls in the streaming pipeline that can be caused by expensive indexing mechanism required by bounding volume hierarchy-based streaming algorithms. At runtime, as the underlying models deform over time, we employ a novel, streaming algorithm to update the geometric changes in the AABB streams. Moreover, in order to get only the computed result (i.e., collision results between AABBs) without reading back the entire output streams, we propose a streaming en/decoding strategy that can be performed in a hierarchical fashion. After determining overlapped AABBs, we perform a primitive-level (e.g., triangle) intersection checking on a serial computational model such as CPUs. We implemented the entire pipeline of our algorithm using off-the-shelf graphics processors (GPUs), such as nVIDIA GeForce 7800 GTX, for streaming computations, and Intel Dual Core 3.4G processors for serial computations. We benchmarked our algorithm with different models of varying complexities, ranging from 15K up to 50K triangles, under various deformation motions, and the timings were obtained as 30 approximately 100 FPS depending on the complexity of models and their relative configurations. Finally, we made comparisons with a well-known GPU-based collision detection algorithm, CULLIDE [4] and observed about three times performance improvement over the earlier approach. We also made comparisons with a SW-based AABB culling algorithm [2] and observed about two times improvement.
Architecutres, Models, Algorithms, and Software Tools for Configurable Computing

DTIC Science & Technology

2000-03-06

and J.G. Nash. The gated interconnection network for dynamic programming. Plenum, 1988 . [18] Ju wook Jang, Heonchul Park, and Viktor K. Prasanna. A ...Sep. 1997. [2] C. Ebeling, D. C. Cronquist , P. Franklin and C. Fisher, "RaPiD - A configurable computing architecture for compute-intensive...ABSTRACT (Maximum 200 words) The Models, Algorithms, and Architectures for Reconfigurable Computing (MAARC) project developed a sound framework for
Computational path planner for product assembly in complex environments

NASA Astrophysics Data System (ADS)

Shang, Wei; Liu, Jianhua; Ning, Ruxin; Liu, Mi

2013-03-01

Assembly path planning is a crucial problem in assembly related design and manufacturing processes. Sampling based motion planning algorithms are used for computational assembly path planning. However, the performance of such algorithms may degrade much in environments with complex product structure, narrow passages or other challenging scenarios. A computational path planner for automatic assembly path planning in complex 3D environments is presented. The global planning process is divided into three phases based on the environment and specific algorithms are proposed and utilized in each phase to solve the challenging issues. A novel ray test based stochastic collision detection method is proposed to evaluate the intersection between two polyhedral objects. This method avoids fake collisions in conventional methods and degrades the geometric constraint when a part has to be removed with surface contact with other parts. A refined history based rapidly-exploring random tree (RRT) algorithm which bias the growth of the tree based on its planning history is proposed and employed in the planning phase where the path is simple but the space is highly constrained. A novel adaptive RRT algorithm is developed for the path planning problem with challenging scenarios and uncertain environment. With extending values assigned on each tree node and extending schemes applied, the tree can adapts its growth to explore complex environments more efficiently. Experiments on the key algorithms are carried out and comparisons are made between the conventional path planning algorithms and the presented ones. The comparing results show that based on the proposed algorithms, the path planner can compute assembly path in challenging complex environments more efficiently and with higher success. This research provides the references to the study of computational assembly path planning under complex environments.
Computing the Density Matrix in Electronic Structure Theory on Graphics Processing Units.

PubMed

Cawkwell, M J; Sanville, E J; Mniszewski, S M; Niklasson, Anders M N

2012-11-13

The self-consistent solution of a Schrödinger-like equation for the density matrix is a critical and computationally demanding step in quantum-based models of interatomic bonding. This step was tackled historically via the diagonalization of the Hamiltonian. We have investigated the performance and accuracy of the second-order spectral projection (SP2) algorithm for the computation of the density matrix via a recursive expansion of the Fermi operator in a series of generalized matrix-matrix multiplications. We demonstrate that owing to its simplicity, the SP2 algorithm [Niklasson, A. M. N. Phys. Rev. B2002, 66, 155115] is exceptionally well suited to implementation on graphics processing units (GPUs). The performance in double and single precision arithmetic of a hybrid GPU/central processing unit (CPU) and full GPU implementation of the SP2 algorithm exceed those of a CPU-only implementation of the SP2 algorithm and traditional matrix diagonalization when the dimensions of the matrices exceed about 2000 × 2000. Padding schemes for arrays allocated in the GPU memory that optimize the performance of the CUBLAS implementations of the level 3 BLAS DGEMM and SGEMM subroutines for generalized matrix-matrix multiplications are described in detail. The analysis of the relative performance of the hybrid CPU/GPU and full GPU implementations indicate that the transfer of arrays between the GPU and CPU constitutes only a small fraction of the total computation time. The errors measured in the self-consistent density matrices computed using the SP2 algorithm are generally smaller than those measured in matrices computed via diagonalization. Furthermore, the errors in the density matrices computed using the SP2 algorithm do not exhibit any dependence of system size, whereas the errors increase linearly with the number of orbitals when diagonalization is employed.
Cloud Computing and Its Applications in GIS

NASA Astrophysics Data System (ADS)

Kang, Cao

2011-12-01

Cloud computing is a novel computing paradigm that offers highly scalable and highly available distributed computing services. The objectives of this research are to: 1. analyze and understand cloud computing and its potential for GIS; 2. discover the feasibilities of migrating truly spatial GIS algorithms to distributed computing infrastructures; 3. explore a solution to host and serve large volumes of raster GIS data efficiently and speedily. These objectives thus form the basis for three professional articles. The first article is entitled "Cloud Computing and Its Applications in GIS". This paper introduces the concept, structure, and features of cloud computing. Features of cloud computing such as scalability, parallelization, and high availability make it a very capable computing paradigm. Unlike High Performance Computing (HPC), cloud computing uses inexpensive commodity computers. The uniform administration systems in cloud computing make it easier to use than GRID computing. Potential advantages of cloud-based GIS systems such as lower barrier to entry are consequently presented. Three cloud-based GIS system architectures are proposed: public cloud- based GIS systems, private cloud-based GIS systems and hybrid cloud-based GIS systems. Public cloud-based GIS systems provide the lowest entry barriers for users among these three architectures, but their advantages are offset by data security and privacy related issues. Private cloud-based GIS systems provide the best data protection, though they have the highest entry barriers. Hybrid cloud-based GIS systems provide a compromise between these extremes. The second article is entitled "A cloud computing algorithm for the calculation of Euclidian distance for raster GIS". Euclidean distance is a truly spatial GIS algorithm. Classical algorithms such as the pushbroom and growth ring techniques require computational propagation through the entire raster image, which makes it incompatible with the distributed nature of cloud computing. This paper presents a parallel Euclidean distance algorithm that works seamlessly with the distributed nature of cloud computing infrastructures. The mechanism of this algorithm is to subdivide a raster image into sub-images and wrap them with a one pixel deep edge layer of individually computed distance information. Each sub-image is then processed by a separate node, after which the resulting sub-images are reassembled into the final output. It is shown that while any rectangular sub-image shape can be used, those approximating squares are computationally optimal. This study also serves as a demonstration of this subdivide and layer-wrap strategy, which would enable the migration of many truly spatial GIS algorithms to cloud computing infrastructures. However, this research also indicates that certain spatial GIS algorithms such as cost distance cannot be migrated by adopting this mechanism, which presents significant challenges for the development of cloud-based GIS systems. The third article is entitled "A Distributed Storage Schema for Cloud Computing based Raster GIS Systems". This paper proposes a NoSQL Database Management System (NDDBMS) based raster GIS data storage schema. NDDBMS has good scalability and is able to use distributed commodity computers, which make it superior to Relational Database Management Systems (RDBMS) in a cloud computing environment. In order to provide optimized data service performance, the proposed storage schema analyzes the nature of commonly used raster GIS data sets. It discriminates two categories of commonly used data sets, and then designs corresponding data storage models for both categories. As a result, the proposed storage schema is capable of hosting and serving enormous volumes of raster GIS data speedily and efficiently on cloud computing infrastructures. In addition, the scheme also takes advantage of the data compression characteristics of Quadtrees, thus promoting efficient data storage. Through this assessment of cloud computing technology, the exploration of the challenges and solutions to the migration of GIS algorithms to cloud computing infrastructures, and the examination of strategies for serving large amounts of GIS data in a cloud computing infrastructure, this dissertation lends support to the feasibility of building a cloud-based GIS system. However, there are still challenges that need to be addressed before a full-scale functional cloud-based GIS system can be successfully implemented. (Abstract shortened by UMI.)
Workflow of the Grover algorithm simulation incorporating CUDA and GPGPU

NASA Astrophysics Data System (ADS)

Lu, Xiangwen; Yuan, Jiabin; Zhang, Weiwei

2013-09-01

The Grover quantum search algorithm, one of only a few representative quantum algorithms, can speed up many classical algorithms that use search heuristics. No true quantum computer has yet been developed. For the present, simulation is one effective means of verifying the search algorithm. In this work, we focus on the simulation workflow using a compute unified device architecture (CUDA). Two simulation workflow schemes are proposed. These schemes combine the characteristics of the Grover algorithm and the parallelism of general-purpose computing on graphics processing units (GPGPU). We also analyzed the optimization of memory space and memory access from this perspective. We implemented four programs on CUDA to evaluate the performance of schemes and optimization. Through experimentation, we analyzed the organization of threads suited to Grover algorithm simulations, compared the storage costs of the four programs, and validated the effectiveness of optimization. Experimental results also showed that the distinguished program on CUDA outperformed the serial program of libquantum on a CPU with a speedup of up to 23 times (12 times on average), depending on the scale of the simulation.
Cook-Levin Theorem Algorithmic-Reducibility/Completeness = Wilson Renormalization-(Semi)-Group Fixed-Points; ``Noise''-Induced Phase-Transitions (NITs) to Accelerate Algorithmics (``NIT-Picking'') REPLACING CRUTCHES!!!: Models: Turing-machine, finite-state-models, finite-automata

NASA Astrophysics Data System (ADS)

Young, Frederic; Siegel, Edward

Cook-Levin theorem theorem algorithmic computational-complexity(C-C) algorithmic-equivalence reducibility/completeness equivalence to renormalization-(semi)-group phase-transitions critical-phenomena statistical-physics universality-classes fixed-points, is exploited via Siegel FUZZYICS =CATEGORYICS = ANALOGYICS =PRAGMATYICS/CATEGORY-SEMANTICS ONTOLOGY COGNITION ANALYTICS-Aristotle ``square-of-opposition'' tabular list-format truth-table matrix analytics predicts and implements ''noise''-induced phase-transitions (NITs) to accelerate versus to decelerate Harel [Algorithmics (1987)]-Sipser[Intro.Thy. Computation(`97)] algorithmic C-C: ''NIT-picking''(!!!), to optimize optimization-problems optimally(OOPO). Versus iso-''noise'' power-spectrum quantitative-only amplitude/magnitude-only variation stochastic-resonance, ''NIT-picking'' is ''noise'' power-spectrum QUALitative-type variation via quantitative critical-exponents variation. Computer-''science''/SEANCE algorithmic C-C models: Turing-machine, finite-state-models, finite-automata,..., discrete-maths graph-theory equivalence to physics Feynman-diagrams are identified as early-days once-workable valid but limiting IMPEDING CRUTCHES(!!!), ONLY IMPEDE latter-days new-insights!!!
A GENERAL ALGORITHM FOR THE CONSTRUCTION OF CONTOUR PLOTS

NASA Technical Reports Server (NTRS)

Johnson, W.

1994-01-01

The graphical presentation of experimentally or theoretically generated data sets frequently involves the construction of contour plots. A general computer algorithm has been developed for the construction of contour plots. The algorithm provides for efficient and accurate contouring with a modular approach which allows flexibility in modifying the algorithm for special applications. The algorithm accepts as input data values at a set of points irregularly distributed over a plane. The algorithm is based on an interpolation scheme in which the points in the plane are connected by straight line segments to form a set of triangles. In general, the data is smoothed using a least-squares-error fit of the data to a bivariate polynomial. To construct the contours, interpolation along the edges of the triangles is performed, using the bivariable polynomial if data smoothing was performed. Once the contour points have been located, the contour may be drawn. This program is written in FORTRAN IV for batch execution and has been implemented on an IBM 360 series computer with a central memory requirement of approximately 100K of 8-bit bytes. This computer algorithm was developed in 1981.
New Parallel Algorithms for Landscape Evolution Model

NASA Astrophysics Data System (ADS)

Jin, Y.; Zhang, H.; Shi, Y.

2017-12-01

Most landscape evolution models (LEM) developed in the last two decades solve the diffusion equation to simulate the transportation of surface sediments. This numerical approach is difficult to parallelize due to the computation of drainage area for each node, which needs huge amount of communication if run in parallel. In order to overcome this difficulty, we developed two parallel algorithms for LEM with a stream net. One algorithm handles the partition of grid with traditional methods and applies an efficient global reduction algorithm to do the computation of drainage areas and transport rates for the stream net; the other algorithm is based on a new partition algorithm, which partitions the nodes in catchments between processes first, and then partitions the cells according to the partition of nodes. Both methods focus on decreasing communication between processes and take the advantage of massive computing techniques, and numerical experiments show that they are both adequate to handle large scale problems with millions of cells. We implemented the two algorithms in our program based on the widely used finite element library deal.II, so that it can be easily coupled with ASPECT.
A lateral guidance algorithm to reduce the post-aerobraking burn requirements for a lift-modulated orbital transfer vehicle. M.S. Thesis

NASA Technical Reports Server (NTRS)

Herman, G. C.

1986-01-01

A lateral guidance algorithm which controls the location of the line of intersection between the actual and desired orbital planes (the hinge line) is developed for the aerobraking phase of a lift-modulated orbital transfer vehicle. The on-board targeting algorithm associated with this lateral guidance algorithm is simple and concise which is very desirable since computation time and space are limited on an on-board flight computer. A variational equation which describes the movement of the hinge line is derived. Simple relationships between the plane error, the desired hinge line position, the position out-of-plane error, and the velocity out-of-plane error are found. A computer simulation is developed to test the lateral guidance algorithm for a variety of operating conditions. The algorithm does reduce the total burn magnitude needed to achieve the desired orbit by allowing the plane correction and perigee-raising burn to be combined in a single maneuver. The algorithm performs well under vacuum perigee dispersions, pot-hole density disturbance, and thick atmospheres. The results for many different operating conditions are presented.
Renewable energy in electric utility capacity planning: a decomposition approach with application to a Mexican utility

DOE Office of Scientific and Technical Information (OSTI.GOV)

Staschus, K.

1985-01-01

In this dissertation, efficient algorithms for electric-utility capacity expansion planning with renewable energy are developed. The algorithms include a deterministic phase that quickly finds a near-optimal expansion plan using derating and a linearized approximation to the time-dependent availability of nondispatchable energy sources. A probabilistic second phase needs comparatively few computer-time consuming probabilistic simulation iterations to modify this solution towards the optimal expansion plan. For the deterministic first phase, two algorithms, based on a Lagrangian Dual decomposition and a Generalized Benders Decomposition, are developed. The probabilistic second phase uses a Generalized Benders Decomposition approach. Extensive computational tests of the algorithms aremore » reported. Among the deterministic algorithms, the one based on Lagrangian Duality proves fastest. The two-phase approach is shown to save up to 80% in computing time as compared to a purely probabilistic algorithm. The algorithms are applied to determine the optimal expansion plan for the Tijuana-Mexicali subsystem of the Mexican electric utility system. A strong recommendation to push conservation programs in the desert city of Mexicali results from this implementation.« less
SAMSAN- MODERN NUMERICAL METHODS FOR CLASSICAL SAMPLED SYSTEM ANALYSIS

NASA Technical Reports Server (NTRS)

Frisch, H. P.

1994-01-01

SAMSAN was developed to aid the control system analyst by providing a self consistent set of computer algorithms that support large order control system design and evaluation studies, with an emphasis placed on sampled system analysis. Control system analysts have access to a vast array of published algorithms to solve an equally large spectrum of controls related computational problems. The analyst usually spends considerable time and effort bringing these published algorithms to an integrated operational status and often finds them less general than desired. SAMSAN reduces the burden on the analyst by providing a set of algorithms that have been well tested and documented, and that can be readily integrated for solving control system problems. Algorithm selection for SAMSAN has been biased toward numerical accuracy for large order systems with computational speed and portability being considered important but not paramount. In addition to containing relevant subroutines from EISPAK for eigen-analysis and from LINPAK for the solution of linear systems and related problems, SAMSAN contains the following not so generally available capabilities: 1) Reduction of a real non-symmetric matrix to block diagonal form via a real similarity transformation matrix which is well conditioned with respect to inversion, 2) Solution of the generalized eigenvalue problem with balancing and grading, 3) Computation of all zeros of the determinant of a matrix of polynomials, 4) Matrix exponentiation and the evaluation of integrals involving the matrix exponential, with option to first block diagonalize, 5) Root locus and frequency response for single variable transfer functions in the S, Z, and W domains, 6) Several methods of computing zeros for linear systems, and 7) The ability to generate documentation "on demand". All matrix operations in the SAMSAN algorithms assume non-symmetric matrices with real double precision elements. There is no fixed size limit on any matrix in any SAMSAN algorithm; however, it is generally agreed by experienced users, and in the numerical error analysis literature, that computation with non-symmetric matrices of order greater than about 200 should be avoided or treated with extreme care. SAMSAN attempts to support the needs of application oriented analysis by providing: 1) a methodology with unlimited growth potential, 2) a methodology to insure that associated documentation is current and available "on demand", 3) a foundation of basic computational algorithms that most controls analysis procedures are based upon, 4) a set of check out and evaluation programs which demonstrate usage of the algorithms on a series of problems which are structured to expose the limits of each algorithm's applicability, and 5) capabilities which support both a priori and a posteriori error analysis for the computational algorithms provided. The SAMSAN algorithms are coded in FORTRAN 77 for batch or interactive execution and have been implemented on a DEC VAX computer under VMS 4.7. An effort was made to assure that the FORTRAN source code was portable and thus SAMSAN may be adaptable to other machine environments. The documentation is included on the distribution tape or can be purchased separately at the price below. SAMSAN version 2.0 was developed in 1982 and updated to version 3.0 in 1988.
Lower bound on the time complexity of local adiabatic evolution

NASA Astrophysics Data System (ADS)

Chen, Zhenghao; Koh, Pang Wei; Zhao, Yan

2006-11-01

The adiabatic theorem of quantum physics has been, in recent times, utilized in the design of local search quantum algorithms, and has been proven to be equivalent to standard quantum computation, that is, the use of unitary operators [D. Aharonov in Proceedings of the 45th Annual Symposium on the Foundations of Computer Science, 2004, Rome, Italy (IEEE Computer Society Press, New York, 2004), pp. 42-51]. Hence, the study of the time complexity of adiabatic evolution algorithms gives insight into the computational power of quantum algorithms. In this paper, we present two different approaches of evaluating the time complexity for local adiabatic evolution using time-independent parameters, thus providing effective tests (not requiring the evaluation of the entire time-dependent gap function) for the time complexity of newly developed algorithms. We further illustrate our tests by displaying results from the numerical simulation of some problems, viz. specially modified instances of the Hamming weight problem.
Computing the Feasible Spaces of Optimal Power Flow Problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Molzahn, Daniel K.

The solution to an optimal power flow (OPF) problem provides a minimum cost operating point for an electric power system. The performance of OPF solution techniques strongly depends on the problem’s feasible space. This paper presents an algorithm that is guaranteed to compute the entire feasible spaces of small OPF problems to within a specified discretization tolerance. Specifically, the feasible space is computed by discretizing certain of the OPF problem’s inequality constraints to obtain a set of power flow equations. All solutions to the power flow equations at each discretization point are obtained using the Numerical Polynomial Homotopy Continuation (NPHC)more » algorithm. To improve computational tractability, “bound tightening” and “grid pruning” algorithms use convex relaxations to preclude consideration of many discretization points that are infeasible for the OPF problem. Here, the proposed algorithm is used to generate the feasible spaces of two small test cases.« less
Approximate Computing Techniques for Iterative Graph Algorithms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Panyala, Ajay R.; Subasi, Omer; Halappanavar, Mahantesh

Approximate computing enables processing of large-scale graphs by trading off quality for performance. Approximate computing techniques have become critical not only due to the emergence of parallel architectures but also the availability of large scale datasets enabling data-driven discovery. Using two prototypical graph algorithms, PageRank and community detection, we present several approximate computing heuristics to scale the performance with minimal loss of accuracy. We present several heuristics including loop perforation, data caching, incomplete graph coloring and synchronization, and evaluate their efficiency. We demonstrate performance improvements of up to 83% for PageRank and up to 450x for community detection, with lowmore » impact of accuracy for both the algorithms. We expect the proposed approximate techniques will enable scalable graph analytics on data of importance to several applications in science and their subsequent adoption to scale similar graph algorithms.« less
Image restoration for three-dimensional fluorescence microscopy using an orthonormal basis for efficient representation of depth-variant point-spread functions

PubMed Central

Patwary, Nurmohammed; Preza, Chrysanthe

2015-01-01

A depth-variant (DV) image restoration algorithm for wide field fluorescence microscopy, using an orthonormal basis decomposition of DV point-spread functions (PSFs), is investigated in this study. The efficient PSF representation is based on a previously developed principal component analysis (PCA), which is computationally intensive. We present an approach developed to reduce the number of DV PSFs required for the PCA computation, thereby making the PCA-based approach computationally tractable for thick samples. Restoration results from both synthetic and experimental images show consistency and that the proposed algorithm addresses efficiently depth-induced aberration using a small number of principal components. Comparison of the PCA-based algorithm with a previously-developed strata-based DV restoration algorithm demonstrates that the proposed method improves performance by 50% in terms of accuracy and simultaneously reduces the processing time by 64% using comparable computational resources. PMID:26504634

Computing the Feasible Spaces of Optimal Power Flow Problems

DOE PAGES

Molzahn, Daniel K.

2017-03-15

The solution to an optimal power flow (OPF) problem provides a minimum cost operating point for an electric power system. The performance of OPF solution techniques strongly depends on the problem’s feasible space. This paper presents an algorithm that is guaranteed to compute the entire feasible spaces of small OPF problems to within a specified discretization tolerance. Specifically, the feasible space is computed by discretizing certain of the OPF problem’s inequality constraints to obtain a set of power flow equations. All solutions to the power flow equations at each discretization point are obtained using the Numerical Polynomial Homotopy Continuation (NPHC)more » algorithm. To improve computational tractability, “bound tightening” and “grid pruning” algorithms use convex relaxations to preclude consideration of many discretization points that are infeasible for the OPF problem. Here, the proposed algorithm is used to generate the feasible spaces of two small test cases.« less
A Demons algorithm for image registration with locally adaptive regularization.

PubMed

Cahill, Nathan D; Noble, J Alison; Hawkes, David J

2009-01-01

Thirion's Demons is a popular algorithm for nonrigid image registration because of its linear computational complexity and ease of implementation. It approximately solves the diffusion registration problem by successively estimating force vectors that drive the deformation toward alignment and smoothing the force vectors by Gaussian convolution. In this article, we show how the Demons algorithm can be generalized to allow image-driven locally adaptive regularization in a manner that preserves both the linear complexity and ease of implementation of the original Demons algorithm. We show that the proposed algorithm exhibits lower target registration error and requires less computational effort than the original Demons algorithm on the registration of serial chest CT scans of patients with lung nodules.
Comparative analysis of different variants of the Uzawa algorithm in problems of the theory of elasticity for incompressible materials.

PubMed

Styopin, Nikita E; Vershinin, Anatoly V; Zingerman, Konstantin M; Levin, Vladimir A

2016-09-01

Different variants of the Uzawa algorithm are compared with one another. The comparison is performed for the case in which this algorithm is applied to large-scale systems of linear algebraic equations. These systems arise in the finite-element solution of the problems of elasticity theory for incompressible materials. A modification of the Uzawa algorithm is proposed. Computational experiments show that this modification improves the convergence of the Uzawa algorithm for the problems of solid mechanics. The results of computational experiments show that each variant of the Uzawa algorithm considered has its advantages and disadvantages and may be convenient in one case or another.
Low-complexity R-peak detection for ambulatory fetal monitoring.

PubMed

Rooijakkers, Michael J; Rabotti, Chiara; Oei, S Guid; Mischi, Massimo

2012-07-01

Non-invasive fetal health monitoring during pregnancy is becoming increasingly important because of the increasing number of high-risk pregnancies. Despite recent advances in signal-processing technology, which have enabled fetal monitoring during pregnancy using abdominal electrocardiogram (ECG) recordings, ubiquitous fetal health monitoring is still unfeasible due to the computational complexity of noise-robust solutions. In this paper, an ECG R-peak detection algorithm for ambulatory R-peak detection is proposed, as part of a fetal ECG detection algorithm. The proposed algorithm is optimized to reduce computational complexity, without reducing the R-peak detection performance compared to the existing R-peak detection schemes. Validation of the algorithm is performed on three manually annotated datasets. With a detection error rate of 0.23%, 1.32% and 9.42% on the MIT/BIH Arrhythmia and in-house maternal and fetal databases, respectively, the detection rate of the proposed algorithm is comparable to the best state-of-the-art algorithms, at a reduced computational complexity.
Evaluation of Semantic Web Technologies for Storing Computable Definitions of Electronic Health Records Phenotyping Algorithms.

PubMed

Papež, Václav; Denaxas, Spiros; Hemingway, Harry

2017-01-01

Electronic Health Records are electronic data generated during or as a byproduct of routine patient care. Structured, semi-structured and unstructured EHR offer researchers unprecedented phenotypic breadth and depth and have the potential to accelerate the development of precision medicine approaches at scale. A main EHR use-case is defining phenotyping algorithms that identify disease status, onset and severity. Phenotyping algorithms utilize diagnoses, prescriptions, laboratory tests, symptoms and other elements in order to identify patients with or without a specific trait. No common standardized, structured, computable format exists for storing phenotyping algorithms. The majority of algorithms are stored as human-readable descriptive text documents making their translation to code challenging due to their inherent complexity and hinders their sharing and re-use across the community. In this paper, we evaluate the two key Semantic Web Technologies, the Web Ontology Language and the Resource Description Framework, for enabling computable representations of EHR-driven phenotyping algorithms.
Algorithmic-Reducibility = Renormalization-Group Fixed-Points; ``Noise''-Induced Phase-Transitions (NITs) to Accelerate Algorithmics (``NIT-Picking'') Replacing CRUTCHES!!!: Gauss Modular/Clock-Arithmetic Congruences = Signal X Noise PRODUCTS..

NASA Astrophysics Data System (ADS)

Siegel, J.; Siegel, Edward Carl-Ludwig

2011-03-01

Cook-Levin computational-"complexity"(C-C) algorithmic-equivalence reduction-theorem reducibility equivalence to renormalization-(semi)-group phase-transitions critical-phenomena statistical-physics universality-classes fixed-points, is exploited with Gauss modular/clock-arithmetic/model congruences = signal X noise PRODUCT reinterpretation. Siegel-Baez FUZZYICS=CATEGORYICS(SON of ``TRIZ''): Category-Semantics(C-S) tabular list-format truth-table matrix analytics predicts and implements "noise"-induced phase-transitions (NITs) to accelerate versus to decelerate Harel [Algorithmics(1987)]-Sipser[Intro. Theory Computation(1997) algorithmic C-C: "NIT-picking" to optimize optimization-problems optimally(OOPO). Versus iso-"noise" power-spectrum quantitative-only amplitude/magnitude-only variation stochastic-resonance, this "NIT-picking" is "noise" power-spectrum QUALitative-type variation via quantitative critical-exponents variation. Computer-"science" algorithmic C-C models: Turing-machine, finite-state-models/automata, are identified as early-days once-workable but NOW ONLY LIMITING CRUTCHES IMPEDING latter-days new-insights!!!
A fast Fourier transform on multipoles (FFTM) algorithm for solving Helmholtz equation in acoustics analysis.

PubMed

Ong, Eng Teo; Lee, Heow Pueh; Lim, Kian Meng

2004-09-01

This article presents a fast algorithm for the efficient solution of the Helmholtz equation. The method is based on the translation theory of the multipole expansions. Here, the speedup comes from the convolution nature of the translation operators, which can be evaluated rapidly using fast Fourier transform algorithms. Also, the computations of the translation operators are accelerated by using the recursive formulas developed recently by Gumerov and Duraiswami [SIAM J. Sci. Comput. 25, 1344-1381(2003)]. It is demonstrated that the algorithm can produce good accuracy with a relatively low order of expansion. Efficiency analyses of the algorithm reveal that it has computational complexities of O(Na), where a ranges from 1.05 to 1.24. However, this method requires substantially more memory to store the translation operators as compared to the fast multipole method. Hence, despite its simplicity in implementation, this memory requirement issue may limit the application of this algorithm to solving very large-scale problems.
Comparison of algorithms for computing the two-dimensional discrete Hartley transform

NASA Technical Reports Server (NTRS)

Reichenbach, Stephen E.; Burton, John C.; Miller, Keith W.

1989-01-01

Three methods have been described for computing the two-dimensional discrete Hartley transform. Two of these employ a separable transform, the third method, the vector-radix algorithm, does not require separability. In-place computation of the vector-radix method is described. Operation counts and execution times indicate that the vector-radix method is fastest.
Computing Principal Eigenvectors of Large Web Graphs: Algorithms and Accelerations Related to PageRank and HITS

ERIC Educational Resources Information Center

Nagasinghe, Iranga

2010-01-01

This thesis investigates and develops a few acceleration techniques for the search engine algorithms used in PageRank and HITS computations. PageRank and HITS methods are two highly successful applications of modern Linear Algebra in computer science and engineering. They constitute the essential technologies accounted for the immense growth and…
Cognitive Correlates of Performance in Algorithms in a Computer Science Course for High School

ERIC Educational Resources Information Center

Avancena, Aimee Theresa; Nishihara, Akinori

2014-01-01

Computer science for high school faces many challenging issues. One of these is whether the students possess the appropriate cognitive ability for learning the fundamentals of computer science. Online tests were created based on known cognitive factors and fundamental algorithms and were implemented among the second grade students in the…
Using Linear Algebra to Introduce Computer Algebra, Numerical Analysis, Data Structures and Algorithms (and To Teach Linear Algebra, Too).

ERIC Educational Resources Information Center

Gonzalez-Vega, Laureano

1999-01-01

Using a Computer Algebra System (CAS) to help with the teaching of an elementary course in linear algebra can be one way to introduce computer algebra, numerical analysis, data structures, and algorithms. Highlights the advantages and disadvantages of this approach to the teaching of linear algebra. (Author/MM)
Efficient and Flexible Computation of Many-Electron Wave Function Overlaps.

PubMed

Plasser, Felix; Ruckenbauer, Matthias; Mai, Sebastian; Oppel, Markus; Marquetand, Philipp; González, Leticia

2016-03-08

A new algorithm for the computation of the overlap between many-electron wave functions is described. This algorithm allows for the extensive use of recurring intermediates and thus provides high computational efficiency. Because of the general formalism employed, overlaps can be computed for varying wave function types, molecular orbitals, basis sets, and molecular geometries. This paves the way for efficiently computing nonadiabatic interaction terms for dynamics simulations. In addition, other application areas can be envisaged, such as the comparison of wave functions constructed at different levels of theory. Aside from explaining the algorithm and evaluating the performance, a detailed analysis of the numerical stability of wave function overlaps is carried out, and strategies for overcoming potential severe pitfalls due to displaced atoms and truncated wave functions are presented.
Quantum adiabatic computation and adiabatic conditions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wei Zhaohui; Ying Mingsheng

2007-08-15

Recently, quantum adiabatic computation has attracted more and more attention in the literature. It is a novel quantum computation model based on adiabatic approximation, and the analysis of a quantum adiabatic algorithm depends highly on the adiabatic conditions. However, it has been pointed out that the traditional adiabatic conditions are problematic. Thus, results obtained previously should be checked and sufficient adiabatic conditions applicable to adiabatic computation should be proposed. Based on a result of Tong et al. [Phys. Rev. Lett. 98, 150402 (2007)], we propose a modified adiabatic criterion which is more applicable to the analysis of adiabatic algorithms. Asmore » an example, we prove the validity of the local adiabatic search algorithm by employing our criterion.« less
Integration of symbolic and algorithmic hardware and software for the automation of space station subsystems

NASA Technical Reports Server (NTRS)

Gregg, Hugh; Healey, Kathleen; Hack, Edmund; Wong, Carla

1987-01-01

Expert systems that require access to data bases, complex simulations and real time instrumentation have both symbolic as well as algorithmic computing needs. These needs could both be met using a general computing workstation running both symbolic and algorithmic code, or separate, specialized computers networked together. The later approach was chosen to implement TEXSYS, the thermal expert system, developed to demonstrate the ability of an expert system to autonomously control the thermal control system of the space station. TEXSYS has been implemented on a Symbolics workstation, and will be linked to a microVAX computer that will control a thermal test bed. Integration options are explored and several possible solutions are presented.
An O(log sup 2 N) parallel algorithm for computing the eigenvalues of a symmetric tridiagonal matrix

NASA Technical Reports Server (NTRS)

Swarztrauber, Paul N.

1989-01-01

An O(log sup 2 N) parallel algorithm is presented for computing the eigenvalues of a symmetric tridiagonal matrix using a parallel algorithm for computing the zeros of the characteristic polynomial. The method is based on a quadratic recurrence in which the characteristic polynomial is constructed on a binary tree from polynomials whose degree doubles at each level. Intervals that contain exactly one zero are determined by the zeros of polynomials at the previous level which ensures that different processors compute different zeros. The exact behavior of the polynomials at the interval endpoints is used to eliminate the usual problems induced by finite precision arithmetic.
Parallel Algorithms for the Exascale Era

DOE Office of Scientific and Technical Information (OSTI.GOV)

Robey, Robert W.

New parallel algorithms are needed to reach the Exascale level of parallelism with millions of cores. We look at some of the research developed by students in projects at LANL. The research blends ideas from the early days of computing while weaving in the fresh approach brought by students new to the field of high performance computing. We look at reproducibility of global sums and why it is important to parallel computing. Next we look at how the concept of hashing has led to the development of more scalable algorithms suitable for next-generation parallel computers. Nearly all of this workmore » has been done by undergraduates and published in leading scientific journals.« less
Smell Detection Agent Based Optimization Algorithm

NASA Astrophysics Data System (ADS)

Vinod Chandra, S. S.

2016-09-01

In this paper, a novel nature-inspired optimization algorithm has been employed and the trained behaviour of dogs in detecting smell trails is adapted into computational agents for problem solving. The algorithm involves creation of a surface with smell trails and subsequent iteration of the agents in resolving a path. This algorithm can be applied in different computational constraints that incorporate path-based problems. Implementation of the algorithm can be treated as a shortest path problem for a variety of datasets. The simulated agents have been used to evolve the shortest path between two nodes in a graph. This algorithm is useful to solve NP-hard problems that are related to path discovery. This algorithm is also useful to solve many practical optimization problems. The extensive derivation of the algorithm can be enabled to solve shortest path problems.
Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting

DOE Office of Scientific and Technical Information (OSTI.GOV)

Azad, Ariful; Buluc, Aydn; Pothen, Alex

It is difficult to obtain high performance when computing matchings on parallel processors because matching algorithms explicitly or implicitly search for paths in the graph, and when these paths become long, there is little concurrency. In spite of this limitation, we present a new algorithm and its shared-memory parallelization that achieves good performance and scalability in computing maximum cardinality matchings in bipartite graphs. This algorithm searches for augmenting paths via specialized breadth-first searches (BFS) from multiple source vertices, hence creating more parallelism than single source algorithms. Algorithms that employ multiple-source searches cannot discard a search tree once no augmenting pathmore » is discovered from the tree, unlike algorithms that rely on single-source searches. We describe a novel tree-grafting method that eliminates most of the redundant edge traversals resulting from this property of multiple-source searches. We also employ the recent direction-optimizing BFS algorithm as a subroutine to discover augmenting paths faster. Our algorithm compares favorably with the current best algorithms in terms of the number of edges traversed, the average augmenting path length, and the number of iterations. Here, we provide a proof of correctness for our algorithm. Our NUMA-aware implementation is scalable to 80 threads of an Intel multiprocessor and to 240 threads on an Intel Knights Corner coprocessor. On average, our parallel algorithm runs an order of magnitude faster than the fastest algorithms available. The performance improvement is more significant on graphs with small matching number.« less
Computing Maximum Cardinality Matchings in Parallel on Bipartite Graphs via Tree-Grafting

DOE PAGES

Azad, Ariful; Buluc, Aydn; Pothen, Alex

2016-03-24

It is difficult to obtain high performance when computing matchings on parallel processors because matching algorithms explicitly or implicitly search for paths in the graph, and when these paths become long, there is little concurrency. In spite of this limitation, we present a new algorithm and its shared-memory parallelization that achieves good performance and scalability in computing maximum cardinality matchings in bipartite graphs. This algorithm searches for augmenting paths via specialized breadth-first searches (BFS) from multiple source vertices, hence creating more parallelism than single source algorithms. Algorithms that employ multiple-source searches cannot discard a search tree once no augmenting pathmore » is discovered from the tree, unlike algorithms that rely on single-source searches. We describe a novel tree-grafting method that eliminates most of the redundant edge traversals resulting from this property of multiple-source searches. We also employ the recent direction-optimizing BFS algorithm as a subroutine to discover augmenting paths faster. Our algorithm compares favorably with the current best algorithms in terms of the number of edges traversed, the average augmenting path length, and the number of iterations. Here, we provide a proof of correctness for our algorithm. Our NUMA-aware implementation is scalable to 80 threads of an Intel multiprocessor and to 240 threads on an Intel Knights Corner coprocessor. On average, our parallel algorithm runs an order of magnitude faster than the fastest algorithms available. The performance improvement is more significant on graphs with small matching number.« less
Atrial Fibrillation Screening in Nonmetropolitan Areas Using a Telehealth Surveillance System With an Embedded Cloud-Computing Algorithm: Prospective Pilot Study

PubMed Central

Chen, Ying-Hsien; Hung, Chi-Sheng; Huang, Ching-Chang; Hung, Yu-Chien

2017-01-01

Background Atrial fibrillation (AF) is a common form of arrhythmia that is associated with increased risk of stroke and mortality. Detecting AF before the first complication occurs is a recognized priority. No previous studies have examined the feasibility of undertaking AF screening using a telehealth surveillance system with an embedded cloud-computing algorithm; we address this issue in this study. Objective The objective of this study was to evaluate the feasibility of AF screening in nonmetropolitan areas using a telehealth surveillance system with an embedded cloud-computing algorithm. Methods We conducted a prospective AF screening study in a nonmetropolitan area using a single-lead electrocardiogram (ECG) recorder. All ECG measurements were reviewed on the telehealth surveillance system and interpreted by the cloud-computing algorithm and a cardiologist. The process of AF screening was evaluated with a satisfaction questionnaire. Results Between March 11, 2016 and August 31, 2016, 967 ECGs were recorded from 922 residents in nonmetropolitan areas. A total of 22 (2.4%, 22/922) residents with AF were identified by the physician’s ECG interpretation, and only 0.2% (2/967) of ECGs contained significant artifacts. The novel cloud-computing algorithm for AF detection had a sensitivity of 95.5% (95% CI 77.2%-99.9%) and specificity of 97.7% (95% CI 96.5%-98.5%). The overall satisfaction score for the process of AF screening was 92.1%. Conclusions AF screening in nonmetropolitan areas using a telehealth surveillance system with an embedded cloud-computing algorithm is feasible. PMID:28951384

Simplification of multiple Fourier series - An example of algorithmic approach

NASA Technical Reports Server (NTRS)

Ng, E. W.

1981-01-01

This paper describes one example of multiple Fourier series which originate from a problem of spectral analysis of time series data. The example is exercised here with an algorithmic approach which can be generalized for other series manipulation on a computer. The generalized approach is presently pursued towards applications to a variety of multiple series and towards a general purpose algorithm for computer algebra implementation.
Parallel CE/SE Computations via Domain Decomposition

NASA Technical Reports Server (NTRS)

Himansu, Ananda; Jorgenson, Philip C. E.; Wang, Xiao-Yen; Chang, Sin-Chung

2000-01-01

This paper describes the parallelization strategy and achieved parallel efficiency of an explicit time-marching algorithm for solving conservation laws. The Space-Time Conservation Element and Solution Element (CE/SE) algorithm for solving the 2D and 3D Euler equations is parallelized with the aid of domain decomposition. The parallel efficiency of the resultant algorithm on a Silicon Graphics Origin 2000 parallel computer is checked.
The remote sensing image segmentation mean shift algorithm parallel processing based on MapReduce

NASA Astrophysics Data System (ADS)

Chen, Xi; Zhou, Liqing

2015-12-01

With the development of satellite remote sensing technology and the remote sensing image data, traditional remote sensing image segmentation technology cannot meet the massive remote sensing image processing and storage requirements. This article put cloud computing and parallel computing technology in remote sensing image segmentation process, and build a cheap and efficient computer cluster system that uses parallel processing to achieve MeanShift algorithm of remote sensing image segmentation based on the MapReduce model, not only to ensure the quality of remote sensing image segmentation, improved split speed, and better meet the real-time requirements. The remote sensing image segmentation MeanShift algorithm parallel processing algorithm based on MapReduce shows certain significance and a realization of value.
A charge- and energy-conserving implicit, electrostatic particle-in-cell algorithm on mapped computational meshes

NASA Astrophysics Data System (ADS)

Chacón, L.; Chen, G.; Barnes, D. C.

2013-01-01

We describe the extension of the recent charge- and energy-conserving one-dimensional electrostatic particle-in-cell algorithm in Ref. [G. Chen, L. Chacón, D.C. Barnes, An energy- and charge-conserving, implicit electrostatic particle-in-cell algorithm, Journal of Computational Physics 230 (2011) 7018-7036] to mapped (body-fitted) computational meshes. The approach maintains exact charge and energy conservation properties. Key to the algorithm is a hybrid push, where particle positions are updated in logical space, while velocities are updated in physical space. The effectiveness of the approach is demonstrated with a challenging numerical test case, the ion acoustic shock wave. The generalization of the approach to multiple dimensions is outlined.
The fractional Fourier transform and applications

NASA Technical Reports Server (NTRS)

Bailey, David H.; Swarztrauber, Paul N.

1991-01-01

This paper describes the 'fractional Fourier transform', which admits computation by an algorithm that has complexity proportional to the fast Fourier transform algorithm. Whereas the discrete Fourier transform (DFT) is based on integral roots of unity e exp -2(pi)i/n, the fractional Fourier transform is based on fractional roots of unity e exp -2(pi)i(alpha), where alpha is arbitrary. The fractional Fourier transform and the corresponding fast algorithm are useful for such applications as computing DFTs of sequences with prime lengths, computing DFTs of sparse sequences, analyzing sequences with noninteger periodicities, performing high-resolution trigonometric interpolation, detecting lines in noisy images, and detecting signals with linearly drifting frequencies. In many cases, the resulting algorithms are faster by arbitrarily large factors than conventional techniques.
The Use of Computer Vision Algorithms for Automatic Orientation of Terrestrial Laser Scanning Data

NASA Astrophysics Data System (ADS)

Markiewicz, Jakub Stefan

2016-06-01

The paper presents analysis of the orientation of terrestrial laser scanning (TLS) data. In the proposed data processing methodology, point clouds are considered as panoramic images enriched by the depth map. Computer vision (CV) algorithms are used for orientation, which are applied for testing the correctness of the detection of tie points and time of computations, and for assessing difficulties in their implementation. The BRISK, FASRT, MSER, SIFT, SURF, ASIFT and CenSurE algorithms are used to search for key-points. The source data are point clouds acquired using a Z+F 5006h terrestrial laser scanner on the ruins of Iłża Castle, Poland. Algorithms allowing combination of the photogrammetric and CV approaches are also presented.
Quantum computation in the analysis of hyperspectral data

NASA Astrophysics Data System (ADS)

Gomez, Richard B.; Ghoshal, Debabrata; Jayanna, Anil

2004-08-01

Recent research on the topic of quantum computation provides us with some quantum algorithms with higher efficiency and speedup compared to their classical counterparts. In this paper, it is our intent to provide the results of our investigation of several applications of such quantum algorithms - especially the Grover's Search algorithm - in the analysis of Hyperspectral Data. We found many parallels with Grover's method in existing data processing work that make use of classical spectral matching algorithms. Our efforts also included the study of several methods dealing with hyperspectral image analysis work where classical computation methods involving large data sets could be replaced with quantum computation methods. The crux of the problem in computation involving a hyperspectral image data cube is to convert the large amount of data in high dimensional space to real information. Currently, using the classical model, different time consuming methods and steps are necessary to analyze these data including: Animation, Minimum Noise Fraction Transform, Pixel Purity Index algorithm, N-dimensional scatter plot, Identification of Endmember spectra - are such steps. If a quantum model of computation involving hyperspectral image data can be developed and formalized - it is highly likely that information retrieval from hyperspectral image data cubes would be a much easier process and the final information content would be much more meaningful and timely. In this case, dimensionality would not be a curse, but a blessing.
Optimization Algorithm for Kalman Filter Exploiting the Numerical Characteristics of SINS/GPS Integrated Navigation Systems.

PubMed

Hu, Shaoxing; Xu, Shike; Wang, Duhu; Zhang, Aiwu

2015-11-11

Aiming at addressing the problem of high computational cost of the traditional Kalman filter in SINS/GPS, a practical optimization algorithm with offline-derivation and parallel processing methods based on the numerical characteristics of the system is presented in this paper. The algorithm exploits the sparseness and/or symmetry of matrices to simplify the computational procedure. Thus plenty of invalid operations can be avoided by offline derivation using a block matrix technique. For enhanced efficiency, a new parallel computational mechanism is established by subdividing and restructuring calculation processes after analyzing the extracted "useful" data. As a result, the algorithm saves about 90% of the CPU processing time and 66% of the memory usage needed in a classical Kalman filter. Meanwhile, the method as a numerical approach needs no precise-loss transformation/approximation of system modules and the accuracy suffers little in comparison with the filter before computational optimization. Furthermore, since no complicated matrix theories are needed, the algorithm can be easily transplanted into other modified filters as a secondary optimization method to achieve further efficiency.
Rapid Calculation of Max-Min Fair Rates for Multi-Commodity Flows in Fat-Tree Networks

DOE PAGES

Mollah, Md Atiqul; Yuan, Xin; Pakin, Scott; ...

2017-08-29

Max-min fairness is often used in the performance modeling of interconnection networks. Existing methods to compute max-min fair rates for multi-commodity flows have high complexity and are computationally infeasible for large networks. In this paper, we show that by considering topological features, this problem can be solved efficiently for the fat-tree topology that is widely used in data centers and high performance compute clusters. Several efficient new algorithms are developed for this problem, including a parallel algorithm that can take advantage of multi-core and shared-memory architectures. Using these algorithms, we demonstrate that it is possible to find the max-min fairmore » rate allocation for multi-commodity flows in fat-tree networks that support tens of thousands of nodes. We evaluate the run-time performance of the proposed algorithms and show improvement in orders of magnitude over the previously best known method. Finally, we further demonstrate a new application of max-min fair rate allocation that is only computationally feasible using our new algorithms.« less
Rapid Calculation of Max-Min Fair Rates for Multi-Commodity Flows in Fat-Tree Networks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mollah, Md Atiqul; Yuan, Xin; Pakin, Scott

Max-min fairness is often used in the performance modeling of interconnection networks. Existing methods to compute max-min fair rates for multi-commodity flows have high complexity and are computationally infeasible for large networks. In this paper, we show that by considering topological features, this problem can be solved efficiently for the fat-tree topology that is widely used in data centers and high performance compute clusters. Several efficient new algorithms are developed for this problem, including a parallel algorithm that can take advantage of multi-core and shared-memory architectures. Using these algorithms, we demonstrate that it is possible to find the max-min fairmore » rate allocation for multi-commodity flows in fat-tree networks that support tens of thousands of nodes. We evaluate the run-time performance of the proposed algorithms and show improvement in orders of magnitude over the previously best known method. Finally, we further demonstrate a new application of max-min fair rate allocation that is only computationally feasible using our new algorithms.« less
EMILiO: a fast algorithm for genome-scale strain design.

PubMed

Yang, Laurence; Cluett, William R; Mahadevan, Radhakrishnan

2011-05-01

Systems-level design of cell metabolism is becoming increasingly important for renewable production of fuels, chemicals, and drugs. Computational models are improving in the accuracy and scope of predictions, but are also growing in complexity. Consequently, efficient and scalable algorithms are increasingly important for strain design. Previous algorithms helped to consolidate the utility of computational modeling in this field. To meet intensifying demands for high-performance strains, both the number and variety of genetic manipulations involved in strain construction are increasing. Existing algorithms have experienced combinatorial increases in computational complexity when applied toward the design of such complex strains. Here, we present EMILiO, a new algorithm that increases the scope of strain design to include reactions with individually optimized fluxes. Unlike existing approaches that would experience an explosion in complexity to solve this problem, we efficiently generated numerous alternate strain designs producing succinate, l-glutamate and l-serine. This was enabled by successive linear programming, a technique new to the area of computational strain design. Copyright © 2011 Elsevier Inc. All rights reserved.
Extending the eigCG algorithm to nonsymmetric Lanczos for linear systems with multiple right-hand sides

DOE Office of Scientific and Technical Information (OSTI.GOV)

Abdel-Rehim, A M; Stathopoulos, Andreas; Orginos, Kostas

2014-08-01

The technique that was used to build the EigCG algorithm for sparse symmetric linear systems is extended to the nonsymmetric case using the BiCG algorithm. We show that, similarly to the symmetric case, we can build an algorithm that is capable of computing a few smallest magnitude eigenvalues and their corresponding left and right eigenvectors of a nonsymmetric matrix using only a small window of the BiCG residuals while simultaneously solving a linear system with that matrix. For a system with multiple right-hand sides, we give an algorithm that computes incrementally more eigenvalues while solving the first few systems andmore » then uses the computed eigenvectors to deflate BiCGStab for the remaining systems. Our experiments on various test problems, including Lattice QCD, show the remarkable ability of EigBiCG to compute spectral approximations with accuracy comparable to that of the unrestarted, nonsymmetric Lanczos. Furthermore, our incremental EigBiCG followed by appropriately restarted and deflated BiCGStab provides a competitive method for systems with multiple right-hand sides.« less
A novel clinical decision support algorithm for constructing complete medication histories.

PubMed

Long, Ju; Yuan, Michael Juntao

2017-07-01

A patient's complete medication history is a crucial element for physicians to develop a full understanding of the patient's medical conditions and treatment options. However, due to the fragmented nature of medical data, this process can be very time-consuming and often impossible for physicians to construct a complete medication history for complex patients. In this paper, we describe an accurate, computationally efficient and scalable algorithm to construct a medication history timeline. The algorithm is developed and validated based on 1 million random prescription records from a large national prescription data aggregator. Our evaluation shows that the algorithm can be scaled horizontally on-demand, making it suitable for future delivery in a cloud-computing environment. We also propose that this cloud-based medication history computation algorithm could be integrated into Electronic Medical Records, enabling informed clinical decision-making at the point of care. Copyright © 2017 Elsevier B.V. All rights reserved.
Scalable software-defined optical networking with high-performance routing and wavelength assignment algorithms.

PubMed

Lee, Chankyun; Cao, Xiaoyuan; Yoshikane, Noboru; Tsuritani, Takehiro; Rhee, June-Koo Kevin

2015-10-19

The feasibility of software-defined optical networking (SDON) for a practical application critically depends on scalability of centralized control performance. The paper, highly scalable routing and wavelength assignment (RWA) algorithms are investigated on an OpenFlow-based SDON testbed for proof-of-concept demonstration. Efficient RWA algorithms are proposed to achieve high performance in achieving network capacity with reduced computation cost, which is a significant attribute in a scalable centralized-control SDON. The proposed heuristic RWA algorithms differ in the orders of request processes and in the procedures of routing table updates. Combined in a shortest-path-based routing algorithm, a hottest-request-first processing policy that considers demand intensity and end-to-end distance information offers both the highest throughput of networks and acceptable computation scalability. We further investigate trade-off relationship between network throughput and computation complexity in routing table update procedure by a simulation study.
Research on Image Encryption Based on DNA Sequence and Chaos Theory

NASA Astrophysics Data System (ADS)

Tian Zhang, Tian; Yan, Shan Jun; Gu, Cheng Yan; Ren, Ran; Liao, Kai Xin

2018-04-01

Nowadays encryption is a common technique to protect image data from unauthorized access. In recent years, many scientists have proposed various encryption algorithms based on DNA sequence to provide a new idea for the design of image encryption algorithm. Therefore, a new method of image encryption based on DNA computing technology is proposed in this paper, whose original image is encrypted by DNA coding and 1-D logistic chaotic mapping. First, the algorithm uses two modules as the encryption key. The first module uses the real DNA sequence, and the second module is made by one-dimensional logistic chaos mapping. Secondly, the algorithm uses DNA complementary rules to encode original image, and uses the key and DNA computing technology to compute each pixel value of the original image, so as to realize the encryption of the whole image. Simulation results show that the algorithm has good encryption effect and security.
An acceleration framework for synthetic aperture radar algorithms

NASA Astrophysics Data System (ADS)

Kim, Youngsoo; Gloster, Clay S.; Alexander, Winser E.

2017-04-01

Algorithms for radar signal processing, such as Synthetic Aperture Radar (SAR) are computationally intensive and require considerable execution time on a general purpose processor. Reconfigurable logic can be used to off-load the primary computational kernel onto a custom computing machine in order to reduce execution time by an order of magnitude as compared to kernel execution on a general purpose processor. Specifically, Field Programmable Gate Arrays (FPGAs) can be used to accelerate these kernels using hardware-based custom logic implementations. In this paper, we demonstrate a framework for algorithm acceleration. We used SAR as a case study to illustrate the potential for algorithm acceleration offered by FPGAs. Initially, we profiled the SAR algorithm and implemented a homomorphic filter using a hardware implementation of the natural logarithm. Experimental results show a linear speedup by adding reasonably small processing elements in Field Programmable Gate Array (FPGA) as opposed to using a software implementation running on a typical general purpose processor.
Computationally-Efficient Minimum-Time Aircraft Routes in the Presence of Winds

NASA Technical Reports Server (NTRS)

Jardin, Matthew R.

2004-01-01

A computationally efficient algorithm for minimizing the flight time of an aircraft in a variable wind field has been invented. The algorithm, referred to as Neighboring Optimal Wind Routing (NOWR), is based upon neighboring-optimal-control (NOC) concepts and achieves minimum-time paths by adjusting aircraft heading according to wind conditions at an arbitrary number of wind measurement points along the flight route. The NOWR algorithm may either be used in a fast-time mode to compute minimum- time routes prior to flight, or may be used in a feedback mode to adjust aircraft heading in real-time. By traveling minimum-time routes instead of direct great-circle (direct) routes, flights across the United States can save an average of about 7 minutes, and as much as one hour of flight time during periods of strong jet-stream winds. The neighboring optimal routes computed via the NOWR technique have been shown to be within 1.5 percent of the absolute minimum-time routes for flights across the continental United States. On a typical 450-MHz Sun Ultra workstation, the NOWR algorithm produces complete minimum-time routes in less than 40 milliseconds. This corresponds to a rate of 25 optimal routes per second. The closest comparable optimization technique runs approximately 10 times slower. Airlines currently use various trial-and-error search techniques to determine which of a set of commonly traveled routes will minimize flight time. These algorithms are too computationally expensive for use in real-time systems, or in systems where many optimal routes need to be computed in a short amount of time. Instead of operating in real-time, airlines will typically plan a trajectory several hours in advance using wind forecasts. If winds change significantly from forecasts, the resulting flights will no longer be minimum-time. The need for a computationally efficient wind-optimal routing algorithm is even greater in the case of new air-traffic-control automation concepts. For air-traffic-control automation, thousands of wind-optimal routes may need to be computed and checked for conflicts in just a few minutes. These factors motivated the need for a more efficient wind-optimal routing algorithm.
A POSTERIORI ERROR ANALYSIS OF TWO STAGE COMPUTATION METHODS WITH APPLICATION TO EFFICIENT DISCRETIZATION AND THE PARAREAL ALGORITHM.

PubMed

Chaudhry, Jehanzeb Hameed; Estep, Don; Tavener, Simon; Carey, Varis; Sandelin, Jeff

2016-01-01

We consider numerical methods for initial value problems that employ a two stage approach consisting of solution on a relatively coarse discretization followed by solution on a relatively fine discretization. Examples include adaptive error control, parallel-in-time solution schemes, and efficient solution of adjoint problems for computing a posteriori error estimates. We describe a general formulation of two stage computations then perform a general a posteriori error analysis based on computable residuals and solution of an adjoint problem. The analysis accommodates various variations in the two stage computation and in formulation of the adjoint problems. We apply the analysis to compute "dual-weighted" a posteriori error estimates, to develop novel algorithms for efficient solution that take into account cancellation of error, and to the Parareal Algorithm. We test the various results using several numerical examples.
Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation

PubMed Central

Lee, Jae H.; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T.; Seo, Youngho

2014-01-01

The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximum (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to- program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains with the goal to eventually make it useable in clinical setting. PMID:27081299
Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation.

PubMed

Lee, Jae H; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T; Seo, Youngho

2014-11-01

The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximum (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to- program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains with the goal to eventually make it useable in clinical setting.

The explicit computation of integration algorithms and first integrals for ordinary differential equations with polynomials coefficients using trees

NASA Technical Reports Server (NTRS)

Crouch, P. E.; Grossman, Robert

1992-01-01

This note is concerned with the explicit symbolic computation of expressions involving differential operators and their actions on functions. The derivation of specialized numerical algorithms, the explicit symbolic computation of integrals of motion, and the explicit computation of normal forms for nonlinear systems all require such computations. More precisely, if R = k(x(sub 1),...,x(sub N)), where k = R or C, F denotes a differential operator with coefficients from R, and g member of R, we describe data structures and algorithms for efficiently computing g. The basic idea is to impose a multiplicative structure on the vector space with basis the set of finite rooted trees and whose nodes are labeled with the coefficients of the differential operators. Cancellations of two trees with r + 1 nodes translates into cancellation of O(N(exp r)) expressions involving the coefficient functions and their derivatives.
Molecular simulation workflows as parallel algorithms: the execution engine of Copernicus, a distributed high-performance computing platform.

PubMed

Pronk, Sander; Pouya, Iman; Lundborg, Magnus; Rotskoff, Grant; Wesén, Björn; Kasson, Peter M; Lindahl, Erik

2015-06-09

Computational chemistry and other simulation fields are critically dependent on computing resources, but few problems scale efficiently to the hundreds of thousands of processors available in current supercomputers-particularly for molecular dynamics. This has turned into a bottleneck as new hardware generations primarily provide more processing units rather than making individual units much faster, which simulation applications are addressing by increasingly focusing on sampling with algorithms such as free-energy perturbation, Markov state modeling, metadynamics, or milestoning. All these rely on combining results from multiple simulations into a single observation. They are potentially powerful approaches that aim to predict experimental observables directly, but this comes at the expense of added complexity in selecting sampling strategies and keeping track of dozens to thousands of simulations and their dependencies. Here, we describe how the distributed execution framework Copernicus allows the expression of such algorithms in generic workflows: dataflow programs. Because dataflow algorithms explicitly state dependencies of each constituent part, algorithms only need to be described on conceptual level, after which the execution is maximally parallel. The fully automated execution facilitates the optimization of these algorithms with adaptive sampling, where undersampled regions are automatically detected and targeted without user intervention. We show how several such algorithms can be formulated for computational chemistry problems, and how they are executed efficiently with many loosely coupled simulations using either distributed or parallel resources with Copernicus.
We get the algorithms of our ground truths: Designing referential databases in digital image processing

PubMed Central

Jaton, Florian

2017-01-01

This article documents the practical efforts of a group of scientists designing an image-processing algorithm for saliency detection. By following the actors of this computer science project, the article shows that the problems often considered to be the starting points of computational models are in fact provisional results of time-consuming, collective and highly material processes that engage habits, desires, skills and values. In the project being studied, problematization processes lead to the constitution of referential databases called ‘ground truths’ that enable both the effective shaping of algorithms and the evaluation of their performances. Working as important common touchstones for research communities in image processing, the ground truths are inherited from prior problematization processes and may be imparted to subsequent ones. The ethnographic results of this study suggest two complementary analytical perspectives on algorithms: (1) an ‘axiomatic’ perspective that understands algorithms as sets of instructions designed to solve given problems computationally in the best possible way, and (2) a ‘problem-oriented’ perspective that understands algorithms as sets of instructions designed to computationally retrieve outputs designed and designated during specific problematization processes. If the axiomatic perspective on algorithms puts the emphasis on the numerical transformations of inputs into outputs, the problem-oriented perspective puts the emphasis on the definition of both inputs and outputs. PMID:28950802
An algorithmic framework for multiobjective optimization.

PubMed

Ganesan, T; Elamvazuthi, I; Shaari, Ku Zilati Ku; Vasant, P

2013-01-01

Multiobjective (MO) optimization is an emerging field which is increasingly being encountered in many fields globally. Various metaheuristic techniques such as differential evolution (DE), genetic algorithm (GA), gravitational search algorithm (GSA), and particle swarm optimization (PSO) have been used in conjunction with scalarization techniques such as weighted sum approach and the normal-boundary intersection (NBI) method to solve MO problems. Nevertheless, many challenges still arise especially when dealing with problems with multiple objectives (especially in cases more than two). In addition, problems with extensive computational overhead emerge when dealing with hybrid algorithms. This paper discusses these issues by proposing an alternative framework that utilizes algorithmic concepts related to the problem structure for generating efficient and effective algorithms. This paper proposes a framework to generate new high-performance algorithms with minimal computational overhead for MO optimization.
An Algorithmic Framework for Multiobjective Optimization

PubMed Central

Ganesan, T.; Elamvazuthi, I.; Shaari, Ku Zilati Ku; Vasant, P.

2013-01-01

Multiobjective (MO) optimization is an emerging field which is increasingly being encountered in many fields globally. Various metaheuristic techniques such as differential evolution (DE), genetic algorithm (GA), gravitational search algorithm (GSA), and particle swarm optimization (PSO) have been used in conjunction with scalarization techniques such as weighted sum approach and the normal-boundary intersection (NBI) method to solve MO problems. Nevertheless, many challenges still arise especially when dealing with problems with multiple objectives (especially in cases more than two). In addition, problems with extensive computational overhead emerge when dealing with hybrid algorithms. This paper discusses these issues by proposing an alternative framework that utilizes algorithmic concepts related to the problem structure for generating efficient and effective algorithms. This paper proposes a framework to generate new high-performance algorithms with minimal computational overhead for MO optimization. PMID:24470795
Scalable Nearest Neighbor Algorithms for High Dimensional Data.

PubMed

Muja, Marius; Lowe, David G

2014-11-01

For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular data set. In order to scale to very large data sets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbor matching framework that can be used with any of the algorithms described in the paper. All this research has been released as an open source library called fast library for approximate nearest neighbors (FLANN), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbor matching.
Recursive partitioned inversion of large (1500 x 1500) symmetric matrices

NASA Technical Reports Server (NTRS)

Putney, B. H.; Brownd, J. E.; Gomez, R. A.

1976-01-01

A recursive algorithm was designed to invert large, dense, symmetric, positive definite matrices using small amounts of computer core, i.e., a small fraction of the core needed to store the complete matrix. The described algorithm is a generalized Gaussian elimination technique. Other algorithms are also discussed for the Cholesky decomposition and step inversion techniques. The purpose of the inversion algorithm is to solve large linear systems of normal equations generated by working geodetic problems. The algorithm was incorporated into a computer program called SOLVE. In the past the SOLVE program has been used in obtaining solutions published as the Goddard earth models.
A reconstruction algorithm for helical CT imaging on PI-planes.

PubMed

Liang, Hongzhu; Zhang, Cishen; Yan, Ming

2006-01-01

In this paper, a Feldkamp type approximate reconstruction algorithm is presented for helical cone-beam Computed Tomography. To effectively suppress artifacts due to large cone angle scanning, it is proposed to reconstruct the object point-wisely on unique customized tilted PI-planes which are close to the data collecting helices of the corresponding points. Such a reconstruction scheme can considerably suppress the artifacts in the cone-angle scanning. Computer simulations show that the proposed algorithm can provide improved imaging performance compared with the existing approximate cone-beam reconstruction algorithms.
Shor's quantum factoring algorithm on a photonic chip.

PubMed

Politi, Alberto; Matthews, Jonathan C F; O'Brien, Jeremy L

2009-09-04

Shor's quantum factoring algorithm finds the prime factors of a large number exponentially faster than any other known method, a task that lies at the heart of modern information security, particularly on the Internet. This algorithm requires a quantum computer, a device that harnesses the massive parallelism afforded by quantum superposition and entanglement of quantum bits (or qubits). We report the demonstration of a compiled version of Shor's algorithm on an integrated waveguide silica-on-silicon chip that guides four single-photon qubits through the computation to factor 15.
Sinogram-based adaptive iterative reconstruction for sparse view x-ray computed tomography

NASA Astrophysics Data System (ADS)

Trinca, D.; Zhong, Y.; Wang, Y.-Z.; Mamyrbayev, T.; Libin, E.

2016-10-01

With the availability of more powerful computing processors, iterative reconstruction algorithms have recently been successfully implemented as an approach to achieving significant dose reduction in X-ray CT. In this paper, we propose an adaptive iterative reconstruction algorithm for X-ray CT, that is shown to provide results comparable to those obtained by proprietary algorithms, both in terms of reconstruction accuracy and execution time. The proposed algorithm is thus provided for free to the scientific community, for regular use, and for possible further optimization.
A computational algorithm addressing how vessel length might depend on vessel diameter

Treesearch

Jing Cai; Shuoxin Zhang; Melvin T. Tyree

2010-01-01

The objective of this method paper was to examine a computational algorithm that may reveal how vessel length might depend on vessel diameter within any given stem or species. The computational method requires the assumption that vessels remain approximately constant in diameter over their entire length. When this method is applied to three species or hybrids in the...
Free energy computations employing Jarzynski identity and Wang – Landau algorithm

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kalyan, M. Suman, E-mail: maroju.sk@gmail.com; Murthy, K. P. N.; School of Physics, University of Hyderabad, Hyderabad, Telangana, India – 500046

We introduce a simple method to compute free energy differences employing Jarzynski identity in conjunction with Wang – Landau algorithm. We demonstrate this method on Ising spin system by comparing the results with those obtained from canonical sampling.
GPU-based cloud service for Smith-Waterman algorithm using frequency distance filtration scheme.

PubMed

Lee, Sheng-Ta; Lin, Chun-Yuan; Hung, Che Lun

2013-01-01

As the conventional means of analyzing the similarity between a query sequence and database sequences, the Smith-Waterman algorithm is feasible for a database search owing to its high sensitivity. However, this algorithm is still quite time consuming. CUDA programming can improve computations efficiently by using the computational power of massive computing hardware as graphics processing units (GPUs). This work presents a novel Smith-Waterman algorithm with a frequency-based filtration method on GPUs rather than merely accelerating the comparisons yet expending computational resources to handle such unnecessary comparisons. A user friendly interface is also designed for potential cloud server applications with GPUs. Additionally, two data sets, H1N1 protein sequences (query sequence set) and human protein database (database set), are selected, followed by a comparison of CUDA-SW and CUDA-SW with the filtration method, referred to herein as CUDA-SWf. Experimental results indicate that reducing unnecessary sequence alignments can improve the computational time by up to 41%. Importantly, by using CUDA-SWf as a cloud service, this application can be accessed from any computing environment of a device with an Internet connection without time constraints.
Advanced processing for high-bandwidth sensor systems

NASA Astrophysics Data System (ADS)

Szymanski, John J.; Blain, Phil C.; Bloch, Jeffrey J.; Brislawn, Christopher M.; Brumby, Steven P.; Cafferty, Maureen M.; Dunham, Mark E.; Frigo, Janette R.; Gokhale, Maya; Harvey, Neal R.; Kenyon, Garrett; Kim, Won-Ha; Layne, J.; Lavenier, Dominique D.; McCabe, Kevin P.; Mitchell, Melanie; Moore, Kurt R.; Perkins, Simon J.; Porter, Reid B.; Robinson, S.; Salazar, Alfonso; Theiler, James P.; Young, Aaron C.

2000-11-01

Compute performance and algorithm design are key problems of image processing and scientific computing in general. For example, imaging spectrometers are capable of producing data in hundreds of spectral bands with millions of pixels. These data sets show great promise for remote sensing applications, but require new and computationally intensive processing. The goal of the Deployable Adaptive Processing Systems (DAPS) project at Los Alamos National Laboratory is to develop advanced processing hardware and algorithms for high-bandwidth sensor applications. The project has produced electronics for processing multi- and hyper-spectral sensor data, as well as LIDAR data, while employing processing elements using a variety of technologies. The project team is currently working on reconfigurable computing technology and advanced feature extraction techniques, with an emphasis on their application to image and RF signal processing. This paper presents reconfigurable computing technology and advanced feature extraction algorithm work and their application to multi- and hyperspectral image processing. Related projects on genetic algorithms as applied to image processing will be introduced, as will the collaboration between the DAPS project and the DARPA Adaptive Computing Systems program. Further details are presented in other talks during this conference and in other conferences taking place during this symposium.
Parallel peak pruning for scalable SMP contour tree computation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carr, Hamish A.; Weber, Gunther H.; Sewell, Christopher M.

As data sets grow to exascale, automated data analysis and visualisation are increasingly important, to intermediate human understanding and to reduce demands on disk storage via in situ analysis. Trends in architecture of high performance computing systems necessitate analysis algorithms to make effective use of combinations of massively multicore and distributed systems. One of the principal analytic tools is the contour tree, which analyses relationships between contours to identify features of more than local importance. Unfortunately, the predominant algorithms for computing the contour tree are explicitly serial, and founded on serial metaphors, which has limited the scalability of this formmore » of analysis. While there is some work on distributed contour tree computation, and separately on hybrid GPU-CPU computation, there is no efficient algorithm with strong formal guarantees on performance allied with fast practical performance. Here in this paper, we report the first shared SMP algorithm for fully parallel contour tree computation, withfor-mal guarantees of O(lgnlgt) parallel steps and O(n lgn) work, and implementations with up to 10x parallel speed up in OpenMP and up to 50x speed up in NVIDIA Thrust.« less
A parallel implementation of an off-lattice individual-based model of multicellular populations

NASA Astrophysics Data System (ADS)

Harvey, Daniel G.; Fletcher, Alexander G.; Osborne, James M.; Pitt-Francis, Joe

2015-07-01

As computational models of multicellular populations include ever more detailed descriptions of biophysical and biochemical processes, the computational cost of simulating such models limits their ability to generate novel scientific hypotheses and testable predictions. While developments in microchip technology continue to increase the power of individual processors, parallel computing offers an immediate increase in available processing power. To make full use of parallel computing technology, it is necessary to develop specialised algorithms. To this end, we present a parallel algorithm for a class of off-lattice individual-based models of multicellular populations. The algorithm divides the spatial domain between computing processes and comprises communication routines that ensure the model is correctly simulated on multiple processors. The parallel algorithm is shown to accurately reproduce the results of a deterministic simulation performed using a pre-existing serial implementation. We test the scaling of computation time, memory use and load balancing as more processes are used to simulate a cell population of fixed size. We find approximate linear scaling of both speed-up and memory consumption on up to 32 processor cores. Dynamic load balancing is shown to provide speed-up for non-regular spatial distributions of cells in the case of a growing population.
Multilevel Iterative Methods in Nonlinear Computational Plasma Physics

NASA Astrophysics Data System (ADS)

Knoll, D. A.; Finn, J. M.

1997-11-01

Many applications in computational plasma physics involve the implicit numerical solution of coupled systems of nonlinear partial differential equations or integro-differential equations. Such problems arise in MHD, systems of Vlasov-Fokker-Planck equations, edge plasma fluid equations. We have been developing matrix-free Newton-Krylov algorithms for such problems and have applied these algorithms to the edge plasma fluid equations [1,2] and to the Vlasov-Fokker-Planck equation [3]. Recently we have found that with increasing grid refinement, the number of Krylov iterations required per Newton iteration has grown unmanageable [4]. This has led us to the study of multigrid methods as a means of preconditioning matrix-free Newton-Krylov methods. In this poster we will give details of the general multigrid preconditioned Newton-Krylov algorithm, as well as algorithm performance details on problems of interest in the areas of magnetohydrodynamics and edge plasma physics. Work supported by US DoE 1. Knoll and McHugh, J. Comput. Phys., 116, pg. 281 (1995) 2. Knoll and McHugh, Comput. Phys. Comm., 88, pg. 141 (1995) 3. Mousseau and Knoll, J. Comput. Phys. (1997) (to appear) 4. Knoll and McHugh, SIAM J. Sci. Comput. 19, (1998) (to appear)
Are human beings humean robots?

NASA Astrophysics Data System (ADS)

Génova, Gonzalo; Quintanilla Navarro, Ignacio

2018-01-01

David Hume, the Scottish philosopher, conceives reason as the slave of the passions, which implies that human reason has predetermined objectives it cannot question. An essential element of an algorithm running on a computational machine (or Logical Computing Machine, as Alan Turing calls it) is its having a predetermined purpose: an algorithm cannot question its purpose, because it would cease to be an algorithm. Therefore, if self-determination is essential to human intelligence, then human beings are neither Humean beings, nor computational machines. We examine also some objections to the Turing Test as a model to understand human intelligence.
A Scheduling Algorithm for Cloud Computing System Based on the Driver of Dynamic Essential Path.

PubMed

Xie, Zhiqiang; Shao, Xia; Xin, Yu

2016-01-01

To solve the problem of task scheduling in the cloud computing system, this paper proposes a scheduling algorithm for cloud computing based on the driver of dynamic essential path (DDEP). This algorithm applies a predecessor-task layer priority strategy to solve the problem of constraint relations among task nodes. The strategy assigns different priority values to every task node based on the scheduling order of task node as affected by the constraint relations among task nodes, and the task node list is generated by the different priority value. To address the scheduling order problem in which task nodes have the same priority value, the dynamic essential long path strategy is proposed. This strategy computes the dynamic essential path of the pre-scheduling task nodes based on the actual computation cost and communication cost of task node in the scheduling process. The task node that has the longest dynamic essential path is scheduled first as the completion time of task graph is indirectly influenced by the finishing time of task nodes in the longest dynamic essential path. Finally, we demonstrate the proposed algorithm via simulation experiments using Matlab tools. The experimental results indicate that the proposed algorithm can effectively reduce the task Makespan in most cases and meet a high quality performance objective.
A Scheduling Algorithm for Cloud Computing System Based on the Driver of Dynamic Essential Path

PubMed Central

Xie, Zhiqiang; Shao, Xia; Xin, Yu

2016-01-01

To solve the problem of task scheduling in the cloud computing system, this paper proposes a scheduling algorithm for cloud computing based on the driver of dynamic essential path (DDEP). This algorithm applies a predecessor-task layer priority strategy to solve the problem of constraint relations among task nodes. The strategy assigns different priority values to every task node based on the scheduling order of task node as affected by the constraint relations among task nodes, and the task node list is generated by the different priority value. To address the scheduling order problem in which task nodes have the same priority value, the dynamic essential long path strategy is proposed. This strategy computes the dynamic essential path of the pre-scheduling task nodes based on the actual computation cost and communication cost of task node in the scheduling process. The task node that has the longest dynamic essential path is scheduled first as the completion time of task graph is indirectly influenced by the finishing time of task nodes in the longest dynamic essential path. Finally, we demonstrate the proposed algorithm via simulation experiments using Matlab tools. The experimental results indicate that the proposed algorithm can effectively reduce the task Makespan in most cases and meet a high quality performance objective. PMID:27490901

Algorithm for Atmospheric Corrections of Aircraft and Satellite Imagery

NASA Technical Reports Server (NTRS)

Fraser, Robert S.; Kaufman, Yoram J.; Ferrare, Richard A.; Mattoo, Shana

1989-01-01

A simple and fast atmospheric correction algorithm is described which is used to correct radiances of scattered sunlight measured by aircraft and/or satellite above a uniform surface. The atmospheric effect, the basic equations, a description of the computational procedure, and a sensitivity study are discussed. The program is designed to take the measured radiances, view and illumination directions, and the aerosol and gaseous absorption optical thickness to compute the radiance just above the surface, the irradiance on the surface, and surface reflectance. Alternatively, the program will compute the upward radiance at a specific altitude for a given surface reflectance, view and illumination directions, and aerosol and gaseous absorption optical thickness. The algorithm can be applied for any view and illumination directions and any wavelength in the range 0.48 micron to 2.2 micron. The relation between the measured radiance and surface reflectance, which is expressed as a function of atmospheric properties and measurement geometry, is computed using a radiative transfer routine. The results of the computations are presented in a table which forms the basis of the correction algorithm. The algorithm can be used for atmospheric corrections in the presence of a rural aerosol. The sensitivity of the derived surface reflectance to uncertainties in the model and input data is discussed.
Treecode with a Special-Purpose Processor

NASA Astrophysics Data System (ADS)

Makino, Junichiro

1991-08-01

We describe an implementation of the modified Barnes-Hut tree algorithm for a gravitational N-body calculation on a GRAPE (GRAvity PipE) backend processor. GRAPE is a special-purpose computer for N-body calculations. It receives the positions and masses of particles from a host computer and then calculates the gravitational force at each coordinate specified by the host. To use this GRAPE processor with the hierarchical tree algorithm, the host computer must maintain a list of all nodes that exert force on a particle. If we create this list for each particle of the system at each timestep, the number of floating-point operations on the host and that on GRAPE would become comparable, and the increased speed obtained by using GRAPE would be small. In our modified algorithm, we create a list of nodes for many particles. Thus, the amount of the work required of the host is significantly reduced. This algorithm was originally developed by Barnes in order to vectorize the force calculation on a Cyber 205. With this algorithm, the computing time of the force calculation becomes comparable to that of the tree construction, if the GRAPE backend processor is sufficiently fast. The obtained speed-up factor is 30 to 50 for a RISC-based host computer and GRAPE-1A with a peak speed of 240 Mflops.
Algorithm for atmospheric corrections of aircraft and satellite imagery

NASA Technical Reports Server (NTRS)

Fraser, R. S.; Ferrare, R. A.; Kaufman, Y. J.; Markham, B. L.; Mattoo, S.

1992-01-01

A simple and fast atmospheric correction algorithm is described which is used to correct radiances of scattered sunlight measured by aircraft and/or satellite above a uniform surface. The atmospheric effect, the basic equations, a description of the computational procedure, and a sensitivity study are discussed. The program is designed to take the measured radiances, view and illumination directions, and the aerosol and gaseous absorption optical thickness to compute the radiance just above the surface, the irradiance on the surface, and surface reflectance. Alternatively, the program will compute the upward radiance at a specific altitude for a given surface reflectance, view and illumination directions, and aerosol and gaseous absorption optical thickness. The algorithm can be applied for any view and illumination directions and any wavelength in the range 0.48 micron to 2.2 microns. The relation between the measured radiance and surface reflectance, which is expressed as a function of atmospheric properties and measurement geometry, is computed using a radiative transfer routine. The results of the computations are presented in a table which forms the basis of the correction algorithm. The algorithm can be used for atmospheric corrections in the presence of a rural aerosol. The sensitivity of the derived surface reflectance to uncertainties in the model and input data is discussed.
A Hierarchical Algorithm for Fast Debye Summation with Applications to Small Angle Scattering

PubMed Central

Gumerov, Nail A.; Berlin, Konstantin; Fushman, David; Duraiswami, Ramani

2012-01-01

Debye summation, which involves the summation of sinc functions of distances between all pair of atoms in three dimensional space, arises in computations performed in crystallography, small/wide angle X-ray scattering (SAXS/WAXS) and small angle neutron scattering (SANS). Direct evaluation of Debye summation has quadratic complexity, which results in computational bottleneck when determining crystal properties, or running structure refinement protocols that involve SAXS or SANS, even for moderately sized molecules. We present a fast approximation algorithm that efficiently computes the summation to any prescribed accuracy ε in linear time. The algorithm is similar to the fast multipole method (FMM), and is based on a hierarchical spatial decomposition of the molecule coupled with local harmonic expansions and translation of these expansions. An even more efficient implementation is possible when the scattering profile is all that is required, as in small angle scattering reconstruction (SAS) of macromolecules. We examine the relationship of the proposed algorithm to existing approximate methods for profile computations, and show that these methods may result in inaccurate profile computations, unless an error bound derived in this paper is used. Our theoretical and computational results show orders of magnitude improvement in computation complexity over existing methods, while maintaining prescribed accuracy. PMID:22707386
Computer algorithms and applications used to assist the evaluation and treatment of adolescent idiopathic scoliosis: a review of published articles 2000-2009.

PubMed

Phan, Philippe; Mezghani, Neila; Aubin, Carl-Éric; de Guise, Jacques A; Labelle, Hubert

2011-07-01

Adolescent idiopathic scoliosis (AIS) is a complex spinal deformity whose assessment and treatment present many challenges. Computer applications have been developed to assist clinicians. A literature review on computer applications used in AIS evaluation and treatment has been undertaken. The algorithms used, their accuracy and clinical usability were analyzed. Computer applications have been used to create new classifications for AIS based on 2D and 3D features, assess scoliosis severity or risk of progression and assist bracing and surgical treatment. It was found that classification accuracy could be improved using computer algorithms that AIS patient follow-up and screening could be done using surface topography thereby limiting radiation and that bracing and surgical treatment could be optimized using simulations. Yet few computer applications are routinely used in clinics. With the development of 3D imaging and databases, huge amounts of clinical and geometrical data need to be taken into consideration when researching and managing AIS. Computer applications based on advanced algorithms will be able to handle tasks that could otherwise not be done which can possibly improve AIS patients' management. Clinically oriented applications and evidence that they can improve current care will be required for their integration in the clinical setting.
Improved Collaborative Filtering Algorithm via Information Transformation

NASA Astrophysics Data System (ADS)

Liu, Jian-Guo; Wang, Bing-Hong; Guo, Qiang

In this paper, we propose a spreading activation approach for collaborative filtering (SA-CF). By using the opinion spreading process, the similarity between any users can be obtained. The algorithm has remarkably higher accuracy than the standard collaborative filtering using the Pearson correlation. Furthermore, we introduce a free parameter β to regulate the contributions of objects to user-user correlations. The numerical results indicate that decreasing the influence of popular objects can further improve the algorithmic accuracy and personality. We argue that a better algorithm should simultaneously require less computation and generate higher accuracy. Accordingly, we further propose an algorithm involving only the top-N similar neighbors for each target user, which has both less computational complexity and higher algorithmic accuracy.
Parametric inference for biological sequence analysis.

PubMed

Pachter, Lior; Sturmfels, Bernd

2004-11-16

One of the major successes in computational biology has been the unification, by using the graphical model formalism, of a multitude of algorithms for annotating and comparing biological sequences. Graphical models that have been applied to these problems include hidden Markov models for annotation, tree models for phylogenetics, and pair hidden Markov models for alignment. A single algorithm, the sum-product algorithm, solves many of the inference problems that are associated with different statistical models. This article introduces the polytope propagation algorithm for computing the Newton polytope of an observation from a graphical model. This algorithm is a geometric version of the sum-product algorithm and is used to analyze the parametric behavior of maximum a posteriori inference calculations for graphical models.
Research and implementation of the algorithm for unwrapped and distortion correction basing on CORDIC for panoramic image

NASA Astrophysics Data System (ADS)

Zhang, Zhenhai; Li, Kejie; Wu, Xiaobing; Zhang, Shujiang

2008-03-01

The unwrapped and correcting algorithm based on Coordinate Rotation Digital Computer (CORDIC) and bilinear interpolation algorithm was presented in this paper, with the purpose of processing dynamic panoramic annular image. An original annular panoramic image captured by panoramic annular lens (PAL) can be unwrapped and corrected to conventional rectangular image without distortion, which is much more coincident with people's vision. The algorithm for panoramic image processing is modeled by VHDL and implemented in FPGA. The experimental results show that the proposed panoramic image algorithm for unwrapped and distortion correction has the lower computation complexity and the architecture for dynamic panoramic image processing has lower hardware cost and power consumption. And the proposed algorithm is valid.
An Agent Inspired Reconfigurable Computing Implementation of a Genetic Algorithm

NASA Technical Reports Server (NTRS)

Weir, John M.; Wells, B. Earl

2003-01-01

Many software systems have been successfully implemented using an agent paradigm which employs a number of independent entities that communicate with one another to achieve a common goal. The distributed nature of such a paradigm makes it an excellent candidate for use in high speed reconfigurable computing hardware environments such as those present in modem FPGA's. In this paper, a distributed genetic algorithm that can be applied to the agent based reconfigurable hardware model is introduced. The effectiveness of this new algorithm is evaluated by comparing the quality of the solutions found by the new algorithm with those found by traditional genetic algorithms. The performance of a reconfigurable hardware implementation of the new algorithm on an FPGA is compared to traditional single processor implementations.
Computation of multi-dimensional viscous supersonic jet flow

NASA Technical Reports Server (NTRS)

Kim, Y. N.; Buggeln, R. C.; Mcdonald, H.

1986-01-01

A new method has been developed for two- and three-dimensional computations of viscous supersonic flows with embedded subsonic regions adjacent to solid boundaries. The approach employs a reduced form of the Navier-Stokes equations which allows solution as an initial-boundary value problem in space, using an efficient noniterative forward marching algorithm. Numerical instability associated with forward marching algorithms for flows with embedded subsonic regions is avoided by approximation of the reduced form of the Navier-Stokes equations in the subsonic regions of the boundary layers. Supersonic and subsonic portions of the flow field are simultaneously calculated by a consistently split linearized block implicit computational algorithm. The results of computations for a series of test cases relevant to internal supersonic flow is presented and compared with data. Comparison between data and computation are in general excellent thus indicating that the computational technique has great promise as a tool for calculating supersonic flow with embedded subsonic regions. Finally, a User's Manual is presented for the computer code used to perform the calculations.
Computational mechanics analysis tools for parallel-vector supercomputers

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.; Nguyen, Duc T.; Baddourah, Majdi; Qin, Jiangning

1993-01-01

Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigensolution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization search analysis and domain decomposition. The source code for many of these algorithms is available.
A novel combined SLAM based on RBPF-SLAM and EIF-SLAM for mobile system sensing in a large scale environment.

PubMed

He, Bo; Zhang, Shujing; Yan, Tianhong; Zhang, Tao; Liang, Yan; Zhang, Hongjin

2011-01-01

Mobile autonomous systems are very important for marine scientific investigation and military applications. Many algorithms have been studied to deal with the computational efficiency problem required for large scale simultaneous localization and mapping (SLAM) and its related accuracy and consistency. Among these methods, submap-based SLAM is a more effective one. By combining the strength of two popular mapping algorithms, the Rao-Blackwellised particle filter (RBPF) and extended information filter (EIF), this paper presents a combined SLAM-an efficient submap-based solution to the SLAM problem in a large scale environment. RBPF-SLAM is used to produce local maps, which are periodically fused into an EIF-SLAM algorithm. RBPF-SLAM can avoid linearization of the robot model during operating and provide a robust data association, while EIF-SLAM can improve the whole computational speed, and avoid the tendency of RBPF-SLAM to be over-confident. In order to further improve the computational speed in a real time environment, a binary-tree-based decision-making strategy is introduced. Simulation experiments show that the proposed combined SLAM algorithm significantly outperforms currently existing algorithms in terms of accuracy and consistency, as well as the computing efficiency. Finally, the combined SLAM algorithm is experimentally validated in a real environment by using the Victoria Park dataset.
Implementation of a fully-balanced periodic tridiagonal solver on a parallel distributed memory architecture

NASA Technical Reports Server (NTRS)

Eidson, T. M.; Erlebacher, G.

1994-01-01

While parallel computers offer significant computational performance, it is generally necessary to evaluate several programming strategies. Two programming strategies for a fairly common problem - a periodic tridiagonal solver - are developed and evaluated. Simple model calculations as well as timing results are presented to evaluate the various strategies. The particular tridiagonal solver evaluated is used in many computational fluid dynamic simulation codes. The feature that makes this algorithm unique is that these simulation codes usually require simultaneous solutions for multiple right-hand-sides (RHS) of the system of equations. Each RHS solutions is independent and thus can be computed in parallel. Thus a Gaussian elimination type algorithm can be used in a parallel computation and the more complicated approaches such as cyclic reduction are not required. The two strategies are a transpose strategy and a distributed solver strategy. For the transpose strategy, the data is moved so that a subset of all the RHS problems is solved on each of the several processors. This usually requires significant data movement between processor memories across a network. The second strategy attempts to have the algorithm allow the data across processor boundaries in a chained manner. This usually requires significantly less data movement. An approach to accomplish this second strategy in a near-perfect load-balanced manner is developed. In addition, an algorithm will be shown to directly transform a sequential Gaussian elimination type algorithm into the parallel chained, load-balanced algorithm.
Computational Discovery of Materials Using the Firefly Algorithm

NASA Astrophysics Data System (ADS)

Avendaño-Franco, Guillermo; Romero, Aldo

Our current ability to model physical phenomena accurately, the increase computational power and better algorithms are the driving forces behind the computational discovery and design of novel materials, allowing for virtual characterization before their realization in the laboratory. We present the implementation of a novel firefly algorithm, a population-based algorithm for global optimization for searching the structure/composition space. This novel computation-intensive approach naturally take advantage of concurrency, targeted exploration and still keeping enough diversity. We apply the new method in both periodic and non-periodic structures and we present the implementation challenges and solutions to improve efficiency. The implementation makes use of computational materials databases and network analysis to optimize the search and get insights about the geometric structure of local minima on the energy landscape. The method has been implemented in our software PyChemia, an open-source package for materials discovery. We acknowledge the support of DMREF-NSF 1434897 and the Donors of the American Chemical Society Petroleum Research Fund for partial support of this research under Contract 54075-ND10.
Multigrid optimal mass transport for image registration and morphing

NASA Astrophysics Data System (ADS)

Rehman, Tauseef ur; Tannenbaum, Allen

2007-02-01

In this paper we present a computationally efficient Optimal Mass Transport algorithm. This method is based on the Monge-Kantorovich theory and is used for computing elastic registration and warping maps in image registration and morphing applications. This is a parameter free method which utilizes all of the grayscale data in an image pair in a symmetric fashion. No landmarks need to be specified for correspondence. In our work, we demonstrate significant improvement in computation time when our algorithm is applied as compared to the originally proposed method by Haker et al [1]. The original algorithm was based on a gradient descent method for removing the curl from an initial mass preserving map regarded as 2D vector field. This involves inverting the Laplacian in each iteration which is now computed using full multigrid technique resulting in an improvement in computational time by a factor of two. Greater improvement is achieved by decimating the curl in a multi-resolutional framework. The algorithm was applied to 2D short axis cardiac MRI images and brain MRI images for testing and comparison.
Benchmarking neuromorphic vision: lessons learnt from computer vision

PubMed Central

Tan, Cheston; Lallee, Stephane; Orchard, Garrick

2015-01-01

Neuromorphic Vision sensors have improved greatly since the first silicon retina was presented almost three decades ago. They have recently matured to the point where they are commercially available and can be operated by laymen. However, despite improved availability of sensors, there remains a lack of good datasets, while algorithms for processing spike-based visual data are still in their infancy. On the other hand, frame-based computer vision algorithms are far more mature, thanks in part to widely accepted datasets which allow direct comparison between algorithms and encourage competition. We are presented with a unique opportunity to shape the development of Neuromorphic Vision benchmarks and challenges by leveraging what has been learnt from the use of datasets in frame-based computer vision. Taking advantage of this opportunity, in this paper we review the role that benchmarks and challenges have played in the advancement of frame-based computer vision, and suggest guidelines for the creation of Neuromorphic Vision benchmarks and challenges. We also discuss the unique challenges faced when benchmarking Neuromorphic Vision algorithms, particularly when attempting to provide direct comparison with frame-based computer vision. PMID:26528120
Fast parallel molecular algorithms for DNA-based computation: solving the elliptic curve discrete logarithm problem over GF2.

PubMed

Li, Kenli; Zou, Shuting; Xv, Jin

2008-01-01

Elliptic curve cryptographic algorithms convert input data to unrecognizable encryption and the unrecognizable data back again into its original decrypted form. The security of this form of encryption hinges on the enormous difficulty that is required to solve the elliptic curve discrete logarithm problem (ECDLP), especially over GF(2(n)), n in Z+. This paper describes an effective method to find solutions to the ECDLP by means of a molecular computer. We propose that this research accomplishment would represent a breakthrough for applied biological computation and this paper demonstrates that in principle this is possible. Three DNA-based algorithms: a parallel adder, a parallel multiplier, and a parallel inverse over GF(2(n)) are described. The biological operation time of all of these algorithms is polynomial with respect to n. Considering this analysis, cryptography using a public key might be less secure. In this respect, a principal contribution of this paper is to provide enhanced evidence of the potential of molecular computing to tackle such ambitious computations.
Fast Parallel Molecular Algorithms for DNA-Based Computation: Solving the Elliptic Curve Discrete Logarithm Problem over GF(2n)

PubMed Central

Li, Kenli; Zou, Shuting; Xv, Jin

2008-01-01

Elliptic curve cryptographic algorithms convert input data to unrecognizable encryption and the unrecognizable data back again into its original decrypted form. The security of this form of encryption hinges on the enormous difficulty that is required to solve the elliptic curve discrete logarithm problem (ECDLP), especially over GF(2n), n ∈ Z+. This paper describes an effective method to find solutions to the ECDLP by means of a molecular computer. We propose that this research accomplishment would represent a breakthrough for applied biological computation and this paper demonstrates that in principle this is possible. Three DNA-based algorithms: a parallel adder, a parallel multiplier, and a parallel inverse over GF(2n) are described. The biological operation time of all of these algorithms is polynomial with respect to n. Considering this analysis, cryptography using a public key might be less secure. In this respect, a principal contribution of this paper is to provide enhanced evidence of the potential of molecular computing to tackle such ambitious computations. PMID:18431451
Automatic Blocking Of QR and LU Factorizations for Locality

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yi, Q; Kennedy, K; You, H

2004-03-26

QR and LU factorizations for dense matrices are important linear algebra computations that are widely used in scientific applications. To efficiently perform these computations on modern computers, the factorization algorithms need to be blocked when operating on large matrices to effectively exploit the deep cache hierarchy prevalent in today's computer memory systems. Because both QR (based on Householder transformations) and LU factorization algorithms contain complex loop structures, few compilers can fully automate the blocking of these algorithms. Though linear algebra libraries such as LAPACK provides manually blocked implementations of these algorithms, by automatically generating blocked versions of the computations, moremore » benefit can be gained such as automatic adaptation of different blocking strategies. This paper demonstrates how to apply an aggressive loop transformation technique, dependence hoisting, to produce efficient blockings for both QR and LU with partial pivoting. We present different blocking strategies that can be generated by our optimizer and compare the performance of auto-blocked versions with manually tuned versions in LAPACK, both using reference BLAS, ATLAS BLAS and native BLAS specially tuned for the underlying machine architectures.« less
Parallel Algorithms for Computational Models of Geophysical Systems

NASA Astrophysics Data System (ADS)

Carrillo Ledesma, A.; Herrera, I.; de la Cruz, L. M.; Hernández, G.; Grupo de Modelacion Matematica y Computacional

2013-05-01

Mathematical models of many systems of interest, including very important continuous systems of Earth Sciences and Engineering, lead to a great variety of partial differential equations (PDEs) whose solution methods are based on the computational processing of large-scale algebraic systems. Furthermore, the incredible expansion experienced by the existing computational hardware and software has made amenable to effective treatment problems of an ever increasing diversity and complexity, posed by scientific and engineering applications. Parallel computing is outstanding among the new computational tools and, in order to effectively use the most advanced computers available today, massively parallel software is required. Domain decomposition methods (DDMs) have been developed precisely for effectively treating PDEs in paralle. Ideally, the main objective of domain decomposition research is to produce algorithms capable of 'obtaining the global solution by exclusively solving local problems', but up-to-now this has only been an aspiration; that is, a strong desire for achieving such a property and so we call it 'the DDM-paradigm'. In recent times, numerically competitive DDM-algorithms are non-overlapping, preconditioned and necessarily incorporate constraints which pose an additional challenge for achieving the DDM-paradigm. Recently a group of four algorithms, referred to as the 'DVS-algorithms', which fulfill the DDM-paradigm, was developed. To derive them a new discretization method, which uses a non-overlapping system of nodes (the derived-nodes), was introduced. This discretization procedure can be applied to any boundary-value problem, or system of such equations. In turn, the resulting system of discrete equations can be treated using any available DDM-algorithm. In particular, two of the four DVS-algorithms mentioned above were obtained by application of the well-known and very effective algorithms BDDC and FETI-DP; these will be referred to as the DVS-BDDC and DVS-FETI-DP algorithms. The other two, which will be referred to as the DVS-PRIMAL and DVS-DUAL algorithms, were obtained by application of two new algorithms that had not been previously reported in the literature. As said before, the four DVS-algorithms constitute a group of preconditioned and constrained algorithms that, for the first time, fulfill the DDM-paradigm. Both, BDDC and FETI-DP, are very well-known; and both are highly efficient. Recently, it was established that these two methods are closely related and its numerical performance is quite similar. On the other hand, through numerical experiments, we have established that the numerical performances of each one of the members of DVS-algorithms group (DVS-BDDC, DVS-FETI-DP, DVS-PRIMAL and DVS-DUAL) are very similar too. Furthermore, we have carried out comparisons of the performances of the standard versions of BDDC and FETI-DP with DVS-BDDC and DVS-FETI-DP, and in all such numerical experiments the DVS algorithms have performed significantly better.

Strategic Control Algorithm Development : Volume 3. Strategic Algorithm Report.

DOT National Transportation Integrated Search

1974-08-01

The strategic algorithm report presents a detailed description of the functional basic strategic control arrival algorithm. This description is independent of a particular computer or language. Contained in this discussion are the geometrical and env...
Convergence issues in domain decomposition parallel computation of hovering rotor

NASA Astrophysics Data System (ADS)

Xiao, Zhongyun; Liu, Gang; Mou, Bin; Jiang, Xiong

2018-05-01

Implicit LU-SGS time integration algorithm has been widely used in parallel computation in spite of its lack of information from adjacent domains. When applied to parallel computation of hovering rotor flows in a rotating frame, it brings about convergence issues. To remedy the problem, three LU factorization-based implicit schemes (consisting of LU-SGS, DP-LUR and HLU-SGS) are investigated comparatively. A test case of pure grid rotation is designed to verify these algorithms, which show that LU-SGS algorithm introduces errors on boundary cells. When partition boundaries are circumferential, errors arise in proportion to grid speed, accumulating along with the rotation, and leading to computational failure in the end. Meanwhile, DP-LUR and HLU-SGS methods show good convergence owing to boundary treatment which are desirable in domain decomposition parallel computations.
Computing the Envelope for Stepwise-Constant Resource Allocations

NASA Technical Reports Server (NTRS)

Muscettola, Nicola; Clancy, Daniel (Technical Monitor)

2002-01-01

Computing tight resource-level bounds is a fundamental problem in the construction of flexible plans with resource utilization. In this paper we describe an efficient algorithm that builds a resource envelope, the tightest possible such bound. The algorithm is based on transforming the temporal network of resource consuming and producing events into a flow network with nodes equal to the events and edges equal to the necessary predecessor links between events. A staged maximum flow problem on the network is then used to compute the time of occurrence and the height of each step of the resource envelope profile. Each stage has the same computational complexity of solving a maximum flow problem on the entire flow network. This makes this method computationally feasible and promising for use in the inner loop of flexible-time scheduling algorithms.
A multiresolution approach to iterative reconstruction algorithms in X-ray computed tomography.

PubMed

De Witte, Yoni; Vlassenbroeck, Jelle; Van Hoorebeke, Luc

2010-09-01

In computed tomography, the application of iterative reconstruction methods in practical situations is impeded by their high computational demands. Especially in high resolution X-ray computed tomography, where reconstruction volumes contain a high number of volume elements (several giga voxels), this computational burden prevents their actual breakthrough. Besides the large amount of calculations, iterative algorithms require the entire volume to be kept in memory during reconstruction, which quickly becomes cumbersome for large data sets. To overcome this obstacle, we present a novel multiresolution reconstruction, which greatly reduces the required amount of memory without significantly affecting the reconstructed image quality. It is shown that, combined with an efficient implementation on a graphical processing unit, the multiresolution approach enables the application of iterative algorithms in the reconstruction of large volumes at an acceptable speed using only limited resources.
High-performance computing on GPUs for resistivity logging of oil and gas wells

NASA Astrophysics Data System (ADS)

Glinskikh, V.; Dudaev, A.; Nechaev, O.; Surodina, I.

2017-10-01

We developed and implemented into software an algorithm for high-performance simulation of electrical logs from oil and gas wells using high-performance heterogeneous computing. The numerical solution of the 2D forward problem is based on the finite-element method and the Cholesky decomposition for solving a system of linear algebraic equations (SLAE). Software implementations of the algorithm used the NVIDIA CUDA technology and computing libraries are made, allowing us to perform decomposition of SLAE and find its solution on central processor unit (CPU) and graphics processor unit (GPU). The calculation time is analyzed depending on the matrix size and number of its non-zero elements. We estimated the computing speed on CPU and GPU, including high-performance heterogeneous CPU-GPU computing. Using the developed algorithm, we simulated resistivity data in realistic models.
A different Deutsch-Jozsa

NASA Astrophysics Data System (ADS)

Bera, Debajyoti

2015-06-01

One of the early achievements of quantum computing was demonstrated by Deutsch and Jozsa (Proc R Soc Lond A Math Phys Sci 439(1907):553, 1992) regarding classification of a particular type of Boolean functions. Their solution demonstrated an exponential speedup compared to classical approaches to the same problem; however, their solution was the only known quantum algorithm for that specific problem so far. This paper demonstrates another quantum algorithm for the same problem, with the same exponential advantage compared to classical algorithms. The novelty of this algorithm is the use of quantum amplitude amplification, a technique that is the key component of another celebrated quantum algorithm developed by Grover (Proceedings of the twenty-eighth annual ACM symposium on theory of computing, ACM Press, New York, 1996). A lower bound for randomized (classical) algorithms is also presented which establishes a sound gap between the effectiveness of our quantum algorithm and that of any randomized algorithm with similar efficiency.
Multiple feature fusion via covariance matrix for visual tracking

NASA Astrophysics Data System (ADS)

Jin, Zefenfen; Hou, Zhiqiang; Yu, Wangsheng; Wang, Xin; Sun, Hui

2018-04-01

Aiming at the problem of complicated dynamic scenes in visual target tracking, a multi-feature fusion tracking algorithm based on covariance matrix is proposed to improve the robustness of the tracking algorithm. In the frame-work of quantum genetic algorithm, this paper uses the region covariance descriptor to fuse the color, edge and texture features. It also uses a fast covariance intersection algorithm to update the model. The low dimension of region covariance descriptor, the fast convergence speed and strong global optimization ability of quantum genetic algorithm, and the fast computation of fast covariance intersection algorithm are used to improve the computational efficiency of fusion, matching, and updating process, so that the algorithm achieves a fast and effective multi-feature fusion tracking. The experiments prove that the proposed algorithm can not only achieve fast and robust tracking but also effectively handle interference of occlusion, rotation, deformation, motion blur and so on.
A proximity algorithm accelerated by Gauss-Seidel iterations for L1/TV denoising models

NASA Astrophysics Data System (ADS)

Li, Qia; Micchelli, Charles A.; Shen, Lixin; Xu, Yuesheng

2012-09-01

Our goal in this paper is to improve the computational performance of the proximity algorithms for the L1/TV denoising model. This leads us to a new characterization of all solutions to the L1/TV model via fixed-point equations expressed in terms of the proximity operators. Based upon this observation we develop an algorithm for solving the model and establish its convergence. Furthermore, we demonstrate that the proposed algorithm can be accelerated through the use of the componentwise Gauss-Seidel iteration so that the CPU time consumed is significantly reduced. Numerical experiments using the proposed algorithm for impulsive noise removal are included, with a comparison to three recently developed algorithms. The numerical results show that while the proposed algorithm enjoys a high quality of the restored images, as the other three known algorithms do, it performs significantly better in terms of computational efficiency measured in the CPU time consumed.
Empirical study of parallel LRU simulation algorithms

NASA Technical Reports Server (NTRS)

Carr, Eric; Nicol, David M.

1994-01-01

This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithm are more complex, but have costs that are independent on the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithm implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.
A Fast Synthetic Aperture Radar Raw Data Simulation Using Cloud Computing.

PubMed

Li, Zhixin; Su, Dandan; Zhu, Haijiang; Li, Wei; Zhang, Fan; Li, Ruirui

2017-01-08

Synthetic Aperture Radar (SAR) raw data simulation is a fundamental problem in radar system design and imaging algorithm research. The growth of surveying swath and resolution results in a significant increase in data volume and simulation period, which can be considered to be a comprehensive data intensive and computing intensive issue. Although several high performance computing (HPC) methods have demonstrated their potential for accelerating simulation, the input/output (I/O) bottleneck of huge raw data has not been eased. In this paper, we propose a cloud computing based SAR raw data simulation algorithm, which employs the MapReduce model to accelerate the raw data computing and the Hadoop distributed file system (HDFS) for fast I/O access. The MapReduce model is designed for the irregular parallel accumulation of raw data simulation, which greatly reduces the parallel efficiency of graphics processing unit (GPU) based simulation methods. In addition, three kinds of optimization strategies are put forward from the aspects of programming model, HDFS configuration and scheduling. The experimental results show that the cloud computing based algorithm achieves 4_ speedup over the baseline serial approach in an 8-node cloud environment, and each optimization strategy can improve about 20%. This work proves that the proposed cloud algorithm is capable of solving the computing intensive and data intensive issues in SAR raw data simulation, and is easily extended to large scale computing to achieve higher acceleration.
Flowfield computation of entry vehicles

NASA Technical Reports Server (NTRS)

Prabhu, Dinesh K.

1990-01-01

The equations governing the multidimensional flow of a reacting mixture of thermally perfect gasses were derived. The modeling procedures for the various terms of the conservation laws are discussed. A numerical algorithm, based on the finite-volume approach, to solve these conservation equations was developed. The advantages and disadvantages of the present numerical scheme are discussed from the point of view of accuracy, computer time, and memory requirements. A simple one-dimensional model problem was solved to prove the feasibility and accuracy of the algorithm. A computer code implementing the above algorithm was developed and is presently being applied to simple geometries and conditions. Once the code is completely debugged and validated, it will be used to compute the complete unsteady flow field around the Aeroassist Flight Experiment (AFE) body.
Ramsey numbers and adiabatic quantum computing.

PubMed

Gaitan, Frank; Clark, Lane

2012-01-06

The graph-theoretic Ramsey numbers are notoriously difficult to calculate. In fact, for the two-color Ramsey numbers R(m,n) with m, n≥3, only nine are currently known. We present a quantum algorithm for the computation of the Ramsey numbers R(m,n). We show how the computation of R(m,n) can be mapped to a combinatorial optimization problem whose solution can be found using adiabatic quantum evolution. We numerically simulate this adiabatic quantum algorithm and show that it correctly determines the Ramsey numbers R(3,3) and R(2,s) for 5≤s≤7. We then discuss the algorithm's experimental implementation, and close by showing that Ramsey number computation belongs to the quantum complexity class quantum Merlin Arthur.
Redundancy management for efficient fault recovery in NASA's distributed computing system

NASA Technical Reports Server (NTRS)

Malek, Miroslaw; Pandya, Mihir; Yau, Kitty

1991-01-01

The management of redundancy in computer systems was studied and guidelines were provided for the development of NASA's fault-tolerant distributed systems. Fault recovery and reconfiguration mechanisms were examined. A theoretical foundation was laid for redundancy management by efficient reconfiguration methods and algorithmic diversity. Algorithms were developed to optimize the resources for embedding of computational graphs of tasks in the system architecture and reconfiguration of these tasks after a failure has occurred. The computational structure represented by a path and the complete binary tree was considered and the mesh and hypercube architectures were targeted for their embeddings. The innovative concept of Hybrid Algorithm Technique was introduced. This new technique provides a mechanism for obtaining fault tolerance while exhibiting improved performance.
Computational algorithms for simulations in atmospheric optics.

PubMed

Konyaev, P A; Lukin, V P

2016-04-20

A computer simulation technique for atmospheric and adaptive optics based on parallel programing is discussed. A parallel propagation algorithm is designed and a modified spectral-phase method for computer generation of 2D time-variant random fields is developed. Temporal power spectra of Laguerre-Gaussian beam fluctuations are considered as an example to illustrate the applications discussed. Implementation of the proposed algorithms using Intel MKL and IPP libraries and NVIDIA CUDA technology is shown to be very fast and accurate. The hardware system for the computer simulation is an off-the-shelf desktop with an Intel Core i7-4790K CPU operating at a turbo-speed frequency up to 5 GHz and an NVIDIA GeForce GTX-960 graphics accelerator with 1024 1.5 GHz processors.
Robust Constrained Blackbox Optimization with Surrogates

DTIC Science & Technology

2015-05-21

algorithms with OPAL . Mathematical Programming Computation, 6(3):233–254, 2014. 6. M.S. Ouali, H. Aoudjit, and C. Audet. Replacement scheduling of a fleet of...Orban. Optimization of Algorithms with OPAL . Mathematical Programming Computation, 6(3), 233-254, September 2014. DISTRIBUTION A: Distribution
Efficient volume computation for three-dimensional hexahedral cells

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dukowicz, J.K.

1988-02-01

Currently, algorithms for computing the volume of hexahedral cells with ''ruled'' surfaces require a minimum of 122 FLOPs (floating point operations) per cell. A new algorithm is described which reduces the operation count to 57 FLOPs per cell. copyright 1988 Academic Press, Inc.
GPU accelerated fuzzy connected image segmentation by using CUDA.

PubMed

Zhuge, Ying; Cao, Yong; Miller, Robert W

2009-01-01

Image segmentation techniques using fuzzy connectedness principles have shown their effectiveness in segmenting a variety of objects in several large applications in recent years. However, one problem of these algorithms has been their excessive computational requirements when processing large image datasets. Nowadays commodity graphics hardware provides high parallel computing power. In this paper, we present a parallel fuzzy connected image segmentation algorithm on Nvidia's Compute Unified Device Architecture (CUDA) platform for segmenting large medical image data sets. Our experiments based on three data sets with small, medium, and large data size demonstrate the efficiency of the parallel algorithm, which achieves a speed-up factor of 7.2x, 7.3x, and 14.4x, correspondingly, for the three data sets over the sequential implementation of fuzzy connected image segmentation algorithm on CPU.
Singular value decomposition for collaborative filtering on a GPU

NASA Astrophysics Data System (ADS)

Kato, Kimikazu; Hosino, Tikara

2010-06-01

A collaborative filtering predicts customers' unknown preferences from known preferences. In a computation of the collaborative filtering, a singular value decomposition (SVD) is needed to reduce the size of a large scale matrix so that the burden for the next phase computation will be decreased. In this application, SVD means a roughly approximated factorization of a given matrix into smaller sized matrices. Webb (a.k.a. Simon Funk) showed an effective algorithm to compute SVD toward a solution of an open competition called "Netflix Prize". The algorithm utilizes an iterative method so that the error of approximation improves in each step of the iteration. We give a GPU version of Webb's algorithm. Our algorithm is implemented in the CUDA and it is shown to be efficient by an experiment.
Assessment of the information content of patterns: an algorithm

NASA Astrophysics Data System (ADS)

Daemi, M. Farhang; Beurle, R. L.

1991-12-01

A preliminary investigation confirmed the possibility of assessing the translational and rotational information content of simple artificial images. The calculation is tedious, and for more realistic patterns it is essential to implement the method on a computer. This paper describes an algorithm developed for this purpose which confirms the results of the preliminary investigation. Use of the algorithm facilitates much more comprehensive analysis of the combined effect of continuous rotation and fine translation, and paves the way for analysis of more realistic patterns. Owing to the volume of calculation involved in these algorithms, extensive computing facilities were necessary. The major part of the work was carried out using an ICL 3900 series mainframe computer as well as other powerful workstations such as a RISC architecture MIPS machine.
Tomography by iterative convolution - Empirical study and application to interferometry

NASA Technical Reports Server (NTRS)

Vest, C. M.; Prikryl, I.

1984-01-01

An algorithm for computer tomography has been developed that is applicable to reconstruction from data having incomplete projections because an opaque object blocks some of the probing radiation as it passes through the object field. The algorithm is based on iteration between the object domain and the projection (Radon transform) domain. Reconstructions are computed during each iteration by the well-known convolution method. Although it is demonstrated that this algorithm does not converge, an empirically justified criterion for terminating the iteration when the most accurate estimate has been computed is presented. The algorithm has been studied by using it to reconstruct several different object fields with several different opaque regions. It also has been used to reconstruct aerodynamic density fields from interferometric data recorded in wind tunnel tests.

Solving the Cauchy-Riemann equations on parallel computers

NASA Technical Reports Server (NTRS)

Fatoohi, Raad A.; Grosch, Chester E.

1987-01-01

Discussed is the implementation of a single algorithm on three parallel-vector computers. The algorithm is a relaxation scheme for the solution of the Cauchy-Riemann equations; a set of coupled first order partial differential equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, and SIMD machine with 16K bit serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The machine architectures are briefly described. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Conclusions are presented.
Efficient Pricing Technique for Resource Allocation Problem in Downlink OFDM Cognitive Radio Networks

NASA Astrophysics Data System (ADS)

Abdulghafoor, O. B.; Shaat, M. M. R.; Ismail, M.; Nordin, R.; Yuwono, T.; Alwahedy, O. N. A.

2017-05-01

In this paper, the problem of resource allocation in OFDM-based downlink cognitive radio (CR) networks has been proposed. The purpose of this research is to decrease the computational complexity of the resource allocation algorithm for downlink CR network while concerning the interference constraint of primary network. The objective has been secured by adopting pricing scheme to develop power allocation algorithm with the following concerns: (i) reducing the complexity of the proposed algorithm and (ii) providing firm power control to the interference introduced to primary users (PUs). The performance of the proposed algorithm is tested for OFDM- CRNs. The simulation results show that the performance of the proposed algorithm approached the performance of the optimal algorithm at a lower computational complexity, i.e., O(NlogN), which makes the proposed algorithm suitable for more practical applications.
Active control of impulsive noise with symmetric α-stable distribution based on an improved step-size normalized adaptive algorithm

NASA Astrophysics Data System (ADS)

Zhou, Yali; Zhang, Qizhi; Yin, Yixin

2015-05-01

In this paper, active control of impulsive noise with symmetric α-stable (SαS) distribution is studied. A general step-size normalized filtered-x Least Mean Square (FxLMS) algorithm is developed based on the analysis of existing algorithms, and the Gaussian distribution function is used to normalize the step size. Compared with existing algorithms, the proposed algorithm needs neither the parameter selection and thresholds estimation nor the process of cost function selection and complex gradient computation. Computer simulations have been carried out to suggest that the proposed algorithm is effective for attenuating SαS impulsive noise, and then the proposed algorithm has been implemented in an experimental ANC system. Experimental results show that the proposed scheme has good performance for SαS impulsive noise attenuation.
Realization of preconditioned Lanczos and conjugate gradient algorithms on optical linear algebra processors.

PubMed

Ghosh, A

1988-08-01

Lanczos and conjugate gradient algorithms are important in computational linear algebra. In this paper, a parallel pipelined realization of these algorithms on a ring of optical linear algebra processors is described. The flow of data is designed to minimize the idle times of the optical multiprocessor and the redundancy of computations. The effects of optical round-off errors on the solutions obtained by the optical Lanczos and conjugate gradient algorithms are analyzed, and it is shown that optical preconditioning can improve the accuracy of these algorithms substantially. Algorithms for optical preconditioning and results of numerical experiments on solving linear systems of equations arising from partial differential equations are discussed. Since the Lanczos algorithm is used mostly with sparse matrices, a folded storage scheme to represent sparse matrices on spatial light modulators is also described.
Software For Genetic Algorithms

NASA Technical Reports Server (NTRS)

Wang, Lui; Bayer, Steve E.

1992-01-01

SPLICER computer program is genetic-algorithm software tool used to solve search and optimization problems. Provides underlying framework and structure for building genetic-algorithm application program. Written in Think C.
Atrial Fibrillation Screening in Nonmetropolitan Areas Using a Telehealth Surveillance System With an Embedded Cloud-Computing Algorithm: Prospective Pilot Study.

PubMed

Chen, Ying-Hsien; Hung, Chi-Sheng; Huang, Ching-Chang; Hung, Yu-Chien; Hwang, Juey-Jen; Ho, Yi-Lwun

2017-09-26

Atrial fibrillation (AF) is a common form of arrhythmia that is associated with increased risk of stroke and mortality. Detecting AF before the first complication occurs is a recognized priority. No previous studies have examined the feasibility of undertaking AF screening using a telehealth surveillance system with an embedded cloud-computing algorithm; we address this issue in this study. The objective of this study was to evaluate the feasibility of AF screening in nonmetropolitan areas using a telehealth surveillance system with an embedded cloud-computing algorithm. We conducted a prospective AF screening study in a nonmetropolitan area using a single-lead electrocardiogram (ECG) recorder. All ECG measurements were reviewed on the telehealth surveillance system and interpreted by the cloud-computing algorithm and a cardiologist. The process of AF screening was evaluated with a satisfaction questionnaire. Between March 11, 2016 and August 31, 2016, 967 ECGs were recorded from 922 residents in nonmetropolitan areas. A total of 22 (2.4%, 22/922) residents with AF were identified by the physician's ECG interpretation, and only 0.2% (2/967) of ECGs contained significant artifacts. The novel cloud-computing algorithm for AF detection had a sensitivity of 95.5% (95% CI 77.2%-99.9%) and specificity of 97.7% (95% CI 96.5%-98.5%). The overall satisfaction score for the process of AF screening was 92.1%. AF screening in nonmetropolitan areas using a telehealth surveillance system with an embedded cloud-computing algorithm is feasible. ©Ying-Hsien Chen, Chi-Sheng Hung, Ching-Chang Huang, Yu-Chien Hung, Juey-Jen Hwang, Yi-Lwun Ho. Originally published in JMIR Mhealth and Uhealth (http://mhealth.jmir.org), 26.09.2017.
Evaluation of six TPS algorithms in computing entrance and exit doses.

PubMed

Tan, Yun I; Metwaly, Mohamed; Glegg, Martin; Baggarley, Shaun; Elliott, Alex

2014-05-08

Entrance and exit doses are commonly measured in in vivo dosimetry for comparison with expected values, usually generated by the treatment planning system (TPS), to verify accuracy of treatment delivery. This report aims to evaluate the accuracy of six TPS algorithms in computing entrance and exit doses for a 6 MV beam. The algorithms tested were: pencil beam convolution (Eclipse PBC), analytical anisotropic algorithm (Eclipse AAA), AcurosXB (Eclipse AXB), FFT convolution (XiO Convolution), multigrid superposition (XiO Superposition), and Monte Carlo photon (Monaco MC). Measurements with ionization chamber (IC) and diode detector in water phantoms were used as a reference. Comparisons were done in terms of central axis point dose, 1D relative profiles, and 2D absolute gamma analysis. Entrance doses computed by all TPS algorithms agreed to within 2% of the measured values. Exit doses computed by XiO Convolution, XiO Superposition, Eclipse AXB, and Monaco MC agreed with the IC measured doses to within 2%-3%. Meanwhile, Eclipse PBC and Eclipse AAA computed exit doses were higher than the IC measured doses by up to 5.3% and 4.8%, respectively. Both algorithms assume that full backscatter exists even at the exit level, leading to an overestimation of exit doses. Despite good agreements at the central axis for Eclipse AXB and Monaco MC, 1D relative comparisons showed profiles mismatched at depths beyond 11.5 cm. Overall, the 2D absolute gamma (3%/3 mm) pass rates were better for Monaco MC, while Eclipse AXB failed mostly at the outer 20% of the field area. The findings of this study serve as a useful baseline for the implementation of entrance and exit in vivo dosimetry in clinical departments utilizing any of these six common TPS algorithms for reference comparison.
BWM*: A Novel, Provable, Ensemble-based Dynamic Programming Algorithm for Sparse Approximations of Computational Protein Design.

PubMed

Jou, Jonathan D; Jain, Swati; Georgiev, Ivelin S; Donald, Bruce R

2016-06-01

Sparse energy functions that ignore long range interactions between residue pairs are frequently used by protein design algorithms to reduce computational cost. Current dynamic programming algorithms that fully exploit the optimal substructure produced by these energy functions only compute the GMEC. This disproportionately favors the sequence of a single, static conformation and overlooks better binding sequences with multiple low-energy conformations. Provable, ensemble-based algorithms such as A* avoid this problem, but A* cannot guarantee better performance than exhaustive enumeration. We propose a novel, provable, dynamic programming algorithm called Branch-Width Minimization* (BWM*) to enumerate a gap-free ensemble of conformations in order of increasing energy. Given a branch-decomposition of branch-width w for an n-residue protein design with at most q discrete side-chain conformations per residue, BWM* returns the sparse GMEC in O([Formula: see text]) time and enumerates each additional conformation in merely O([Formula: see text]) time. We define a new measure, Total Effective Search Space (TESS), which can be computed efficiently a priori before BWM* or A* is run. We ran BWM* on 67 protein design problems and found that TESS discriminated between BWM*-efficient and A*-efficient cases with 100% accuracy. As predicted by TESS and validated experimentally, BWM* outperforms A* in 73% of the cases and computes the full ensemble or a close approximation faster than A*, enumerating each additional conformation in milliseconds. Unlike A*, the performance of BWM* can be predicted in polynomial time before running the algorithm, which gives protein designers the power to choose the most efficient algorithm for their particular design problem.
Implementation of Tree and Butterfly Barriers with Optimistic Time Management Algorithms for Discrete Event Simulation

NASA Astrophysics Data System (ADS)

Rizvi, Syed S.; Shah, Dipali; Riasat, Aasia

The Time Wrap algorithm [3] offers a run time recovery mechanism that deals with the causality errors. These run time recovery mechanisms consists of rollback, anti-message, and Global Virtual Time (GVT) techniques. For rollback, there is a need to compute GVT which is used in discrete-event simulation to reclaim the memory, commit the output, detect the termination, and handle the errors. However, the computation of GVT requires dealing with transient message problem and the simultaneous reporting problem. These problems can be dealt in an efficient manner by the Samadi's algorithm [8] which works fine in the presence of causality errors. However, the performance of both Time Wrap and Samadi's algorithms depends on the latency involve in GVT computation. Both algorithms give poor latency for large simulation systems especially in the presence of causality errors. To improve the latency and reduce the processor ideal time, we implement tree and butterflies barriers with the optimistic algorithm. Our analysis shows that the use of synchronous barriers such as tree and butterfly with the optimistic algorithm not only minimizes the GVT latency but also minimizes the processor idle time.
Sublattice parallel replica dynamics.

PubMed

Martínez, Enrique; Uberuaga, Blas P; Voter, Arthur F

2014-06-01

Exascale computing presents a challenge for the scientific community as new algorithms must be developed to take full advantage of the new computing paradigm. Atomistic simulation methods that offer full fidelity to the underlying potential, i.e., molecular dynamics (MD) and parallel replica dynamics, fail to use the whole machine speedup, leaving a region in time and sample size space that is unattainable with current algorithms. In this paper, we present an extension of the parallel replica dynamics algorithm [A. F. Voter, Phys. Rev. B 57, R13985 (1998)] by combining it with the synchronous sublattice approach of Shim and Amar [ and , Phys. Rev. B 71, 125432 (2005)], thereby exploiting event locality to improve the algorithm scalability. This algorithm is based on a domain decomposition in which events happen independently in different regions in the sample. We develop an analytical expression for the speedup given by this sublattice parallel replica dynamics algorithm and compare it with parallel MD and traditional parallel replica dynamics. We demonstrate how this algorithm, which introduces a slight additional approximation of event locality, enables the study of physical systems unreachable with traditional methodologies and promises to better utilize the resources of current high performance and future exascale computers.
Binary mesh partitioning for cache-efficient visualization.

PubMed

Tchiboukdjian, Marc; Danjean, Vincent; Raffin, Bruno

2010-01-01

One important bottleneck when visualizing large data sets is the data transfer between processor and memory. Cache-aware (CA) and cache-oblivious (CO) algorithms take into consideration the memory hierarchy to design cache efficient algorithms. CO approaches have the advantage to adapt to unknown and varying memory hierarchies. Recent CA and CO algorithms developed for 3D mesh layouts significantly improve performance of previous approaches, but they lack of theoretical performance guarantees. We present in this paper a {\\schmi O}(N\\log N) algorithm to compute a CO layout for unstructured but well shaped meshes. We prove that a coherent traversal of a N-size mesh in dimension d induces less than N/B+{\\schmi O}(N/M;{1/d}) cache-misses where B and M are the block size and the cache size, respectively. Experiments show that our layout computation is faster and significantly less memory consuming than the best known CO algorithm. Performance is comparable to this algorithm for classical visualization algorithm access patterns, or better when the BSP tree produced while computing the layout is used as an acceleration data structure adjusted to the layout. We also show that cache oblivious approaches lead to significant performance increases on recent GPU architectures.
A hardware-oriented concurrent TZ search algorithm for High-Efficiency Video Coding

NASA Astrophysics Data System (ADS)

Doan, Nghia; Kim, Tae Sung; Rhee, Chae Eun; Lee, Hyuk-Jae

2017-12-01

High-Efficiency Video Coding (HEVC) is the latest video coding standard, in which the compression performance is double that of its predecessor, the H.264/AVC standard, while the video quality remains unchanged. In HEVC, the test zone (TZ) search algorithm is widely used for integer motion estimation because it effectively searches the good-quality motion vector with a relatively small amount of computation. However, the complex computation structure of the TZ search algorithm makes it difficult to implement it in the hardware. This paper proposes a new integer motion estimation algorithm which is designed for hardware execution by modifying the conventional TZ search to allow parallel motion estimations of all prediction unit (PU) partitions. The algorithm consists of the three phases of zonal, raster, and refinement searches. At the beginning of each phase, the algorithm obtains the search points required by the original TZ search for all PU partitions in a coding unit (CU). Then, all redundant search points are removed prior to the estimation of the motion costs, and the best search points are then selected for all PUs. Compared to the conventional TZ search algorithm, experimental results show that the proposed algorithm significantly decreases the Bjøntegaard Delta bitrate (BD-BR) by 0.84%, and it also reduces the computational complexity by 54.54%.
Multi-period project portfolio selection under risk considerations and stochastic income

NASA Astrophysics Data System (ADS)

Tofighian, Ali Asghar; Moezzi, Hamid; Khakzar Barfuei, Morteza; Shafiee, Mahmood

2018-02-01

This paper deals with multi-period project portfolio selection problem. In this problem, the available budget is invested on the best portfolio of projects in each period such that the net profit is maximized. We also consider more realistic assumptions to cover wider range of applications than those reported in previous studies. A novel mathematical model is presented to solve the problem, considering risks, stochastic incomes, and possibility of investing extra budget in each time period. Due to the complexity of the problem, an effective meta-heuristic method hybridized with a local search procedure is presented to solve the problem. The algorithm is based on genetic algorithm (GA), which is a prominent method to solve this type of problems. The GA is enhanced by a new solution representation and well selected operators. It also is hybridized with a local search mechanism to gain better solution in shorter time. The performance of the proposed algorithm is then compared with well-known algorithms, like basic genetic algorithm (GA), particle swarm optimization (PSO), and electromagnetism-like algorithm (EM-like) by means of some prominent indicators. The computation results show the superiority of the proposed algorithm in terms of accuracy, robustness and computation time. At last, the proposed algorithm is wisely combined with PSO to improve the computing time considerably.
Verifying a Computer Algorithm Mathematically.

ERIC Educational Resources Information Center

Olson, Alton T.

1986-01-01

Presents an example of mathematics from an algorithmic point of view, with emphasis on the design and verification of this algorithm. The program involves finding roots for algebraic equations using the half-interval search algorithm. The program listing is included. (JN)
Undergraduate computational physics projects on quantum computing

NASA Astrophysics Data System (ADS)

Candela, D.

2015-08-01

Computational projects on quantum computing suitable for students in a junior-level quantum mechanics course are described. In these projects students write their own programs to simulate quantum computers. Knowledge is assumed of introductory quantum mechanics through the properties of spin 1/2. Initial, more easily programmed projects treat the basics of quantum computation, quantum gates, and Grover's quantum search algorithm. These are followed by more advanced projects to increase the number of qubits and implement Shor's quantum factoring algorithm. The projects can be run on a typical laptop or desktop computer, using most programming languages. Supplementing resources available elsewhere, the projects are presented here in a self-contained format especially suitable for a short computational module for physics students.
Computer sciences

NASA Technical Reports Server (NTRS)

Smith, Paul H.

1988-01-01

The Computer Science Program provides advanced concepts, techniques, system architectures, algorithms, and software for both space and aeronautics information sciences and computer systems. The overall goal is to provide the technical foundation within NASA for the advancement of computing technology in aerospace applications. The research program is improving the state of knowledge of fundamental aerospace computing principles and advancing computing technology in space applications such as software engineering and information extraction from data collected by scientific instruments in space. The program includes the development of special algorithms and techniques to exploit the computing power provided by high performance parallel processors and special purpose architectures. Research is being conducted in the fundamentals of data base logic and improvement techniques for producing reliable computing systems.
X-Ray Radiography of Gas Turbine Ceramics.

DTIC Science & Technology

1979-10-20

Microfocus X-ray equipment. 1a4ihe definition of equipment concepts for a computer assisted tomography ( CAT ) system; and 4ffthe development of a CAT ...were obtained from these test coupons using Microfocus X-ray and image en- hancement techniques. A Computer Assisted Tomography ( CAT ) design concept...monitor. Computer reconstruction algorithms were investigated with respect to CAT and a preferred approach was determined. An appropriate CAT algorithm
Fast computation of the multivariable stability margin for real interrelated uncertain parameters

NASA Technical Reports Server (NTRS)

Sideris, Athanasios; Sanchez Pena, Ricardo S.

1988-01-01

A novel algorithm for computing the multivariable stability margin for checking the robust stability of feedback systems with real parametric uncertainty is proposed. This method eliminates the need for the frequency search involved in another given algorithm by reducing it to checking a finite number of conditions. These conditions have a special structure, which allows a significant improvement on the speed of computations.
Nonlinear Computational Aeroelasticity: Formulations and Solution Algorithms

DTIC Science & Technology

2003-03-01

problem is proposed. Fluid-structure coupling algorithms are then discussed with some emphasis on distributed computing strategies. Numerical results...the structure and the exchange of structure motion to the fluid. The computational fluid dynamics code PFES is our finite element code for the numerical ...unstructured meshes). It was numerically demonstrated [1-3] that EBS can be less diffusive than SUPG [4-6] and the standard Finite Volume schemes
Sigint Application for Polymorphous Computing Architecture (PCA): Wideband DF

DTIC Science & Technology

2006-08-01

Polymorphous Computing Architecture (PCA) program as stated by Robert Graybill is to Develop the computing foundation for agile systems by establishing...ubiquitous MUSIC algorithm rely upon an underlying narrowband signal model [8]. In this case, narrowband means that the signal bandwidth is less than...a wideband DF algorithm is needed to compensate for this model inadequacy. Among the various wideband DF techniques available, the coherent signal

Highly Scalable Matching Pursuit Signal Decomposition Algorithm

NASA Technical Reports Server (NTRS)

Christensen, Daniel; Das, Santanu; Srivastava, Ashok N.

2009-01-01

Matching Pursuit Decomposition (MPD) is a powerful iterative algorithm for signal decomposition and feature extraction. MPD decomposes any signal into linear combinations of its dictionary elements or atoms . A best fit atom from an arbitrarily defined dictionary is determined through cross-correlation. The selected atom is subtracted from the signal and this procedure is repeated on the residual in the subsequent iterations until a stopping criterion is met. The reconstructed signal reveals the waveform structure of the original signal. However, a sufficiently large dictionary is required for an accurate reconstruction; this in return increases the computational burden of the algorithm, thus limiting its applicability and level of adoption. The purpose of this research is to improve the scalability and performance of the classical MPD algorithm. Correlation thresholds were defined to prune insignificant atoms from the dictionary. The Coarse-Fine Grids and Multiple Atom Extraction techniques were proposed to decrease the computational burden of the algorithm. The Coarse-Fine Grids method enabled the approximation and refinement of the parameters for the best fit atom. The ability to extract multiple atoms within a single iteration enhanced the effectiveness and efficiency of each iteration. These improvements were implemented to produce an improved Matching Pursuit Decomposition algorithm entitled MPD++. Disparate signal decomposition applications may require a particular emphasis of accuracy or computational efficiency. The prominence of the key signal features required for the proper signal classification dictates the level of accuracy necessary in the decomposition. The MPD++ algorithm may be easily adapted to accommodate the imposed requirements. Certain feature extraction applications may require rapid signal decomposition. The full potential of MPD++ may be utilized to produce incredible performance gains while extracting only slightly less energy than the standard algorithm. When the utmost accuracy must be achieved, the modified algorithm extracts atoms more conservatively but still exhibits computational gains over classical MPD. The MPD++ algorithm was demonstrated using an over-complete dictionary on real life data. Computational times were reduced by factors of 1.9 and 44 for the emphases of accuracy and performance, respectively. The modified algorithm extracted similar amounts of energy compared to classical MPD. The degree of the improvement in computational time depends on the complexity of the data, the initialization parameters, and the breadth of the dictionary. The results of the research confirm that the three modifications successfully improved the scalability and computational efficiency of the MPD algorithm. Correlation Thresholding decreased the time complexity by reducing the dictionary size. Multiple Atom Extraction also reduced the time complexity by decreasing the number of iterations required for a stopping criterion to be reached. The Course-Fine Grids technique enabled complicated atoms with numerous variable parameters to be effectively represented in the dictionary. Due to the nature of the three proposed modifications, they are capable of being stacked and have cumulative effects on the reduction of the time complexity.
The Cyborg Astrobiologist: testing a novelty detection algorithm on two mobile exploration systems at Rivas Vaciamadrid in Spain and at the Mars Desert Research Station in Utah

NASA Astrophysics Data System (ADS)

McGuire, P. C.; Gross, C.; Wendt, L.; Bonnici, A.; Souza-Egipsy, V.; Ormö, J.; Díaz-Martínez, E.; Foing, B. H.; Bose, R.; Walter, S.; Oesker, M.; Ontrup, J.; Haschke, R.; Ritter, H.

2010-01-01

In previous work, a platform was developed for testing computer-vision algorithms for robotic planetary exploration. This platform consisted of a digital video camera connected to a wearable computer for real-time processing of images at geological and astrobiological field sites. The real-time processing included image segmentation and the generation of interest points based upon uncommonness in the segmentation maps. Also in previous work, this platform for testing computer-vision algorithms has been ported to a more ergonomic alternative platform, consisting of a phone camera connected via the Global System for Mobile Communications (GSM) network to a remote-server computer. The wearable-computer platform has been tested at geological and astrobiological field sites in Spain (Rivas Vaciamadrid and Riba de Santiuste), and the phone camera has been tested at a geological field site in Malta. In this work, we (i) apply a Hopfield neural-network algorithm for novelty detection based upon colour, (ii) integrate a field-capable digital microscope on the wearable computer platform, (iii) test this novelty detection with the digital microscope at Rivas Vaciamadrid, (iv) develop a Bluetooth communication mode for the phone-camera platform, in order to allow access to a mobile processing computer at the field sites, and (v) test the novelty detection on the Bluetooth-enabled phone camera connected to a netbook computer at the Mars Desert Research Station in Utah. This systems engineering and field testing have together allowed us to develop a real-time computer-vision system that is capable, for example, of identifying lichens as novel within a series of images acquired in semi-arid desert environments. We acquired sequences of images of geologic outcrops in Utah and Spain consisting of various rock types and colours to test this algorithm. The algorithm robustly recognized previously observed units by their colour, while requiring only a single image or a few images to learn colours as familiar, demonstrating its fast learning capability.
The science of computing - Parallel computation

NASA Technical Reports Server (NTRS)

Denning, P. J.

1985-01-01

Although parallel computation architectures have been known for computers since the 1920s, it was only in the 1970s that microelectronic components technologies advanced to the point where it became feasible to incorporate multiple processors in one machine. Concommitantly, the development of algorithms for parallel processing also lagged due to hardware limitations. The speed of computing with solid-state chips is limited by gate switching delays. The physical limit implies that a 1 Gflop operational speed is the maximum for sequential processors. A computer recently introduced features a 'hypercube' architecture with 128 processors connected in networks at 5, 6 or 7 points per grid, depending on the design choice. Its computing speed rivals that of supercomputers, but at a fraction of the cost. The added speed with less hardware is due to parallel processing, which utilizes algorithms representing different parts of an equation that can be broken into simpler statements and processed simultaneously. Present, highly developed computer languages like FORTRAN, PASCAL, COBOL, etc., rely on sequential instructions. Thus, increased emphasis will now be directed at parallel processing algorithms to exploit the new architectures.
System identification using Nuclear Norm & Tabu Search optimization

NASA Astrophysics Data System (ADS)

Ahmed, Asif A.; Schoen, Marco P.; Bosworth, Ken W.

2018-01-01

In recent years, subspace System Identification (SI) algorithms have seen increased research, stemming from advanced minimization methods being applied to the Nuclear Norm (NN) approach in system identification. These minimization algorithms are based on hard computing methodologies. To the authors’ knowledge, as of now, there has been no work reported that utilizes soft computing algorithms to address the minimization problem within the nuclear norm SI framework. A linear, time-invariant, discrete time system is used in this work as the basic model for characterizing a dynamical system to be identified. The main objective is to extract a mathematical model from collected experimental input-output data. Hankel matrices are constructed from experimental data, and the extended observability matrix is employed to define an estimated output of the system. This estimated output and the actual - measured - output are utilized to construct a minimization problem. An embedded rank measure assures minimum state realization outcomes. Current NN-SI algorithms employ hard computing algorithms for minimization. In this work, we propose a simple Tabu Search (TS) algorithm for minimization. TS algorithm based SI is compared with the iterative Alternating Direction Method of Multipliers (ADMM) line search optimization based NN-SI. For comparison, several different benchmark system identification problems are solved by both approaches. Results show improved performance of the proposed SI-TS algorithm compared to the NN-SI ADMM algorithm.
Computing border bases using mutant strategies

NASA Astrophysics Data System (ADS)

Ullah, E.; Abbas Khan, S.

2014-01-01

Border bases, a generalization of Gröbner bases, have actively been addressed during recent years due to their applicability to industrial problems. In cryptography and coding theory a useful application of border based is to solve zero-dimensional systems of polynomial equations over finite fields, which motivates us for developing optimizations of the algorithms that compute border bases. In 2006, Kehrein and Kreuzer formulated the Border Basis Algorithm (BBA), an algorithm which allows the computation of border bases that relate to a degree compatible term ordering. In 2007, J. Ding et al. introduced mutant strategies bases on finding special lower degree polynomials in the ideal. The mutant strategies aim to distinguish special lower degree polynomials (mutants) from the other polynomials and give them priority in the process of generating new polynomials in the ideal. In this paper we develop hybrid algorithms that use the ideas of J. Ding et al. involving the concept of mutants to optimize the Border Basis Algorithm for solving systems of polynomial equations over finite fields. In particular, we recall a version of the Border Basis Algorithm which is actually called the Improved Border Basis Algorithm and propose two hybrid algorithms, called MBBA and IMBBA. The new mutants variants provide us space efficiency as well as time efficiency. The efficiency of these newly developed hybrid algorithms is discussed using standard cryptographic examples.
Natural Inspired Intelligent Visual Computing and Its Application to Viticulture.

PubMed

Ang, Li Minn; Seng, Kah Phooi; Ge, Feng Lu

2017-05-23

This paper presents an investigation of natural inspired intelligent computing and its corresponding application towards visual information processing systems for viticulture. The paper has three contributions: (1) a review of visual information processing applications for viticulture; (2) the development of natural inspired computing algorithms based on artificial immune system (AIS) techniques for grape berry detection; and (3) the application of the developed algorithms towards real-world grape berry images captured in natural conditions from vineyards in Australia. The AIS algorithms in (2) were developed based on a nature-inspired clonal selection algorithm (CSA) which is able to detect the arcs in the berry images with precision, based on a fitness model. The arcs detected are then extended to perform the multiple arcs and ring detectors information processing for the berry detection application. The performance of the developed algorithms were compared with traditional image processing algorithms like the circular Hough transform (CHT) and other well-known circle detection methods. The proposed AIS approach gave a Fscore of 0.71 compared with Fscores of 0.28 and 0.30 for the CHT and a parameter-free circle detection technique (RPCD) respectively.
Detecting chaos in irregularly sampled time series.

PubMed

Kulp, C W

2013-09-01

Recently, Wiebe and Virgin [Chaos 22, 013136 (2012)] developed an algorithm which detects chaos by analyzing a time series' power spectrum which is computed using the Discrete Fourier Transform (DFT). Their algorithm, like other time series characterization algorithms, requires that the time series be regularly sampled. Real-world data, however, are often irregularly sampled, thus, making the detection of chaotic behavior difficult or impossible with those methods. In this paper, a characterization algorithm is presented, which effectively detects chaos in irregularly sampled time series. The work presented here is a modification of Wiebe and Virgin's algorithm and uses the Lomb-Scargle Periodogram (LSP) to compute a series' power spectrum instead of the DFT. The DFT is not appropriate for irregularly sampled time series. However, the LSP is capable of computing the frequency content of irregularly sampled data. Furthermore, a new method of analyzing the power spectrum is developed, which can be useful for differentiating between chaotic and non-chaotic behavior. The new characterization algorithm is successfully applied to irregularly sampled data generated by a model as well as data consisting of observations of variable stars.
A Parallel Point Matching Algorithm for Landmark Based Image Registration Using Multicore Platform

PubMed Central

Yang, Lin; Gong, Leiguang; Zhang, Hong; Nosher, John L.; Foran, David J.

2013-01-01

Point matching is crucial for many computer vision applications. Establishing the correspondence between a large number of data points is a computationally intensive process. Some point matching related applications, such as medical image registration, require real time or near real time performance if applied to critical clinical applications like image assisted surgery. In this paper, we report a new multicore platform based parallel algorithm for fast point matching in the context of landmark based medical image registration. We introduced a non-regular data partition algorithm which utilizes the K-means clustering algorithm to group the landmarks based on the number of available processing cores, which optimize the memory usage and data transfer. We have tested our method using the IBM Cell Broadband Engine (Cell/B.E.) platform. The results demonstrated a significant speed up over its sequential implementation. The proposed data partition and parallelization algorithm, though tested only on one multicore platform, is generic by its design. Therefore the parallel algorithm can be extended to other computing platforms, as well as other point matching related applications. PMID:24308014
Algorithm and code development for unsteady three-dimensional Navier-Stokes equations

NASA Technical Reports Server (NTRS)

Obayashi, Shigeru

1991-01-01

A streamwise upwind algorithm for solving the unsteady 3-D Navier-Stokes equations was extended to handle the moving grid system. It is noted that the finite volume concept is essential to extend the algorithm. The resulting algorithm is conservative for any motion of the coordinate system. Two extensions to an implicit method were considered and the implicit extension that makes the algorithm computationally efficient is implemented into Ames's aeroelasticity code, ENSAERO. The new flow solver has been validated through the solution of test problems. Test cases include three-dimensional problems with fixed and moving grids. The first test case shown is an unsteady viscous flow over an F-5 wing, while the second test considers the motion of the leading edge vortex as well as the motion of the shock wave for a clipped delta wing. The resulting algorithm has been implemented into ENSAERO. The upwind version leads to higher accuracy in both steady and unsteady computations than the previously used central-difference method does, while the increase in the computational time is small.
An imperialist competitive algorithm for virtual machine placement in cloud computing

NASA Astrophysics Data System (ADS)

Jamali, Shahram; Malektaji, Sepideh; Analoui, Morteza

2017-05-01

Cloud computing, the recently emerged revolution in IT industry, is empowered by virtualisation technology. In this paradigm, the user's applications run over some virtual machines (VMs). The process of selecting proper physical machines to host these virtual machines is called virtual machine placement. It plays an important role on resource utilisation and power efficiency of cloud computing environment. In this paper, we propose an imperialist competitive-based algorithm for the virtual machine placement problem called ICA-VMPLC. The base optimisation algorithm is chosen to be ICA because of its ease in neighbourhood movement, good convergence rate and suitable terminology. The proposed algorithm investigates search space in a unique manner to efficiently obtain optimal placement solution that simultaneously minimises power consumption and total resource wastage. Its final solution performance is compared with several existing methods such as grouping genetic and ant colony-based algorithms as well as bin packing heuristic. The simulation results show that the proposed method is superior to other tested algorithms in terms of power consumption, resource wastage, CPU usage efficiency and memory usage efficiency.
A method of optimized neural network by L-M algorithm to transformer winding hot spot temperature forecasting

NASA Astrophysics Data System (ADS)

Wei, B. G.; Wu, X. Y.; Yao, Z. F.; Huang, H.

2017-11-01

Transformers are essential devices of the power system. The accurate computation of the highest temperature (HST) of a transformer’s windings is very significant, as for the HST is a fundamental parameter in controlling the load operation mode and influencing the life time of the insulation. Based on the analysis of the heat transfer processes and the thermal characteristics inside transformers, there is taken into consideration the influence of factors like the sunshine, external wind speed etc. on the oil-immersed transformers. Experimental data and the neural network are used for modeling and protesting of the HST, and furthermore, investigations are conducted on the optimization of the structure and algorithms of neutral network are conducted. Comparison is made between the measured values and calculated values by using the recommended algorithm of IEC60076 and by using the neural network algorithm proposed by the authors; comparison that shows that the value computed with the neural network algorithm approximates better the measured value than the value computed with the algorithm proposed by IEC60076.
Implementation and analysis of a Navier-Stokes algorithm on parallel computers

NASA Technical Reports Server (NTRS)

Fatoohi, Raad A.; Grosch, Chester E.

1988-01-01

The results of the implementation of a Navier-Stokes algorithm on three parallel/vector computers are presented. The object of this research is to determine how well, or poorly, a single numerical algorithm would map onto three different architectures. The algorithm is a compact difference scheme for the solution of the incompressible, two-dimensional, time-dependent Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are the following: the MPP, an SIMD machine with 16K bit serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
A fast complex integer convolution using a hybrid transform

NASA Technical Reports Server (NTRS)

Reed, I. S.; K Truong, T.

1978-01-01

It is shown that the Winograd transform can be combined with a complex integer transform over the Galois field GF(q-squared) to yield a new algorithm for computing the discrete cyclic convolution of complex number points. By this means a fast method for accurately computing the cyclic convolution of a sequence of complex numbers for long convolution lengths can be obtained. This new hybrid algorithm requires fewer multiplications than previous algorithms.
A Computer Based Educational Aid for the Instruction of Combat Modeling

DTIC Science & Technology

1992-02-27

representation (36:363-370), and, as Knuth put it, "An algorithm must be seen to be believed" (23:4). Graphics not only aid in achieving instructional...consisted primarily of research, identification and use of existing combat model computer algorithms , interviews, and use of operation research...to-air combat models’ operating manuals provided valuable insight into pro- gram structure and algorithms used to represent the combat. From these
Identifying Optimal Measurement Subspace for the Ensemble Kalman Filter

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhou, Ning; Huang, Zhenyu; Welch, Greg

2012-05-24

To reduce the computational load of the ensemble Kalman filter while maintaining its efficacy, an optimization algorithm based on the generalized eigenvalue decomposition method is proposed for identifying the most informative measurement subspace. When the number of measurements is large, the proposed algorithm can be used to make an effective tradeoff between computational complexity and estimation accuracy. This algorithm also can be extended to other Kalman filters for measurement subspace selection.
Parallel language constructs for tensor product computations on loosely coupled architectures

NASA Technical Reports Server (NTRS)

Mehrotra, Piyush; Van Rosendale, John

1989-01-01

A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. The authors focus on tensor product array computations, a simple but important class of numerical algorithms. They consider first the problem of programming one-dimensional kernel routines, such as parallel tridiagonal solvers, and then look at how such parallel kernels can be combined to form parallel tensor product algorithms.
Effects of Computer Architecture on FFT (Fast Fourier Transform) Algorithm Performance.

DTIC Science & Technology

1983-12-01

Criteria for Efficient Implementation of FFT Algorithms," IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-30, pp. 107-109, Feb...1982. Burrus, C. S. and P. W. Eschenbacher. "An In-Place, In-Order Prime Factor FFT Algorithm," IEEE Transactions on Acoustics, Speech, and Signal... Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-30, pp. 217-226, Apr. 1982. Control Data Corporation. CDC Cyber 170 Computer Systems
Algorithm for Surface of Translation Attached Radiators (A-STAR). Volume 2. Users manual

NASA Astrophysics Data System (ADS)

Medgyesimitschang, L. N.; Putnam, J. M.

1982-05-01

A hierarchy of computer programs implementing the method of moments for bodies of translation (MM/BOT) is described. The algorithm treats the far-field radiation from off-surface and aperture antennas on finite-length open or closed bodies of arbitrary cross section. The near fields and antenna coupling on such bodies are computed. The theoretical development underlying the algorithm is described in Volume 1 of this report.
Study of Computational Structures for Multiobject Tracking Algorithms

DTIC Science & Technology

1986-12-01

MULTIOBJECT TRACKING ALGORITHMS 12. PERSONAL AUTHOR(S) i Allen, Thomas G .; Kurien, Thomas; Washburn, Robert B. Jr. 13a. TYPE OF REPORT 13b. TIME COVERED 14...mentioned possible restructurings of the tracking algorithm that increase the amount of available parallelism ’ g ~. are investigated. This step is extremely...sufficient for our needs here. In the following section we will examine the structure and computational requirements of the track- g , oriented approach
A Genetic Algorithm That Exchanges Neighboring Centers for Fuzzy c-Means Clustering

ERIC Educational Resources Information Center

Chahine, Firas Safwan

2012-01-01

Clustering algorithms are widely used in pattern recognition and data mining applications. Due to their computational efficiency, partitional clustering algorithms are better suited for applications with large datasets than hierarchical clustering algorithms. K-means is among the most popular partitional clustering algorithm, but has a major…

Computation of Quasi-Periodic Normally Hyperbolic Invariant Tori: Algorithms, Numerical Explorations and Mechanisms of Breakdown

NASA Astrophysics Data System (ADS)

Canadell, Marta; Haro, Àlex

2017-12-01

We present several algorithms for computing normally hyperbolic invariant tori carrying quasi-periodic motion of a fixed frequency in families of dynamical systems. The algorithms are based on a KAM scheme presented in Canadell and Haro (J Nonlinear Sci, 2016. doi: 10.1007/s00332-017-9389-y), to find the parameterization of the torus with prescribed dynamics by detuning parameters of the model. The algorithms use different hyperbolicity and reducibility properties and, in particular, compute also the invariant bundles and Floquet transformations. We implement these methods in several 2-parameter families of dynamical systems, to compute quasi-periodic arcs, that is, the parameters for which 1D normally hyperbolic invariant tori with a given fixed frequency do exist. The implementation lets us to perform the continuations up to the tip of the quasi-periodic arcs, for which the invariant curves break down. Three different mechanisms of breakdown are analyzed, using several observables, leading to several conjectures.
A proposed study of multiple scattering through clouds up to 1 THz

NASA Technical Reports Server (NTRS)

Gerace, G. C.; Smith, E. K.

1992-01-01

A rigorous computation of the electromagnetic field scattered from an atmospheric liquid water cloud is proposed. The recent development of a fast recursive algorithm (Chew algorithm) for computing the fields scattered from numerous scatterers now makes a rigorous computation feasible. A method is presented for adapting this algorithm to a general case where there are an extremely large number of scatterers. It is also proposed to extend a new binary PAM channel coding technique (El-Khamy coding) to multiple levels with non-square pulse shapes. The Chew algorithm can be used to compute the transfer function of a cloud channel. Then the transfer function can be used to design an optimum El-Khamy code. In principle, these concepts can be applied directly to the realistic case of a time-varying cloud (adaptive channel coding and adaptive equalization). A brief review is included of some preliminary work on cloud dispersive effects on digital communication signals and on cloud liquid water spectra and correlations.
Automated flight path planning for virtual endoscopy.

PubMed

Paik, D S; Beaulieu, C F; Jeffrey, R B; Rubin, G D; Napel, S

1998-05-01

In this paper, a novel technique for rapid and automatic computation of flight paths for guiding virtual endoscopic exploration of three-dimensional medical images is described. While manually planning flight paths is a tedious and time consuming task, our algorithm is automated and fast. Our method for positioning the virtual camera is based on the medial axis transform but is much more computationally efficient. By iteratively correcting a path toward the medial axis, the necessity of evaluating simple point criteria during morphological thinning is eliminated. The virtual camera is also oriented in a stable viewing direction, avoiding sudden twists and turns. We tested our algorithm on volumetric data sets of eight colons, one aorta and one bronchial tree. The algorithm computed the flight paths in several minutes per volume on an inexpensive workstation with minimal computation time added for multiple paths through branching structures (10%-13% per extra path). The results of our algorithm are smooth, centralized paths that aid in the task of navigation in virtual endoscopic exploration of three-dimensional medical images.
Flood inundation extent mapping based on block compressed tracing

NASA Astrophysics Data System (ADS)

Shen, Dingtao; Rui, Yikang; Wang, Jiechen; Zhang, Yu; Cheng, Liang

2015-07-01

Flood inundation extent, depth, and duration are important factors affecting flood hazard evaluation. At present, flood inundation analysis is based mainly on a seeded region-growing algorithm, which is an inefficient process because it requires excessive recursive computations and it is incapable of processing massive datasets. To address this problem, we propose a block compressed tracing algorithm for mapping the flood inundation extent, which reads the DEM data in blocks before transferring them to raster compression storage. This allows a smaller computer memory to process a larger amount of data, which solves the problem of the regular seeded region-growing algorithm. In addition, the use of a raster boundary tracing technique allows the algorithm to avoid the time-consuming computations required by the seeded region-growing. Finally, we conduct a comparative evaluation in the Chin-sha River basin, results show that the proposed method solves the problem of flood inundation extent mapping based on massive DEM datasets with higher computational efficiency than the original method, which makes it suitable for practical applications.
Vehicle routing problem with time windows using natural inspired algorithms

NASA Astrophysics Data System (ADS)

Pratiwi, A. B.; Pratama, A.; Sa’diyah, I.; Suprajitno, H.

2018-03-01

Process of distribution of goods needs a strategy to make the total cost spent for operational activities minimized. But there are several constrains have to be satisfied which are the capacity of the vehicles and the service time of the customers. This Vehicle Routing Problem with Time Windows (VRPTW) gives complex constrains problem. This paper proposes natural inspired algorithms for dealing with constrains of VRPTW which involves Bat Algorithm and Cat Swarm Optimization. Bat Algorithm is being hybrid with Simulated Annealing, the worst solution of Bat Algorithm is replaced by the solution from Simulated Annealing. Algorithm which is based on behavior of cats, Cat Swarm Optimization, is improved using Crow Search Algorithm to make simplier and faster convergence. From the computational result, these algorithms give good performances in finding the minimized total distance. Higher number of population causes better computational performance. The improved Cat Swarm Optimization with Crow Search gives better performance than the hybridization of Bat Algorithm and Simulated Annealing in dealing with big data.
Improved Ant Colony Clustering Algorithm and Its Performance Study

PubMed Central

Gao, Wei

2016-01-01

Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533
Complexity of the Quantum Adiabatic Algorithm

NASA Astrophysics Data System (ADS)

Hen, Itay

2013-03-01

The Quantum Adiabatic Algorithm (QAA) has been proposed as a mechanism for efficiently solving optimization problems on a quantum computer. Since adiabatic computation is analog in nature and does not require the design and use of quantum gates, it can be thought of as a simpler and perhaps more profound method for performing quantum computations that might also be easier to implement experimentally. While these features have generated substantial research in QAA, to date there is still a lack of solid evidence that the algorithm can outperform classical optimization algorihms. Here, we discuss several aspects of the quantum adiabatic algorithm: We analyze the efficiency of the algorithm on several ``hard'' (NP) computational problems. Studying the size dependence of the typical minimum energy gap of the Hamiltonians of these problems using quantum Monte Carlo methods, we find that while for most problems the minimum gap decreases exponentially with the size of the problem, indicating that the QAA is not more efficient than existing classical search algorithms, for other problems there is evidence to suggest that the gap may be polynomial near the phase transition. We also discuss applications of the QAA to ``real life'' problems and how they can be implemented on currently available (albeit prototypical) quantum hardware such as ``D-Wave One'', that impose serious restrictions as to which type of problems may be tested. Finally, we discuss different approaches to find improved implementations of the algorithm such as local adiabatic evolution, adaptive methods, local search in Hamiltonian space and others.
A multi-emitter fitting algorithm for potential live cell super-resolution imaging over a wide range of molecular densities.

PubMed

Takeshima, T; Takahashi, T; Yamashita, J; Okada, Y; Watanabe, S

2018-05-25

Multi-emitter fitting algorithms have been developed to improve the temporal resolution of single-molecule switching nanoscopy, but the molecular density range they can analyse is narrow and the computation required is intensive, significantly limiting their practical application. Here, we propose a computationally fast method, wedged template matching (WTM), an algorithm that uses a template matching technique to localise molecules at any overlapping molecular density from sparse to ultrahigh density with subdiffraction resolution. WTM achieves the localization of overlapping molecules at densities up to 600 molecules μm -2 with a high detection sensitivity and fast computational speed. WTM also shows localization precision comparable with that of DAOSTORM (an algorithm for high-density super-resolution microscopy), at densities up to 20 molecules μm -2 , and better than DAOSTORM at higher molecular densities. The application of WTM to a high-density biological sample image demonstrated that it resolved protein dynamics from live cell images with subdiffraction resolution and a temporal resolution of several hundred milliseconds or less through a significant reduction in the number of camera images required for a high-density reconstruction. WTM algorithm is a computationally fast, multi-emitter fitting algorithm that can analyse over a wide range of molecular densities. The algorithm is available through the website. https://doi.org/10.17632/bf3z6xpn5j.1. © 2018 The Authors. Journal of Microscopy published by JohnWiley & Sons Ltd on behalf of Royal Microscopical Society.
Interface Generation and Compositional Verification in JavaPathfinder

NASA Technical Reports Server (NTRS)

Giannakopoulou, Dimitra; Pasareanu, Corina

2009-01-01

We present a novel algorithm for interface generation of software components. Given a component, our algorithm uses learning techniques to compute a permissive interface representing legal usage of the component. Unlike our previous work, this algorithm does not require knowledge about the component s environment. Furthermore, in contrast to other related approaches, our algorithm computes permissive interfaces even in the presence of non-determinism in the component. Our algorithm is implemented in the JavaPathfinder model checking framework for UML statechart components. We have also added support for automated assume-guarantee style compositional verification in JavaPathfinder, using component interfaces. We report on the application of the presented approach to the generation of interfaces for flight software components.
Modelling the spread of innovation in wild birds.

PubMed

Shultz, Thomas R; Montrey, Marcel; Aplin, Lucy M

2017-06-01

We apply three plausible algorithms in agent-based computer simulations to recent experiments on social learning in wild birds. Although some of the phenomena are simulated by all three learning algorithms, several manifestations of social conformity bias are simulated by only the approximate majority (AM) algorithm, which has roots in chemistry, molecular biology and theoretical computer science. The simulations generate testable predictions and provide several explanatory insights into the diffusion of innovation through a population. The AM algorithm's success raises the possibility of its usefulness in studying group dynamics more generally, in several different scientific domains. Our differential-equation model matches simulation results and provides mathematical insights into the dynamics of these algorithms. © 2017 The Author(s).
A hybrid flower pollination algorithm based modified randomized location for multi-threshold medical image segmentation.

PubMed

Wang, Rui; Zhou, Yongquan; Zhao, Chengyan; Wu, Haizhou

2015-01-01

Multi-threshold image segmentation is a powerful image processing technique that is used for the preprocessing of pattern recognition and computer vision. However, traditional multilevel thresholding methods are computationally expensive because they involve exhaustively searching the optimal thresholds to optimize the objective functions. To overcome this drawback, this paper proposes a flower pollination algorithm with a randomized location modification. The proposed algorithm is used to find optimal threshold values for maximizing Otsu's objective functions with regard to eight medical grayscale images. When benchmarked against other state-of-the-art evolutionary algorithms, the new algorithm proves itself to be robust and effective through numerical experimental results including Otsu's objective values and standard deviations.
Fast Solution in Sparse LDA for Binary Classification

NASA Technical Reports Server (NTRS)

Moghaddam, Baback

2010-01-01

An algorithm that performs sparse linear discriminant analysis (Sparse-LDA) finds near-optimal solutions in far less time than the prior art when specialized to binary classification (of 2 classes). Sparse-LDA is a type of feature- or variable- selection problem with numerous applications in statistics, machine learning, computer vision, computational finance, operations research, and bio-informatics. Because of its combinatorial nature, feature- or variable-selection problems are NP-hard or computationally intractable in cases involving more than 30 variables or features. Therefore, one typically seeks approximate solutions by means of greedy search algorithms. The prior Sparse-LDA algorithm was a greedy algorithm that considered the best variable or feature to add/ delete to/ from its subsets in order to maximally discriminate between multiple classes of data. The present algorithm is designed for the special but prevalent case of 2-class or binary classification (e.g. 1 vs. 0, functioning vs. malfunctioning, or change versus no change). The present algorithm provides near-optimal solutions on large real-world datasets having hundreds or even thousands of variables or features (e.g. selecting the fewest wavelength bands in a hyperspectral sensor to do terrain classification) and does so in typical computation times of minutes as compared to days or weeks as taken by the prior art. Sparse LDA requires solving generalized eigenvalue problems for a large number of variable subsets (represented by the submatrices of the input within-class and between-class covariance matrices). In the general (fullrank) case, the amount of computation scales at least cubically with the number of variables and thus the size of the problems that can be solved is limited accordingly. However, in binary classification, the principal eigenvalues can be found using a special analytic formula, without resorting to costly iterative techniques. The present algorithm exploits this analytic form along with the inherent sequential nature of greedy search itself. Together this enables the use of highly-efficient partitioned-matrix-inverse techniques that result in large speedups of computation in both the forward-selection and backward-elimination stages of greedy algorithms in general.
New numerical approaches for modeling thermochemical convection in a compositionally stratified fluid

NASA Astrophysics Data System (ADS)

Puckett, Elbridge Gerry; Turcotte, Donald L.; He, Ying; Lokavarapu, Harsha; Robey, Jonathan M.; Kellogg, Louise H.

2018-03-01

Geochemical observations of mantle-derived rocks favor a nearly homogeneous upper mantle, the source of mid-ocean ridge basalts (MORB), and heterogeneous lower mantle regions. Plumes that generate ocean island basalts are thought to sample the lower mantle regions and exhibit more heterogeneity than MORB. These regions have been associated with lower mantle structures known as large low shear velocity provinces (LLSVPS) below Africa and the South Pacific. The isolation of these regions is attributed to compositional differences and density stratification that, consequently, have been the subject of computational and laboratory modeling designed to determine the parameter regime in which layering is stable and understanding how layering evolves. Mathematical models of persistent compositional interfaces in the Earth's mantle may be inherently unstable, at least in some regions of the parameter space relevant to the mantle. Computing approximations to solutions of such problems presents severe challenges, even to state-of-the-art numerical methods. Some numerical algorithms for modeling the interface between distinct compositions smear the interface at the boundary between compositions, such as methods that add numerical diffusion or 'artificial viscosity' in order to stabilize the algorithm. We present two new algorithms for maintaining high-resolution and sharp computational boundaries in computations of these types of problems: a discontinuous Galerkin method with a bound preserving limiter and a Volume-of-Fluid interface tracking algorithm. We compare these new methods with two approaches widely used for modeling the advection of two distinct thermally driven compositional fields in mantle convection computations: a high-order accurate finite element advection algorithm with entropy viscosity and a particle method that carries a scalar quantity representing the location of each compositional field. All four algorithms are implemented in the open source finite element code ASPECT, which we use to compute the velocity, pressure, and temperature associated with the underlying flow field. We compare the performance of these four algorithms on three problems, including computing an approximation to the solution of an initially compositionally stratified fluid at Ra =105 with buoyancy numbers B that vary from no stratification at B = 0 to stratified flow at large B .
A Programmable Five Qubit Quantum Computer Using Trapped Atomic Ions

NASA Astrophysics Data System (ADS)

Debnath, Shantanu

Quantum computers can solve certain problems more efficiently compared to conventional classical methods. In the endeavor to build a quantum computer, several competing platforms have emerged that can implement certain quantum algorithms using a few qubits. However, the demonstrations so far have been done usually by tailoring the hardware to meet the requirements of a particular algorithm implemented for a limited number of instances. Although such proof of principal implementations are important to verify the working of algorithms on a physical system, they further need to have the potential to serve as a general purpose quantum computer allowing the flexibility required for running multiple algorithms and be scaled up to host more qubits. Here we demonstrate a small programmable quantum computer based on five trapped atomic ions each of which serves as a qubit. By optically resolving each ion we can individually address them in order to perform a complete set of single-qubit and fully connected two-qubit quantum gates and alsoperform efficient individual qubit measurements. We implement a computation architecture that accepts an algorithm from a user interface in the form of a standard logic gate sequence and decomposes it into fundamental quantum operations that are native to the hardware using a set of compilation instructions that are defined within the software. These operations are then effected through a pattern of laser pulses that perform coherent rotations on targeted qubits in the chain. The architecture implemented in the experiment therefore gives us unprecedented flexibility in the programming of any quantum algorithm while staying blind to the underlying hardware. As a demonstration we implement the Deutsch-Jozsa and Bernstein-Vazirani algorithms on the five-qubit processor and achieve average success rates of 95 and 90 percent, respectively. We also implement a five-qubit coherent quantum Fourier transform and examine its performance in the period finding and phase estimation protocol. We find fidelities of 84 and 62 percent, respectively. While maintaining the same computation architecture the system can be scaled to more ions using resources that scale favorably (O(N. 2)) with the numberof qubits N.
Sensor placement algorithm development to maximize the efficiency of acid gas removal unit for integrated gasifiction combined sycle (IGCC) power plant with CO2 capture

DOE Office of Scientific and Technical Information (OSTI.GOV)

Paul, P.; Bhattacharyya, D.; Turton, R.

2012-01-01

Future integrated gasification combined cycle (IGCC) power plants with CO{sub 2} capture will face stricter operational and environmental constraints. Accurate values of relevant states/outputs/disturbances are needed to satisfy these constraints and to maximize the operational efficiency. Unfortunately, a number of these process variables cannot be measured while a number of them can be measured, but have low precision, reliability, or signal-to-noise ratio. In this work, a sensor placement (SP) algorithm is developed for optimal selection of sensor location, number, and type that can maximize the plant efficiency and result in a desired precision of the relevant measured/unmeasured states. In thismore » work, an SP algorithm is developed for an selective, dual-stage Selexol-based acid gas removal (AGR) unit for an IGCC plant with pre-combustion CO{sub 2} capture. A comprehensive nonlinear dynamic model of the AGR unit is developed in Aspen Plus Dynamics® (APD) and used to generate a linear state-space model that is used in the SP algorithm. The SP algorithm is developed with the assumption that an optimal Kalman filter will be implemented in the plant for state and disturbance estimation. The algorithm is developed assuming steady-state Kalman filtering and steady-state operation of the plant. The control system is considered to operate based on the estimated states and thereby, captures the effects of the SP algorithm on the overall plant efficiency. The optimization problem is solved by Genetic Algorithm (GA) considering both linear and nonlinear equality and inequality constraints. Due to the very large number of candidate sets available for sensor placement and because of the long time that it takes to solve the constrained optimization problem that includes more than 1000 states, solution of this problem is computationally expensive. For reducing the computation time, parallel computing is performed using the Distributed Computing Server (DCS®) and the Parallel Computing® toolbox from Mathworks®. In this presentation, we will share our experience in setting up parallel computing using GA in the MATLAB® environment and present the overall approach for achieving higher computational efficiency in this framework.« less
Sensor placement algorithm development to maximize the efficiency of acid gas removal unit for integrated gasification combined cycle (IGCC) power plant with CO{sub 2} capture

DOE Office of Scientific and Technical Information (OSTI.GOV)

Paul, P.; Bhattacharyya, D.; Turton, R.

2012-01-01

Future integrated gasification combined cycle (IGCC) power plants with CO{sub 2} capture will face stricter operational and environmental constraints. Accurate values of relevant states/outputs/disturbances are needed to satisfy these constraints and to maximize the operational efficiency. Unfortunately, a number of these process variables cannot be measured while a number of them can be measured, but have low precision, reliability, or signal-to-noise ratio. In this work, a sensor placement (SP) algorithm is developed for optimal selection of sensor location, number, and type that can maximize the plant efficiency and result in a desired precision of the relevant measured/unmeasured states. In thismore » work, an SP algorithm is developed for an selective, dual-stage Selexol-based acid gas removal (AGR) unit for an IGCC plant with pre-combustion CO{sub 2} capture. A comprehensive nonlinear dynamic model of the AGR unit is developed in Aspen Plus Dynamics® (APD) and used to generate a linear state-space model that is used in the SP algorithm. The SP algorithm is developed with the assumption that an optimal Kalman filter will be implemented in the plant for state and disturbance estimation. The algorithm is developed assuming steady-state Kalman filtering and steady-state operation of the plant. The control system is considered to operate based on the estimated states and thereby, captures the effects of the SP algorithm on the overall plant efficiency. The optimization problem is solved by Genetic Algorithm (GA) considering both linear and nonlinear equality and inequality constraints. Due to the very large number of candidate sets available for sensor placement and because of the long time that it takes to solve the constrained optimization problem that includes more than 1000 states, solution of this problem is computationally expensive. For reducing the computation time, parallel computing is performed using the Distributed Computing Server (DCS®) and the Parallel Computing® toolbox from Mathworks®. In this presentation, we will share our experience in setting up parallel computing using GA in the MATLAB® environment and present the overall approach for achieving higher computational efficiency in this framework.« less
Efficient computation of the joint probability of multiple inherited risk alleles from pedigree data.

PubMed

Madsen, Thomas; Braun, Danielle; Peng, Gang; Parmigiani, Giovanni; Trippa, Lorenzo

2018-06-25

The Elston-Stewart peeling algorithm enables estimation of an individual's probability of harboring germline risk alleles based on pedigree data, and serves as the computational backbone of important genetic counseling tools. However, it remains limited to the analysis of risk alleles at a small number of genetic loci because its computing time grows exponentially with the number of loci considered. We propose a novel, approximate version of this algorithm, dubbed the peeling and paring algorithm, which scales polynomially in the number of loci. This allows extending peeling-based models to include many genetic loci. The algorithm creates a trade-off between accuracy and speed, and allows the user to control this trade-off. We provide exact bounds on the approximation error and evaluate it in realistic simulations. Results show that the loss of accuracy due to the approximation is negligible in important applications. This algorithm will improve genetic counseling tools by increasing the number of pathogenic risk alleles that can be addressed. To illustrate we create an extended five genes version of BRCAPRO, a widely used model for estimating the carrier probabilities of BRCA1 and BRCA2 risk alleles and assess its computational properties. © 2018 WILEY PERIODICALS, INC.
Experimental realization of Shor's quantum factoring algorithm using nuclear magnetic resonance.

PubMed

Vandersypen, L M; Steffen, M; Breyta, G; Yannoni, C S; Sherwood, M H; Chuang, I L

The number of steps any classical computer requires in order to find the prime factors of an l-digit integer N increases exponentially with l, at least using algorithms known at present. Factoring large integers is therefore conjectured to be intractable classically, an observation underlying the security of widely used cryptographic codes. Quantum computers, however, could factor integers in only polynomial time, using Shor's quantum factoring algorithm. Although important for the study of quantum computers, experimental demonstration of this algorithm has proved elusive. Here we report an implementation of the simplest instance of Shor's algorithm: factorization of N = 15 (whose prime factors are 3 and 5). We use seven spin-1/2 nuclei in a molecule as quantum bits, which can be manipulated with room temperature liquid-state nuclear magnetic resonance techniques. This method of using nuclei to store quantum information is in principle scalable to systems containing many quantum bits, but such scalability is not implied by the present work. The significance of our work lies in the demonstration of experimental and theoretical techniques for precise control and modelling of complex quantum computers. In particular, we present a simple, parameter-free but predictive model of decoherence effects in our system.
Trellises and Trellis-Based Decoding Algorithms for Linear Block Codes. Part 3

NASA Technical Reports Server (NTRS)

Lin, Shu

1998-01-01

Decoding algorithms based on the trellis representation of a code (block or convolutional) drastically reduce decoding complexity. The best known and most commonly used trellis-based decoding algorithm is the Viterbi algorithm. It is a maximum likelihood decoding algorithm. Convolutional codes with the Viterbi decoding have been widely used for error control in digital communications over the last two decades. This chapter is concerned with the application of the Viterbi decoding algorithm to linear block codes. First, the Viterbi algorithm is presented. Then, optimum sectionalization of a trellis to minimize the computational complexity of a Viterbi decoder is discussed and an algorithm is presented. Some design issues for IC (integrated circuit) implementation of a Viterbi decoder are considered and discussed. Finally, a new decoding algorithm based on the principle of compare-select-add is presented. This new algorithm can be applied to both block and convolutional codes and is more efficient than the conventional Viterbi algorithm based on the add-compare-select principle. This algorithm is particularly efficient for rate 1/n antipodal convolutional codes and their high-rate punctured codes. It reduces computational complexity by one-third compared with the Viterbi algorithm.
Improved teaching-learning-based and JAYA optimization algorithms for solving flexible flow shop scheduling problems

NASA Astrophysics Data System (ADS)

Buddala, Raviteja; Mahapatra, Siba Sankar

2017-11-01

Flexible flow shop (or a hybrid flow shop) scheduling problem is an extension of classical flow shop scheduling problem. In a simple flow shop configuration, a job having `g' operations is performed on `g' operation centres (stages) with each stage having only one machine. If any stage contains more than one machine for providing alternate processing facility, then the problem becomes a flexible flow shop problem (FFSP). FFSP which contains all the complexities involved in a simple flow shop and parallel machine scheduling problems is a well-known NP-hard (Non-deterministic polynomial time) problem. Owing to high computational complexity involved in solving these problems, it is not always possible to obtain an optimal solution in a reasonable computation time. To obtain near-optimal solutions in a reasonable computation time, a large variety of meta-heuristics have been proposed in the past. However, tuning algorithm-specific parameters for solving FFSP is rather tricky and time consuming. To address this limitation, teaching-learning-based optimization (TLBO) and JAYA algorithm are chosen for the study because these are not only recent meta-heuristics but they do not require tuning of algorithm-specific parameters. Although these algorithms seem to be elegant, they lose solution diversity after few iterations and get trapped at the local optima. To alleviate such drawback, a new local search procedure is proposed in this paper to improve the solution quality. Further, mutation strategy (inspired from genetic algorithm) is incorporated in the basic algorithm to maintain solution diversity in the population. Computational experiments have been conducted on standard benchmark problems to calculate makespan and computational time. It is found that the rate of convergence of TLBO is superior to JAYA. From the results, it is found that TLBO and JAYA outperform many algorithms reported in the literature and can be treated as efficient methods for solving the FFSP.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.