Implementing the Deutsch-Jozsa algorithm with macroscopic ensembles
NASA Astrophysics Data System (ADS)
Semenenko, Henry; Byrnes, Tim
2016-05-01
Quantum computing implementations under consideration today typically deal with systems with microscopic degrees of freedom such as photons, ions, cold atoms, and superconducting circuits. The quantum information is stored typically in low-dimensional Hilbert spaces such as qubits, as quantum effects are strongest in such systems. It has, however, been demonstrated that quantum effects can be observed in mesoscopic and macroscopic systems, such as nanomechanical systems and gas ensembles. While few-qubit quantum information demonstrations have been performed with such macroscopic systems, a quantum algorithm showing exponential speedup over classical algorithms is yet to be shown. Here, we show that the Deutsch-Jozsa algorithm can be implemented with macroscopic ensembles. The encoding that we use avoids the detrimental effects of decoherence that normally plagues macroscopic implementations. We discuss two mapping procedures which can be chosen depending upon the constraints of the oracle and the experiment. Both methods have an exponential speedup over the classical case, and only require control of the ensembles at the level of the total spin of the ensembles. It is shown that both approaches reproduce the qubit Deutsch-Jozsa algorithm, and are robust under decoherence.
Quantum computation with classical light: Implementation of the Deutsch-Jozsa algorithm
NASA Astrophysics Data System (ADS)
Perez-Garcia, Benjamin; McLaren, Melanie; Goyal, Sandeep K.; Hernandez-Aranda, Raul I.; Forbes, Andrew; Konrad, Thomas
2016-05-01
We propose an optical implementation of the Deutsch-Jozsa Algorithm using classical light in a binary decision-tree scheme. Our approach uses a ring cavity and linear optical devices in order to efficiently query the oracle functional values. In addition, we take advantage of the intrinsic Fourier transforming properties of a lens to read out whether the function given by the oracle is balanced or constant.
Scheme for implementing the Deutsch-Jozsa algorithm in cavity QED
Zheng Shibiao
2004-09-01
We propose a scheme for realizing the Deutsch-Jozsa algorithm in cavity QED. The scheme is based on the resonant interaction of atoms with a cavity mode. The required experimental techniques are within the scope of what can be obtained in the microwave cavity QED setup. The experimental implementation of the scheme would be an important step toward more complex quantum computation in cavity QED.
Implementing Deutsch-Jozsa algorithm using light shifts and atomic ensembles
Dasgupta, Shubhrangshu; Biswas, Asoka; Agarwal, G.S.
2005-01-01
We present an optical scheme to implement the Deutsch-Jozsa algorithm using ac Stark shifts. The scheme uses an atomic ensemble consisting of four-level atoms interacting dispersively with a field. This leads to a Hamiltonian in the atom-field basis which is quite suitable for quantum computation. We show how one can implement the algorithm by performing proper one- and two-qubit operations. We emphasize that in our model the decoherence is expected to be minimal due to our usage of atomic ground states and freely propagating photon.
Kessel, Alexander R.; Yakovleva, Natalia M.
2002-12-01
Schemes of experimental realization of the main two-qubit processors for quantum computers and the Deutsch-Jozsa algorithm are derived in virtual spin representation. The results are applicable for every four quantum states allowing the required properties for quantum processor implementation if for qubit encoding, virtual spin representation is used. A four-dimensional Hilbert space of nuclear spin 3/2 is considered in detail for this aim.
Shi, Fazhan; Rong, Xing; Xu, Nanyang; Wang, Ya; Wu, Jie; Chong, Bo; Peng, Xinhua; Kniepert, Juliane; Schoenfeld, Rolf-Simon; Harneit, Wolfgang; Feng, Mang; Du, Jiangfeng
2010-07-23
The nitrogen-vacancy defect center (N-V center) is a promising candidate for quantum information processing due to the possibility of coherent manipulation of individual spins in the absence of the cryogenic requirement. We report a room-temperature implementation of the Deutsch-Jozsa algorithm by encoding both a qubit and an auxiliary state in the electron spin of a single N-V center. By thus exploiting the specific S=1 character of the spin system, we demonstrate how even scarce quantum resources can be used for test-bed experiments on the way towards a large-scale quantum computing architecture. PMID:20867828
Unifying parameter estimation and the Deutsch-Jozsa algorithm for continuous variables
Zwierz, Marcin; Perez-Delgado, Carlos A.; Kok, Pieter
2010-10-15
We reveal a close relationship between quantum metrology and the Deutsch-Jozsa algorithm on continuous-variable quantum systems. We develop a general procedure, characterized by two parameters, that unifies parameter estimation and the Deutsch-Jozsa algorithm. Depending on which parameter we keep constant, the procedure implements either the parameter-estimation protocol or the Deutsch-Jozsa algorithm. The parameter-estimation part of the procedure attains the Heisenberg limit and is therefore optimal. Due to the use of approximate normalizable continuous-variable eigenstates, the Deutsch-Jozsa algorithm is probabilistic. The procedure estimates a value of an unknown parameter and solves the Deutsch-Jozsa problem without the use of any entanglement.
Collins, David
2010-05-15
A general framework for regarding oracle-assisted quantum algorithms as tools for discriminating among unitary transformations is described. This framework is applied to the Deutsch-Jozsa problem and all possible quantum algorithms which solve the problem with certainty using oracle unitaries in a particular form are derived. It is also used to show that any quantum algorithm that solves the Deutsch-Jozsa problem starting with a quantum system in a particular class of initial, thermal equilibrium-based states of the type encountered in solution-state NMR can only succeed with greater probability than a classical algorithm when the problem size n exceeds {approx}10{sup 5}.
Using the Deutsch-Jozsa algorithm to partition arrays
NASA Astrophysics Data System (ADS)
Lipovaca, Samir
2010-03-01
Using the Deutsch-Jozsa algorithm, we will develop a method for solving a class of problems in which we need to determine parts of an array and then apply a specified function to each independent part. Since present quantum computers are not robust enough for code writing and execution, we will build a model of a vector quantum computer that implements the Deutsch-Jozsa algorithm from a machine language view using the APL2 programming language. The core of the method is an operator (DJBOX) which allows evaluation of an arbitrary function f by the Deutsch-Jozsa algorithm. Two key functions of the method are GET/PARTITION and CALC/WITH/PARTITIONS. The GET/PARTITION function determines parts of an array based on the function f. The CALC/WITH/PARTITIONS function determines parts of an array based on the function f and then applies another function to each independent part. We will imagine the method is implemented on the above vector quantum computer. We will show that the method can be successfully executed.
Experimental demonstration of the Deutsch-Jozsa algorithm in homonuclear multispin systems
Wu Zhen; Luo Jun; Feng Mang; Li Jun; Zheng Wenqiang; Peng Xinhua
2011-10-15
Despite early experimental tests of the Deutsch-Jozsa (DJ) algorithm, there have been only a very few nontrivial balanced functions tested for register number n>3. In this paper, we experimentally demonstrate the DJ algorithm in four- and five-qubit homonuclear spin systems by the nuclear-magnetic-resonance technique, by which we encode the one function evaluation into a long shaped pulse with the application of the gradient ascent algorithm. Our work, dramatically reducing the accumulated errors due to gate imperfections and relaxation, demonstrates a better implementation of the DJ algorithm.
Experimental realization of the Deutsch-Jozsa algorithm with a six-qubit cluster state
Vallone, Giuseppe; Donati, Gaia; Bruno, Natalia; Chiuri, Andrea; Mataloni, Paolo
2010-05-15
We describe an experimental realization of the Deutsch-Jozsa quantum algorithm to evaluate the properties of a two-bit Boolean function in the framework of one-way quantum computation. For this purpose, a two-photon six-qubit cluster state was engineered. Its peculiar topological structure is the basis of the original measurement pattern allowing the algorithm realization. The good agreement of the experimental results with the theoretical predictions, obtained at {approx}1 kHz success rate, demonstrates the correct implementation of the algorithm.
Tame, M. S.; Kim, M. S.
2010-09-15
We show that fundamental versions of the Deutsch-Jozsa and Bernstein-Vazirani quantum algorithms can be performed using a small entangled cluster state resource of only six qubits. We then investigate the minimal resource states needed to demonstrate general n-qubit versions and a scalable method to produce them. For this purpose, we propose a versatile photonic on-chip setup.
NMR tomography of the three-qubit Deutsch-Jozsa algorithm
Mangold, Oliver; Heidebrecht, Andreas; Mehring, Michael
2004-10-01
The optimized version of the Deutsch-Jozsa algorithm proposed by Collins et al. was implemented using the three {sup 19}F nuclear spins of 2,3,4-trifluoroaniline as qubits. To emulate the behavior of pure quantum-mechanical states pseudopure states of the ensemble were prepared prior to execution of the algorithm. Full tomography of the density matrix was employed to obtain detailed information about initial, intermediate, and final states. Information, thus obtained, was applied to optimize the pulse sequences used. It is shown that substantial improvement of the fidelity of the preparation may be achieved by compensating the effects caused by the different relaxation behavior of the different substates of the density matrix. All manipulations of the quantum states were performed under the conditions of unresolved spin-spin interactions.
Coherence as a resource in decision problems: The Deutsch-Jozsa algorithm and a variation
NASA Astrophysics Data System (ADS)
Hillery, Mark
2016-01-01
That superpositions of states can be useful for performing tasks in quantum systems has been known since the early days of quantum information, but only recently has a quantitative theory of quantum coherence been proposed. Here we apply that theory to an analysis of the Deutsch-Jozsa algorithm, which depends on quantum coherence for its operation. The Deutsch-Jozsa algorithm solves a decision problem, and we focus on a probabilistic version of that problem, comparing probability of being correct for both classical and quantum procedures. In addition, we study a related decision problem in which the quantum procedure has one-sided error while the classical procedure has two-sided error. The role of coherence on the quantum success probabilities in both of these problems is examined.
Anderson, Brandon M.; Collins, David
2005-10-15
We compare the failure probabilities of ensemble implementations of quantum algorithms which use pseudopure initial states, quantified by their polarization, to those of competing classical probabilistic algorithms. Specifically we consider a class algorithms which require only one bit to output the solution to problems. For large ensemble sizes, we present a general scheme to determine a critical polarization beneath which the quantum algorithm fails with greater probability than its classical competitor. We apply this to the Deutsch-Jozsa algorithm and show that the critical polarization is 86.6%.
Optical simulation of quantum algorithms using programmable liquid-crystal displays
Puentes, Graciana; La Mela, Cecilia; Ledesma, Silvia; Iemmi, Claudio; Paz, Juan Pablo; Saraceno, Marcos
2004-04-01
We present a scheme to perform an all optical simulation of quantum algorithms and maps. The main components are lenses to efficiently implement the Fourier transform and programmable liquid-crystal displays to introduce space dependent phase changes on a classical optical beam. We show how to simulate Deutsch-Jozsa and Grover's quantum algorithms using essentially the same optical array programmed in two different ways.
Multipartite entanglement in quantum algorithms
Bruss, D.; Macchiavello, C.
2011-05-15
We investigate the entanglement features of the quantum states employed in quantum algorithms. In particular, we analyze the multipartite entanglement properties in the Deutsch-Jozsa, Grover, and Simon algorithms. Our results show that for these algorithms most instances involve multipartite entanglement.
Realization of Simple Quantum Algorithms with Circuit Quantum Electrodynamics
NASA Astrophysics Data System (ADS)
Dicarlo, Leonardo
2010-03-01
Superconducting circuits have made considerable progress in the requirements of quantum coherence, universal gate operations and qubit readout necessary to realize a quantum computer. However, simultaneously meeting these requirements makes the solid-state realization of few-qubit processors, as previously implemented in nuclear magnetic resonance, ion-trap and optical systems, an exciting challenge. We present the realization of a two-qubit superconducting processor based on circuit quantum electrodynamics (cQED), and report progress by the Yale cQED team towards a four-qubit upgrade. The architecture employs a microwave transmission-line cavity as a quantum bus coupling multiple transmon qubits. Unitary control is achieved by concatenation of high-fidelity single-qubit rotations induced via resonant microwave tones, and multi-qubit adiabatic phase gates realized by local flux control of qubit frequencies. Qubit readout uses the cavity as a quadratic detector, such that a single, calibrated measurement channel gives direct access to multi-qubit correlations. We present generation of Bell states; entanglement quantification by strong violation of Clauser-Horne-Shimony-Holt inequalities; and implementations of the Grover search and Deutsch-Jozsa algorithms. We report experimental progress in extending adiabatic phase gates and joint readout to four qubits, and improving qubit coherence on the road to realizing more complex quantum algorithms. Research done in collaboration with J. M. Chow, J. M. Gambetta, Lev S. Bishop, B. R. Johnson, D. I. Schuster, A. Nunnenkamp, J. Majer, A. Blais, L. Frunzio, M. H. Devoret, S. M. Girvin, and R. J. Schoelkopf.
Systolic algorithms and their implementation
Kung, H.T.
1984-01-01
Very high performance computer systems must rely heavily on parallelism since there are severe physical and technological limits on the ultimate speed of any single processor. The systolic array concept developed in the last several years allows effective use of a very large number of processors in parallel. This article illustrates the basic ideas by reviewing a systolic array design for matrix triangularization and describing its use in the on-the-fly updating of Cholesky decomposition of covariance matrices-a crucial computation in adaptive signal processing. Following this are discussions on issues related to the hardware implementation of systolic algorithms in general, and some guidelines for designing systolic algorithms that will be convenient for implementation. 33 references.
Linear Bregman algorithm implemented in parallel GPU
NASA Astrophysics Data System (ADS)
Li, Pengyan; Ke, Jue; Sui, Dong; Wei, Ping
2015-08-01
At present, most compressed sensing (CS) algorithms have poor converging speed, thus are difficult to run on PC. To deal with this issue, we use a parallel GPU, to implement a broadly used compressed sensing algorithm, the Linear Bregman algorithm. Linear iterative Bregman algorithm is a reconstruction algorithm proposed by Osher and Cai. Compared with other CS reconstruction algorithms, the linear Bregman algorithm only involves the vector and matrix multiplication and thresholding operation, and is simpler and more efficient for programming. We use C as a development language and adopt CUDA (Compute Unified Device Architecture) as parallel computing architectures. In this paper, we compared the parallel Bregman algorithm with traditional CPU realized Bregaman algorithm. In addition, we also compared the parallel Bregman algorithm with other CS reconstruction algorithms, such as OMP and TwIST algorithms. Compared with these two algorithms, the result of this paper shows that, the parallel Bregman algorithm needs shorter time, and thus is more convenient for real-time object reconstruction, which is important to people's fast growing demand to information technology.
Java implementation of Class Association Rule algorithms
Tamura, Makio
2007-08-30
Java implementation of three Class Association Rule mining algorithms, NETCAR, CARapriori, and clustering based rule mining. NETCAR algorithm is a novel algorithm developed by Makio Tamura. The algorithm is discussed in a paper: UCRL-JRNL-232466-DRAFT, and would be published in a peer review scientific journal. The software is used to extract combinations of genes relevant with a phenotype from a phylogenetic profile and a phenotype profile. The phylogenetic profiles is represented by a binary matrix and a phenotype profile is represented by a binary vector. The present application of this software will be in genome analysis, however, it could be applied more generally.
Java implementation of Class Association Rule algorithms
Energy Science and Technology Software Center (ESTSC)
2007-08-30
Java implementation of three Class Association Rule mining algorithms, NETCAR, CARapriori, and clustering based rule mining. NETCAR algorithm is a novel algorithm developed by Makio Tamura. The algorithm is discussed in a paper: UCRL-JRNL-232466-DRAFT, and would be published in a peer review scientific journal. The software is used to extract combinations of genes relevant with a phenotype from a phylogenetic profile and a phenotype profile. The phylogenetic profiles is represented by a binary matrix andmore » a phenotype profile is represented by a binary vector. The present application of this software will be in genome analysis, however, it could be applied more generally.« less
Implementation of the phase gradient algorithm
Wahl, D.E.; Eichel, P.H.; Jakowatz, C.V. Jr.
1990-01-01
The recently introduced Phase Gradient Autofocus (PGA) algorithm is a non-parametric autofocus technique which has been shown to be quite effective for phase correction of Synthetic Aperture Radar (SAR) imagery. This paper will show that this powerful algorithm can be executed at near real-time speeds and also be implemented in a relatively small piece of hardware. A brief review of the PGA will be presented along with an overview of some critical implementation considerations. In addition, a demonstration of the PGA algorithm running on a 7 in. {times} 10 in. printed circuit board containing a TMS320C30 digital signal processing (DSP) chip will be given. With this system, using only the 20 range bins which contain the brightest points in the image, the algorithm can correct a badly degraded 256 {times} 256 image in as little as 3 seconds. Using all range bins, the algorithm can correct the image in 9 seconds. 4 refs., 2 figs.
Implementing Shor's algorithm on Josephson charge qubits
Vartiainen, Juha J.; Salomaa, Martti M.; Niskanen, Antti O.; Nakahara, Mikio
2004-07-01
We investigate the physical implementation of Shor's factorization algorithm on a Josephson charge qubit register. While we pursue a universal method to factor a composite integer of any size, the scheme is demonstrated for the number 21. We consider both the physical and algorithmic requirements for an optimal implementation when only a small number of qubits are available. These aspects of quantum computation are usually the topics of separate research communities; we present a unifying discussion of both of these fundamental features bridging Shor's algorithm to its physical realization using Josephson junction qubits. In order to meet the stringent requirements set by a short decoherence time, we accelerate the algorithm by decomposing the quantum circuit into tailored two- and three-qubit gates and we find their physical realizations through numerical optimization.
Deterministic quantum computation with one photonic qubit
NASA Astrophysics Data System (ADS)
Hor-Meyll, M.; Tasca, D. S.; Walborn, S. P.; Ribeiro, P. H. Souto; Santos, M. M.; Duzzioni, E. I.
2015-07-01
We show that deterministic quantum computing with one qubit (DQC1) can be experimentally implemented with a spatial light modulator, using the polarization and the transverse spatial degrees of freedom of light. The scheme allows the computation of the trace of a high-dimension matrix, being limited by the resolution of the modulator panel and the technical imperfections. In order to illustrate the method, we compute the normalized trace of unitary matrices and implement the Deutsch-Jozsa algorithm. The largest matrix that can be manipulated with our setup is 1080 ×1920 , which is able to represent a system with approximately 21 qubits.
CORDIC algorithms for SVM FPGA implementation
NASA Astrophysics Data System (ADS)
Gimeno Sarciada, Jesús; Lamel Rivera, Horacio; Jiménez, Matías
2010-04-01
Support Vector Machines are currently one of the best classification algorithms used in a wide number of applications. The ability to extract a classification function from a limited number of learning examples keeping in the structural risk low has demonstrated to be a clear alternative to other neural networks. However, the calculations involved in computing the kernel and the repetition of the process for all support vectors in the classification problem are certainly intensive, requiring time or power consumption in order to function correctly. This problem could be a drawback in certain applications with limited resources or time. Therefore simple algorithms circumventing this problem are needed. In this paper we analyze an FPGA implementation of a SVM which uses a CORDIC algorithm for simplifying the calculation of as specific kernel greatly reducing the time and hardware requirements needed for the classification, allowing for powerful in-field portable applications. The algorithm is and its calculation capabilities are shown. The full SVM classifier using this algorithm is implemented in an FPGA and its in-field use assessed for high speed low power classification.
Automating parallel implementation of neural learning algorithms.
Rana, O F
2000-06-01
Neural learning algorithms generally involve a number of identical processing units, which are fully or partially connected, and involve an update function, such as a ramp, a sigmoid or a Gaussian function for instance. Some variations also exist, where units can be heterogeneous, or where an alternative update technique is employed, such as a pulse stream generator. Associated with connections are numerical values that must be adjusted using a learning rule, and and dictated by parameters that are learning rule specific, such as momentum, a learning rate, a temperature, amongst others. Usually, neural learning algorithms involve local updates, and a global interaction between units is often discouraged, except in instances where units are fully connected, or involve synchronous updates. In all of these instances, concurrency within a neural algorithm cannot be fully exploited without a suitable implementation strategy. A design scheme is described for translating a neural learning algorithm from inception to implementation on a parallel machine using PVM or MPI libraries, or onto programmable logic such as FPGAs. A designer must first describe the algorithm using a specialised Neural Language, from which a Petri net (PN) model is constructed automatically for verification, and building a performance model. The PN model can be used to study issues such as synchronisation points, resource sharing and concurrency within a learning rule. Specialised constructs are provided to enable a designer to express various aspects of a learning rule, such as the number and connectivity of neural nodes, the interconnection strategies, and information flows required by the learning algorithm. A scheduling and mapping strategy is then used to translate this PN model onto a multiprocessor template. We demonstrate our technique using a Kohonen and backpropagation learning rules, implemented on a loosely coupled workstation cluster, and a dedicated parallel machine, with PVM libraries
Terascale spectral element algorithms and implementations.
Fischer, P. F.; Tufo, H. M.
1999-08-17
We describe the development and implementation of an efficient spectral element code for multimillion gridpoint simulations of incompressible flows in general two- and three-dimensional domains. We review basic and recently developed algorithmic underpinnings that have resulted in good parallel and vector performance on a broad range of architectures, including the terascale computing systems now coming online at the DOE labs. Sustained performance of 219 GFLOPS has been recently achieved on 2048 nodes of the Intel ASCI-Red machine at Sandia.
Parallel Implementation of Katsevich's FBP Algorithm
Guo, Xiaohu; Kong, Qiang; Zhou, Tie; Jiang, Ming
2006-01-01
For spiral cone-beam CT, parallel computing is an effective approach to resolving the problem of heavy computation burden. It is well known that the major computation time is spent in the backprojection step for either filtered-backprojection (FBP) or backprojected-filtration (BPF) algorithms. By the cone-beam cover method [1], the backprojection procedure is driven by cone-beam projections, and every cone-beam projection can be backprojected independently. Basing on this fact, we develop a parallel implementation of Katsevich's FBP algorithm. We do all the numerical experiments on a Linux cluster. In one typical experiment, the sequential reconstruction time is 781.3 seconds, while the parallel reconstruction time is 25.7 seconds with 32 processors. PMID:23165019
A Fast Implementation of the ISOCLUS Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2003-01-01
Unsupervised clustering is a fundamental building block in numerous image processing applications. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute the coordinates of a set of cluster centers in d-space, such that those centers minimize the mean squared distance from each data point to its nearest center. This clustering algorithm is similar to another well-known clustering method, called k-means. One significant feature of ISOCLUS over k-means is that the actual number of clusters reported might be fewer or more than the number supplied as part of the input. The algorithm uses different heuristics to determine whether to merge lor split clusters. As ISOCLUS can run very slowly, particularly on large data sets, there has been a growing .interest in the remote sensing community in computing it efficiently. We have developed a faster implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm of Kanungo, et al. They showed that, by using a kd-tree data structure for storing the data, it is possible to reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm, and we show that it is possible to achieve essentially the same results as ISOCLUS on large data sets, but with significantly lower running times. This adaptation involves computing a number of cluster statistics that are needed for ISOCLUS but not for k-means. Both the k-means and ISOCLUS algorithms are based on iterative schemes, in which nearest neighbors are calculated until some convergence criterion is satisfied. Each iteration requires that the nearest center for each data point be computed. Naively, this requires O
Quantum adiabatic algorithm for factorization and its experimental implementation.
Peng, Xinhua; Liao, Zeyang; Xu, Nanyang; Qin, Gan; Zhou, Xianyi; Suter, Dieter; Du, Jiangfeng
2008-11-28
We propose an adiabatic quantum algorithm capable of factorizing numbers, using fewer qubits than Shor's algorithm. We implement the algorithm in a NMR quantum information processor and experimentally factorize the number 21. In the range that our classical computer could simulate, the quantum adiabatic algorithm works well, providing evidence that the running time of this algorithm scales polynomially with the problem size. PMID:19113467
Implementation of a watershed algorithm on FPGAs
NASA Astrophysics Data System (ADS)
Zahirazami, Shahram; Akil, Mohamed
1998-10-01
In this article we present an implementation of a watershed algorithm on a multi-FPGA architecture. This implementation is based on an hierarchical FIFO. A separate FIFO for each gray level. The gray scale value of a pixel is taken for the altitude of the point. In this way we look at the image as a relief. We proceed by a flooding step. It's like as we immerse the relief in a lake. The water begins to come up and when the water of two different catchment basins reach each other, we will construct a separator or a `Watershed'. This approach is data dependent, hence the process time is different for different images. The H-FIFO is used to guarantee the nature of immersion, it means that we need two types of priority. All the points of an altitude `n' are processed before any point of altitude `n + 1'. And inside an altitude water propagates with a constant velocity in all directions from the source. This operator needs two images as input. An original image or it's gradient and the marker image. A classic way to construct the marker image is to build an image of minimal regions. Each minimal region has it's unique label. This label is the color of the water and will be used to see whether two different water touch each other. The algorithm at first fill the hierarchy FIFO with neighbors of all the regions who are not colored. Next it fetches the first pixel from the first non-empty FIFO and treats this pixel. This pixel will take the color of its neighbor, and all the neighbors who are not already in the H-FIFO are put in their correspondent FIFO. The process is over when the H-FIFO is empty. The result is a segmented and labeled image.
Categorizing Variations of Student-Implemented Sorting Algorithms
ERIC Educational Resources Information Center
Taherkhani, Ahmad; Korhonen, Ari; Malmi, Lauri
2012-01-01
In this study, we examined freshmen students' sorting algorithm implementations in data structures and algorithms' course in two phases: at the beginning of the course before the students received any instruction on sorting algorithms, and after taking a lecture on sorting algorithms. The analysis revealed that many students have insufficient…
Implementation of a Wavefront-Sensing Algorithm
NASA Technical Reports Server (NTRS)
Smith, Jeffrey S.; Dean, Bruce; Aronstein, David
2013-01-01
A computer program has been written as a unique implementation of an image-based wavefront-sensing algorithm reported in "Iterative-Transform Phase Retrieval Using Adaptive Diversity" (GSC-14879-1), NASA Tech Briefs, Vol. 31, No. 4 (April 2007), page 32. This software was originally intended for application to the James Webb Space Telescope, but is also applicable to other segmented-mirror telescopes. The software is capable of determining optical-wavefront information using, as input, a variable number of irradiance measurements collected in defocus planes about the best focal position. The software also uses input of the geometrical definition of the telescope exit pupil (otherwise denoted the pupil mask) to identify the locations of the segments of the primary telescope mirror. From the irradiance data and mask information, the software calculates an estimate of the optical wavefront (a measure of performance) of the telescope generally and across each primary mirror segment specifically. The software is capable of generating irradiance data, wavefront estimates, and basis functions for the full telescope and for each primary-mirror segment. Optionally, each of these pieces of information can be measured or computed outside of the software and incorporated during execution of the software.
Parallel optimization algorithms and their implementation in VLSI design
NASA Technical Reports Server (NTRS)
Lee, G.; Feeley, J. J.
1991-01-01
Two new parallel optimization algorithms based on the simplex method are described. They may be executed by a SIMD parallel processor architecture and be implemented in VLSI design. Several VLSI design implementations are introduced. An application example is reported to demonstrate that the algorithms are effective.
Implementation and analysis of a fast backprojection algorithm
NASA Astrophysics Data System (ADS)
Gorham, LeRoy A.; Majumder, Uttam K.; Buxa, Peter; Backues, Mark J.; Lindgren, Andrew C.
2006-05-01
The convolution backprojection algorithm is an accurate synthetic aperture radar imaging technique, but it has seen limited use in the radar community due to its high computational costs. Therefore, significant research has been conducted for a fast backprojection algorithm, which surrenders some image quality for increased computational efficiency. This paper describes an implementation of both a standard convolution backprojection algorithm and a fast backprojection algorithm optimized for use on a Linux cluster and a field-programmable gate array (FPGA) based processing system. The performance of the different implementations is compared using synthetic ideal point targets and the SPIE XPatch Backhoe dataset.
An adaptive, lossless data compression algorithm and VLSI implementations
NASA Technical Reports Server (NTRS)
Venbrux, Jack; Zweigle, Greg; Gambles, Jody; Wiseman, Don; Miller, Warner H.; Yeh, Pen-Shu
1993-01-01
This paper first provides an overview of an adaptive, lossless, data compression algorithm originally devised by Rice in the early '70s. It then reports the development of a VLSI encoder/decoder chip set developed which implements this algorithm. A recent effort in making a space qualified version of the encoder is described along with several enhancements to the algorithm. The performance of the enhanced algorithm is compared with those from other currently available lossless compression techniques on multiple sets of test data. The results favor our implemented technique in many applications.
New algorithms for phase unwrapping: implementation and testing
NASA Astrophysics Data System (ADS)
Kotlicki, Krzysztof
1998-11-01
In this paper it is shown how the regularization theory was used for the new noise immune algorithm for phase unwrapping. The algorithm were developed by M. Servin, J.L. Marroquin and F.J. Cuevas in Centro de Investigaciones en Optica A.C. and Centro de Investigacion en Matematicas A.C. in Mexico. The theory was presented. The objective of the work was to implement the algorithm into the software able to perform the off-line unwrapping on the fringe pattern. The algorithms are present as well as the result and also the software developed for the implementation.
Implementation of the IDEA algorithm for image encryption
NASA Astrophysics Data System (ADS)
Dang, Philip P.; Chau, Paul M.
2000-11-01
In this paper, we present an implementation of the IDEA algorithm for image encryption. The image encryption is incorporated into the compression algorithm for transmission over a data network. In the proposed method, Embedded Wavelet Zero-tree Coding is used for image compression. Experimental results show that our proposed scheme enhances data security and reduces the network bandwidth required for video transmissions. A software implementation and system architecture for hardware implementation of the IDEA image encryption algorithm based on Field Programmable Gate Array (FPGA) technology are presented in this paper.
An Efficient Implementation of the Gliding Box Lacunarity Algorithm
Charles R. Tolle,; Timothy R. McJunkin; David J. Gorsich
2008-03-01
Lacunarity is a measure of how data fills space. It complements fractal dimension, which measures how much space is filled. Currently, many researchers use the gliding box algorithm for calculating lacunarity. This paper introduces a fast algorithm for making this calculation. The algorithm presented is akin to fast box counting algorithms used by some researchers in estimating fractal dimension. A simplified gliding box measure equation along with key pseudo code implementations for the algorithm are presented. Applications for the gliding box lacunarity measure have included subjects that range from biological community modeling to target detection.
An Agent Inspired Reconfigurable Computing Implementation of a Genetic Algorithm
NASA Technical Reports Server (NTRS)
Weir, John M.; Wells, B. Earl
2003-01-01
Many software systems have been successfully implemented using an agent paradigm which employs a number of independent entities that communicate with one another to achieve a common goal. The distributed nature of such a paradigm makes it an excellent candidate for use in high speed reconfigurable computing hardware environments such as those present in modem FPGA's. In this paper, a distributed genetic algorithm that can be applied to the agent based reconfigurable hardware model is introduced. The effectiveness of this new algorithm is evaluated by comparing the quality of the solutions found by the new algorithm with those found by traditional genetic algorithms. The performance of a reconfigurable hardware implementation of the new algorithm on an FPGA is compared to traditional single processor implementations.
Multiple Lookup Table-Based AES Encryption Algorithm Implementation
NASA Astrophysics Data System (ADS)
Gong, Jin; Liu, Wenyi; Zhang, Huixin
Anew AES (Advanced Encryption Standard) encryption algorithm implementation was proposed in this paper. It is based on five lookup tables, which are generated from S-box(the substitution table in AES). The obvious advantages are reducing the code-size, improving the implementation efficiency, and helping new learners to understand the AES encryption algorithm and GF(28) multiplication which are necessary to correctly implement AES[1]. This method can be applied on processors with word length 32 or above, FPGA and others. And correspondingly we can implement it by VHDL, Verilog, VB and other languages.
Nuclear magnetic resonance implementation of a quantum clock synchronization algorithm
Zhang Jingfu; Long, G.C; Liu Wenzhang; Deng Zhiwei; Lu Zhiheng
2004-12-01
The quantum clock synchronization (QCS) algorithm proposed by Chuang [Phys. Rev. Lett. 85, 2006 (2000)] has been implemented in a three qubit nuclear magnetic resonance quantum system. The time difference between two separated clocks can be determined by measuring the output states. The experimental realization of the QCS algorithm also demonstrates an application of the quantum phase estimation.
A fast portable implementation of the Secure Hash Algorithm, III.
McCurley, Kevin S.
1992-10-01
In 1992, NIST announced a proposed standard for a collision-free hash function. The algorithm for producing the hash value is known as the Secure Hash Algorithm (SHA), and the standard using the algorithm in known as the Secure Hash Standard (SHS). Later, an announcement was made that a scientist at NSA had discovered a weakness in the original algorithm. A revision to this standard was then announced as FIPS 180-1, and includes a slight change to the algorithm that eliminates the weakness. This new algorithm is called SHA-1. In this report we describe a portable and efficient implementation of SHA-1 in the C language. Performance information is given, as well as tips for porting the code to other architectures. We conclude with some observations on the efficiency of the algorithm, and a discussion of how the efficiency of SHA might be improved.
Algorithm implementation on the Navier-Stokes computer
NASA Technical Reports Server (NTRS)
Krist, Steven E.; Zang, Thomas A.
1987-01-01
The Navier-Stokes Computer is a multi-purpose parallel-processing supercomputer which is currently under development at Princeton University. It consists of multiple local memory parallel processors, called Nodes, which are interconnected in a hypercube network. Details of the procedures involved in implementing an algorithm on the Navier-Stokes computer are presented. The particular finite difference algorithm considered in this analysis was developed for simulation of laminar-turbulent transition in wall bounded shear flows. Projected timing results for implementing this algorithm indicate that operation rates in excess of 42 GFLOPS are feasible on a 128 Node machine.
Implementing a self-structuring data learning algorithm
NASA Astrophysics Data System (ADS)
Graham, James; Carson, Daniel; Ternovskiy, Igor
2016-05-01
In this paper, we elaborate on what we did to implement our self-structuring data learning algorithm. To recap, we are working to develop a data learning algorithm that will eventually be capable of goal driven pattern learning and extrapolation of more complex patterns from less complex ones. At this point we have developed a conceptual framework for the algorithm, but have yet to discuss our actual implementation and the consideration and shortcuts we needed to take to create said implementation. We will elaborate on our initial setup of the algorithm and the scenarios we used to test our early stage algorithm. While we want this to be a general algorithm, it is necessary to start with a simple scenario or two to provide a viable development and testing environment. To that end, our discussion will be geared toward what we include in our initial implementation and why, as well as what concerns we may have. In the future, we expect to be able to apply our algorithm to a more general approach, but to do so within a reasonable time, we needed to pick a place to start.
Rapid algorithm prototyping and implementation for power quality measurement
NASA Astrophysics Data System (ADS)
Kołek, Krzysztof; Piątek, Krzysztof
2015-12-01
This article presents a Model-Based Design (MBD) approach to rapidly implement power quality (PQ) metering algorithms. Power supply quality is a very important aspect of modern power systems and will become even more important in future smart grids. In this case, maintaining the PQ parameters at the desired level will require efficient implementation methods of the metering algorithms. Currently, the development of new, advanced PQ metering algorithms requires new hardware with adequate computational capability and time intensive, cost-ineffective manual implementations. An alternative, considered here, is an MBD approach. The MBD approach focuses on the modelling and validation of the model by simulation, which is well-supported by a Computer-Aided Engineering (CAE) packages. This paper presents two algorithms utilized in modern PQ meters: a phase-locked loop based on an Enhanced Phase Locked Loop (EPLL), and the flicker measurement according to the IEC 61000-4-15 standard. The algorithms were chosen because of their complexity and non-trivial development. They were first modelled in the MATLAB/Simulink package, then tested and validated in a simulation environment. The models, in the form of Simulink diagrams, were next used to automatically generate C code. The code was compiled and executed in real-time on the Zynq Xilinx platform that combines a reconfigurable Field Programmable Gate Array (FPGA) with a dual-core processor. The MBD development of PQ algorithms, automatic code generation, and compilation form a rapid algorithm prototyping and implementation path for PQ measurements. The main advantage of this approach is the ability to focus on the design, validation, and testing stages while skipping over implementation issues. The code generation process renders production-ready code that can be easily used on the target hardware. This is especially important when standards for PQ measurement are in constant development, and the PQ issues in emerging smart
Testing of hardware implementation of infrared image enhancing algorithm
NASA Astrophysics Data System (ADS)
Dulski, R.; Sosnowski, T.; PiÄ tkowski, T.; Trzaskawka, P.; Kastek, M.; Kucharz, J.
2012-10-01
The interpretation of IR images depends on radiative properties of observed objects and surrounding scenery. Skills and experience of an observer itself are also of great importance. The solution to improve the effectiveness of observation is utilization of algorithm of image enhancing capable to improve the image quality and the same effectiveness of object detection. The paper presents results of testing the hardware implementation of IR image enhancing algorithm based on histogram processing. Main issue in hardware implementation of complex procedures for image enhancing algorithms is high computational cost. As a result implementation of complex algorithms using general purpose processors and software usually does not bring satisfactory results. Because of high efficiency requirements and the need of parallel operation, the ALTERA's EP2C35F672 FPGA device was used. It provides sufficient processing speed combined with relatively low power consumption. A digital image processing and control module was designed and constructed around two main integrated circuits: a FPGA device and a microcontroller. Programmable FPGA device performs image data processing operations which requires considerable computing power. It also generates the control signals for array readout, performs NUC correction and bad pixel mapping, generates the control signals for display module and finally executes complex image processing algorithms. Implemented adaptive algorithm is based on plateau histogram equalization. Tests were performed on real IR images of different types of objects registered in different spectral bands. The simulations and laboratory experiments proved the correct operation of the designed system in executing the sophisticated image enhancement.
Efficient Implementation of the Backpropagation Algorithm in FPGAs and Microcontrollers.
Ortega-Zamorano, Francisco; Jerez, Jose M; Urda Munoz, Daniel; Luque-Baena, Rafael M; Franco, Leonardo
2016-09-01
The well-known backpropagation learning algorithm is implemented in a field-programmable gate array (FPGA) board and a microcontroller, focusing in obtaining efficient implementations in terms of a resource usage and computational speed. The algorithm was implemented in both cases using a training/validation/testing scheme in order to avoid overfitting problems. For the case of the FPGA implementation, a new neuron representation that reduces drastically the resource usage was introduced by combining the input and first hidden layer units in a single module. Further, a time-division multiplexing scheme was implemented for carrying out product computations taking advantage of the built-in digital signal processor cores. In both implementations, the floating-point data type representation normally used in a personal computer (PC) has been changed to a more efficient one based on a fixed-point scheme, reducing system memory variable usage and leading to an increase in computation speed. The results show that the modifications proposed produced a clear increase in computation speed in comparison with the standard PC-based implementation, demonstrating the usefulness of the intrinsic parallelism of FPGAs in neurocomputational tasks and the suitability of both implementations of the algorithm for its application to the real world problems. PMID:26277004
Outline of a fast hardware implementation of Winograd's DFT algorithm
NASA Technical Reports Server (NTRS)
Zohar, S.
1980-01-01
The main characteristics of the discrete Fourier transform (DFT) algorithm considered by Winograd (1976) is a significant reduction in the number of multiplications. Its primary disadvantage is a higher structural complexity. It is, therefore, difficult to translate the reduced number of multiplications into faster execution of the DFT by means of a software implementation of the algorithm. For this reason, a hardware implementation is considered in the current study, taking into account a design based on the algorithm prescription discussed by Zohar (1979). The hardware implementation of a FORTRAN subroutine is proposed, giving attention to a pipelining scheme in which 5 consecutive data batches are being operated on simultaneously, each batch undergoing one of 5 processing phases.
Implementation of modified SPIHT algorithm for Compression of images
NASA Astrophysics Data System (ADS)
Kurume, A. V.; Yana, D. M.
2011-12-01
We present a throughput-efficient FPGA implementation of the Set Partitioning in Hierarchical Trees (SPIHT) algorithm for compression of images. The SPIHT uses inherent redundancy among wavelet coefficients and suited for both grey and color images. The SPIHT algorithm uses dynamic data structure which hinders hardware realization. we have modified basic SPIHT in two ways, one by using static (fixed) mappings which represent significant information and the other by interchanging the sorting and refinement passes.
On the design, analysis, and implementation of efficient parallel algorithms
Sohn, S.M.
1989-01-01
There is considerable interest in developing algorithms for a variety of parallel computer architectures. This is not a trivial problem, although for certain models great progress has been made. Recently, general-purpose parallel machines have become available commercially. These machines possess widely varying interconnection topologies and data/instruction access schemes. It is important, therefore, to develop methodologies and design paradigms for not only synthesizing parallel algorithms from initial problem specifications, but also for mapping algorithms between different architectures. This work has considered both of these problems. A systolic array consists of a large collection of simple processors that are interconnected in a uniform pattern. The author has studied in detain the problem of mapping systolic algorithms onto more general-purpose parallel architectures such as the hypercube. The hypercube architecture is notable due to its symmetry and high connectivity, characteristics which are conducive to the efficient embedding of parallel algorithms. Although the parallel-to-parallel mapping techniques have yielded efficient target algorithms, it is not surprising that an algorithm designed directly for a particular parallel model would achieve superior performance. In this context, the author has developed hypercube algorithms for some important problems in speech and signal processing, text processing, language processing and artificial intelligence. These algorithms were implemented on a 64-node NCUBE/7 hypercube machine in order to evaluate their performance.
Implementation and testing of algorithms for data fitting
NASA Astrophysics Data System (ADS)
Monahan, Alison; Engelhardt, Larry
2012-03-01
This poster will describe an undergraduate senior research project involving the creation and testing of a java class to implement the Nelder-Mead algorithm, which can be used for data fitting. The performance between the Nelder-Mead algorithm and the Levenberg-Marquardt algorithm will be compared using a variety of different data. The new class will be made available at http://www.compadre.org/osp/items/detail.cfm?ID=11593. At the time of the presentation, this project will be nearing completion; and I will discuss my progress, successes, and challenges.
Efficient implementation of the adaptive scale pixel decomposition algorithm
NASA Astrophysics Data System (ADS)
Zhang, L.; Bhatnagar, S.; Rau, U.; Zhang, M.
2016-08-01
Context. Most popular algorithms in use to remove the effects of a telescope's point spread function (PSF) in radio astronomy are variants of the CLEAN algorithm. Most of these algorithms model the sky brightness using the delta-function basis, which results in undesired artefacts when used to image extended emission. The adaptive scale pixel decomposition (Asp-Clean) algorithm models the sky brightness on a scale-sensitive basis and thus gives a significantly better imaging performance when imaging fields that contain both resolved and unresolved emission. Aims: However, the runtime cost of Asp-Clean is higher than that of scale-insensitive algorithms. In this paper, we identify the most expensive step in the original Asp-Clean algorithm and present an efficient implementation of it, which significantly reduces the computational cost while keeping the imaging performance comparable to the original algorithm. The PSF sidelobe levels of modern wide-band telescopes are significantly reduced, allowing us to make approximations to reduce the computational cost, which in turn allows for the deconvolution of larger images on reasonable timescales. Methods: As in the original algorithm, scales in the image are estimated through function fitting. Here we introduce an analytical method to model extended emission, and a modified method for estimating the initial values used for the fitting procedure, which ultimately leads to a lower computational cost. Results: The new implementation was tested with simulated EVLA data and the imaging performance compared well with the original Asp-Clean algorithm. Tests show that the current algorithm can recover features at different scales with lower computational cost.
NMR implementation of adiabatic SAT algorithm using strongly modulated pulses.
Mitra, Avik; Mahesh, T S; Kumar, Anil
2008-03-28
NMR implementation of adiabatic algorithms face severe problems in homonuclear spin systems since the qubit selective pulses are long and during this period, evolution under the Hamiltonian and decoherence cause errors. The decoherence destroys the answer as it causes the final state to evolve to mixed state and in homonuclear systems, evolution under the internal Hamiltonian causes phase errors preventing the initial state to converge to the solution state. The resolution of these issues is necessary before one can proceed to implement an adiabatic algorithm in a large system where homonuclear coupled spins will become a necessity. In the present work, we demonstrate that by using "strongly modulated pulses" (SMPs) for the creation of interpolating Hamiltonian, one can circumvent both the problems and successfully implement the adiabatic SAT algorithm in a homonuclear three qubit system. This work also demonstrates that the SMPs tremendously reduce the time taken for the implementation of the algorithm, can overcome problems associated with decoherence, and will be the modality in future implementation of quantum information processing by NMR. PMID:18376911
NMR implementation of adiabatic SAT algorithm using strongly modulated pulses
NASA Astrophysics Data System (ADS)
Mitra, Avik; Mahesh, T. S.; Kumar, Anil
2008-03-01
NMR implementation of adiabatic algorithms face severe problems in homonuclear spin systems since the qubit selective pulses are long and during this period, evolution under the Hamiltonian and decoherence cause errors. The decoherence destroys the answer as it causes the final state to evolve to mixed state and in homonuclear systems, evolution under the internal Hamiltonian causes phase errors preventing the initial state to converge to the solution state. The resolution of these issues is necessary before one can proceed to implement an adiabatic algorithm in a large system where homonuclear coupled spins will become a necessity. In the present work, we demonstrate that by using "strongly modulated pulses" (SMPs) for the creation of interpolating Hamiltonian, one can circumvent both the problems and successfully implement the adiabatic SAT algorithm in a homonuclear three qubit system. This work also demonstrates that the SMPs tremendously reduce the time taken for the implementation of the algorithm, can overcome problems associated with decoherence, and will be the modality in future implementation of quantum information processing by NMR.
A novel pipeline based FPGA implementation of a genetic algorithm
NASA Astrophysics Data System (ADS)
Thirer, Nonel
2014-05-01
To solve problems when an analytical solution is not available, more and more bio-inspired computation techniques have been applied in the last years. Thus, an efficient algorithm is the Genetic Algorithm (GA), which imitates the biological evolution process, finding the solution by the mechanism of "natural selection", where the strong has higher chances to survive. A genetic algorithm is an iterative procedure which operates on a population of individuals called "chromosomes" or "possible solutions" (usually represented by a binary code). GA performs several processes with the population individuals to produce a new population, like in the biological evolution. To provide a high speed solution, pipelined based FPGA hardware implementations are used, with a nstages pipeline for a n-phases genetic algorithm. The FPGA pipeline implementations are constraints by the different execution time of each stage and by the FPGA chip resources. To minimize these difficulties, we propose a bio-inspired technique to modify the crossover step by using non identical twins. Thus two of the chosen chromosomes (parents) will build up two new chromosomes (children) not only one as in classical GA. We analyze the contribution of this method to reduce the execution time in the asynchronous and synchronous pipelines and also the possibility to a cheaper FPGA implementation, by using smaller populations. The full hardware architecture for a FPGA implementation to our target ALTERA development card is presented and analyzed.
Implementing Linear Algebra Related Algorithms on the TI-92+ Calculator.
ERIC Educational Resources Information Center
Alexopoulos, John; Abraham, Paul
2001-01-01
Demonstrates a less utilized feature of the TI-92+: its natural and powerful programming language. Shows how to implement several linear algebra related algorithms including the Gram-Schmidt process, Least Squares Approximations, Wronskians, Cholesky Decompositions, and Generalized Linear Least Square Approximations with QR Decompositions.…
Efficient implementations of hyperspectral chemical-detection algorithms
NASA Astrophysics Data System (ADS)
Brett, Cory J. C.; DiPietro, Robert S.; Manolakis, Dimitris G.; Ingle, Vinay K.
2013-10-01
Many military and civilian applications depend on the ability to remotely sense chemical clouds using hyperspectral imagers, from detecting small but lethal concentrations of chemical warfare agents to mapping plumes in the aftermath of natural disasters. Real-time operation is critical in these applications but becomes diffcult to achieve as the number of chemicals we search for increases. In this paper, we present efficient CPU and GPU implementations of matched-filter based algorithms so that real-time operation can be maintained with higher chemical-signature counts. The optimized C++ implementations show between 3x and 9x speedup over vectorized MATLAB implementations.
A distributed Canny edge detector: algorithm and FPGA implementation.
Xu, Qian; Varadarajan, Srenivas; Chakrabarti, Chaitali; Karam, Lina J
2014-07-01
The Canny edge detector is one of the most widely used edge detection algorithms due to its superior performance. Unfortunately, not only is it computationally more intensive as compared with other edge detection algorithms, but it also has a higher latency because it is based on frame-level statistics. In this paper, we propose a mechanism to implement the Canny algorithm at the block level without any loss in edge detection performance compared with the original frame-level Canny algorithm. Directly applying the original Canny algorithm at the block-level leads to excessive edges in smooth regions and to loss of significant edges in high-detailed regions since the original Canny computes the high and low thresholds based on the frame-level statistics. To solve this problem, we present a distributed Canny edge detection algorithm that adaptively computes the edge detection thresholds based on the block type and the local distribution of the gradients in the image block. In addition, the new algorithm uses a nonuniform gradient magnitude histogram to compute block-based hysteresis thresholds. The resulting block-based algorithm has a significantly reduced latency and can be easily integrated with other block-based image codecs. It is capable of supporting fast edge detection of images and videos with high resolutions, including full-HD since the latency is now a function of the block size instead of the frame size. In addition, quantitative conformance evaluations and subjective tests show that the edge detection performance of the proposed algorithm is better than the original frame-based algorithm, especially when noise is present in the images. Finally, this algorithm is implemented using a 32 computing engine architecture and is synthesized on the Xilinx Virtex-5 FPGA. The synthesized architecture takes only 0.721 ms (including the SRAM READ/WRITE time and the computation time) to detect edges of 512 × 512 images in the USC SIPI database when clocked at 100
Implementation of realistic image rendition algorithm based on DSP
NASA Astrophysics Data System (ADS)
Lv, Lily; Gao, Kun; Ni, Guoqiang; Zhou, Liwei; Shao, Xiaoguang
2010-11-01
Realistic image rendition is to reproduce the human perception of natural scenes. Retinex is a classical algorithm that simultaneously provides high dynamic range compression contrast and color constancy of an image. In this paper, we discuss a design of a digital signal processor (DSP) implementation of the single scale monochromatic Retinex algorithm. The target processor is Texas Instruments TMS320DM642, a 32-bit fix point DSP which is clocked at 600 MHz. This DSP hardware platform designed is of powerful consumption and video image processing capability. We give an overview of the DSP hardware and software, and discuss some feasible optimizations to achieve a real-time version of the Retinex algorithm. In the end, the performance of the algorithm executing on DSP platform is shown.
Experimental implementation of Grover's search algorithm with neutral atom qubits
NASA Astrophysics Data System (ADS)
Sun, Yuan; Lichtman, Martin; Baker, Kevin; Saffman, Mark
2016-05-01
Grover's algorithm for searching an unsorted data base provides a provable speedup over the best possible classical search and is therefore a test bed for demonstrating the power of quantum computation. The algorithm has been demonstrated with NMR, trapped ion, photonic, and superconducting hardware, but only with two qubits encoding a four element database. We report on progress towards experimental demonstration of Grover's algorithm using two and three neutral atom qubits encoding a database with up to eight elements. Our approach uses a Rydberg blockade Ck NOT gate for efficient implementation of the Grover iterations. Quantum Monte Carlo simulations of the algorithm performance that account for gate errors and decoherence rates are compared with experimental results. Work supported by the IARPA MQCO program.
The Sys-Rem Detrending Algorithm: Implementation and Testing
NASA Astrophysics Data System (ADS)
Mazeh, T.; Tamuz, O.; Zucker, S.
2007-07-01
Sys-Rem (Tamuz, Mazeh & Zucker 2005) is a detrending algorithm designed to remove systematic effects in a large set of light curves obtained by a photometric survey. The algorithm works without any prior knowledge of the effects, as long as they appear in many stars of the sample. This paper presents the basic principles of Sys-Rem and discusses a parameterization used to determine the number of effects removed. We assess the performance of Sys-Rem on simulated transits injected into WHAT survey data. This test is proposed as a general scheme to assess the effectiveness of detrending algorithms. Application of Sys-Rem to the OGLE dataset demonstrates the power of the algorithm. We offer a coded implementation of Sys-Rem to the community.
Implementing a Gaussian Process Learning Algorithm in Mixed Parallel Environment
Chandola, Varun; Vatsavai, Raju
2011-01-01
In this paper, we present a scalability analysis of a parallel Gaussian process training algorithm to simultaneously analyze a massive number of time series. We study three different parallel implementations: using threads, MPI, and a hybrid implementation using threads and MPI. We compare the scalability for the multi-threaded implementation on three different hardware platforms: a Mac desktop with two quad-core Intel Xeon processors (16 virtual cores), a Linux cluster node with four quad-core 2.3 GHz AMD Opteron processors, and SGI Altix ICE 8200 cluster node with two quad-core Intel Xeon processors (16 virtual cores). We also study the scalability of the MPI based and the hybrid MPI and thread based implementations on the SGI cluster with 128 nodes (2048 cores). Experimental results show that the hybrid implementation scales better than the multi-threaded and MPI based implementations. The hybrid implementation, using 1536 cores, can analyze a remote sensing data set with over 4 million time series in nearly 5 seconds while the serial algorithm takes nearly 12 hours to process the same data set.
3D video sequence reconstruction algorithms implemented on a DSP
NASA Astrophysics Data System (ADS)
Ponomaryov, V. I.; Ramos-Diaz, E.
2011-03-01
A novel approach for 3D image and video reconstruction is proposed and implemented. This is based on the wavelet atomic functions (WAF) that have demonstrated better approximation properties in different processing problems in comparison with classical wavelets. Disparity maps using WAF are formed, and then they are employed in order to present 3D visualization using color anaglyphs. Additionally, the compression via Pth law is performed to improve the disparity map quality. Other approaches such as optical flow and stereo matching algorithm are also implemented as the comparative approaches. Numerous simulation results have justified the efficiency of the novel framework. The implementation of the proposed algorithm on the Texas Instruments DSP TMS320DM642 permits to demonstrate possible real time processing mode during 3D video reconstruction for images and video sequences.
Bomble, Laeetitia; Pellegrini, Philippe; Ghesquiere, Pierre; Desouter-Lecomte, Michele
2010-12-15
We numerically investigate the possibilities of driving quantum algorithms with laser pulses in a register of ultracold NaCs polar molecules in a static electric field. We focus on the possibilities of performing scalable logical operations by considering circuits that involve intermolecular gates (implemented on adjacent interacting molecules) to enable the transfer of information from one molecule to another during conditional laser-driven population inversions. We study the implementation of an arithmetic operation (the addition of 0 or 1 on a binary digit and a carry in) which requires population inversions only and the Deutsch-Jozsa algorithm which requires a control of the phases. Under typical experimental conditions, our simulations show that high-fidelity logical operations involving several qubits can be performed in a time scale of a few hundreds of microseconds, opening promising perspectives for the manipulation of a large number of qubits in these systems.
Efficient Hardware Implementation of the Lightweight Block Encryption Algorithm LEA
Lee, Donggeon; Kim, Dong-Chan; Kwon, Daesung; Kim, Howon
2014-01-01
Recently, due to the advent of resource-constrained trends, such as smartphones and smart devices, the computing environment is changing. Because our daily life is deeply intertwined with ubiquitous networks, the importance of security is growing. A lightweight encryption algorithm is essential for secure communication between these kinds of resource-constrained devices, and many researchers have been investigating this field. Recently, a lightweight block cipher called LEA was proposed. LEA was originally targeted for efficient implementation on microprocessors, as it is fast when implemented in software and furthermore, it has a small memory footprint. To reflect on recent technology, all required calculations utilize 32-bit wide operations. In addition, the algorithm is comprised of not complex S-Box-like structures but simple Addition, Rotation, and XOR operations. To the best of our knowledge, this paper is the first report on a comprehensive hardware implementation of LEA. We present various hardware structures and their implementation results according to key sizes. Even though LEA was originally targeted at software efficiency, it also shows high efficiency when implemented as hardware. PMID:24406859
Efficient hardware implementation of the lightweight block encryption algorithm LEA.
Lee, Donggeon; Kim, Dong-Chan; Kwon, Daesung; Kim, Howon
2014-01-01
Recently, due to the advent of resource-constrained trends, such as smartphones and smart devices, the computing environment is changing. Because our daily life is deeply intertwined with ubiquitous networks, the importance of security is growing. A lightweight encryption algorithm is essential for secure communication between these kinds of resource-constrained devices, and many researchers have been investigating this field. Recently, a lightweight block cipher called LEA was proposed. LEA was originally targeted for efficient implementation on microprocessors, as it is fast when implemented in software and furthermore, it has a small memory footprint. To reflect on recent technology, all required calculations utilize 32-bit wide operations. In addition, the algorithm is comprised of not complex S-Box-like structures but simple Addition, Rotation, and XOR operations. To the best of our knowledge, this paper is the first report on a comprehensive hardware implementation of LEA. We present various hardware structures and their implementation results according to key sizes. Even though LEA was originally targeted at software efficiency, it also shows high efficiency when implemented as hardware. PMID:24406859
FPGA implementation of digital down converter using CORDIC algorithm
NASA Astrophysics Data System (ADS)
Agarwal, Ashok; Lakshmi, Boppana
2013-01-01
In radio receivers, Digital Down Converters (DDC) are used to translate the signal from Intermediate Frequency level to baseband. It also decimates the oversampled signal to a lower sample rate, eliminating the need of a high end digital signal processors. In this paper we have implemented architecture for DDC employing CORDIC algorithm, which down converts an IF signal of 70MHz (3G) to 200 KHz baseband GSM signal, with an SFDR greater than 100dB. The implemented architecture reduces the hardware resource requirements by 15 percent when compared with other architecture available in the literature due to elimination of explicit multipliers and a quadrature phase shifter for mixing.
VIRTEX-5 Fpga Implementation of Advanced Encryption Standard Algorithm
NASA Astrophysics Data System (ADS)
Rais, Muhammad H.; Qasim, Syed M.
2010-06-01
In this paper, we present an implementation of Advanced Encryption Standard (AES) cryptographic algorithm using state-of-the-art Virtex-5 Field Programmable Gate Array (FPGA). The design is coded in Very High Speed Integrated Circuit Hardware Description Language (VHDL). Timing simulation is performed to verify the functionality of the designed circuit. Performance evaluation is also done in terms of throughput and area. The design implemented on Virtex-5 (XC5VLX50FFG676-3) FPGA achieves a maximum throughput of 4.34 Gbps utilizing a total of 399 slices.
Implementation and Optimization of Image Processing Algorithms on Embedded GPU
NASA Astrophysics Data System (ADS)
Singhal, Nitin; Yoo, Jin Woo; Choi, Ho Yeol; Park, In Kyu
In this paper, we analyze the key factors underlying the implementation, evaluation, and optimization of image processing and computer vision algorithms on embedded GPU using OpenGL ES 2.0 shader model. First, we present the characteristics of the embedded GPU and its inherent advantage when compared to embedded CPU. Additionally, we propose techniques to achieve increased performance with optimized shader design. To show the effectiveness of the proposed techniques, we employ cartoon-style non-photorealistic rendering (NPR), speeded-up robust feature (SURF) detection, and stereo matching as our example algorithms. Performance is evaluated in terms of the execution time and speed-up achieved in comparison with the implementation on embedded CPU.
Implementing wide baseline matching algorithms on a graphics processing unit.
Rothganger, Fredrick H.; Larson, Kurt W.; Gonzales, Antonio Ignacio; Myers, Daniel S.
2007-10-01
Wide baseline matching is the state of the art for object recognition and image registration problems in computer vision. Though effective, the computational expense of these algorithms limits their application to many real-world problems. The performance of wide baseline matching algorithms may be improved by using a graphical processing unit as a fast multithreaded co-processor. In this paper, we present an implementation of the difference of Gaussian feature extractor, based on the CUDA system of GPU programming developed by NVIDIA, and implemented on their hardware. For a 2000x2000 pixel image, the GPU-based method executes nearly thirteen times faster than a comparable CPU-based method, with no significant loss of accuracy.
A high performance hardware implementation image encryption with AES algorithm
NASA Astrophysics Data System (ADS)
Farmani, Ali; Jafari, Mohamad; Miremadi, Seyed Sohrab
2011-06-01
This paper describes implementation of a high-speed encryption algorithm with high throughput for encrypting the image. Therefore, we select a highly secured symmetric key encryption algorithm AES(Advanced Encryption Standard), in order to increase the speed and throughput using pipeline technique in four stages, control unit based on logic gates, optimal design of multiplier blocks in mixcolumn phase and simultaneous production keys and rounds. Such procedure makes AES suitable for fast image encryption. Implementation of a 128-bit AES on FPGA of Altra company has been done and the results are as follow: throughput, 6 Gbps in 471MHz. The time of encrypting in tested image with 32*32 size is 1.15ms.
Control algorithm implementation for a redundant degree of freedom manipulator
NASA Technical Reports Server (NTRS)
Cohan, Steve
1991-01-01
This project's purpose is to develop and implement control algorithms for a kinematically redundant robotic manipulator. The manipulator is being developed concurrently by Odetics Inc., under internal research and development funding. This SBIR contract supports algorithm conception, development, and simulation, as well as software implementation and integration with the manipulator hardware. The Odetics Dexterous Manipulator is a lightweight, high strength, modular manipulator being developed for space and commercial applications. It has seven fully active degrees of freedom, is electrically powered, and is fully operational in 1 G. The manipulator consists of five self-contained modules. These modules join via simple quick-disconnect couplings and self-mating connectors which allow rapid assembly/disassembly for reconfiguration, transport, or servicing. Each joint incorporates a unique drive train design which provides zero backlash operation, is insensitive to wear, and is single fault tolerant to motor or servo amplifier failure. The sensing system is also designed to be single fault tolerant. Although the initial prototype is not space qualified, the design is well-suited to meeting space qualification requirements. The control algorithm design approach is to develop a hierarchical system with well defined access and interfaces at each level. The high level endpoint/configuration control algorithm transforms manipulator endpoint position/orientation commands to joint angle commands, providing task space motion. At the same time, the kinematic redundancy is resolved by controlling the configuration (pose) of the manipulator, using several different optimizing criteria. The center level of the hierarchy servos the joints to their commanded trajectories using both linear feedback and model-based nonlinear control techniques. The lowest control level uses sensed joint torque to close torque servo loops, with the goal of improving the manipulator dynamic behavior
Decoding the brain's algorithm for categorization from its neural implementation.
Mack, Michael L; Preston, Alison R; Love, Bradley C
2013-10-21
Acts of cognition can be described at different levels of analysis: what behavior should characterize the act, what algorithms and representations underlie the behavior, and how the algorithms are physically realized in neural activity [1]. Theories that bridge levels of analysis offer more complete explanations by leveraging the constraints present at each level [2-4]. Despite the great potential for theoretical advances, few studies of cognition bridge levels of analysis. For example, formal cognitive models of category decisions accurately predict human decision making [5, 6], but whether model algorithms and representations supporting category decisions are consistent with underlying neural implementation remains unknown. This uncertainty is largely due to the hurdle of forging links between theory and brain [7-9]. Here, we tackle this critical problem by using brain response to characterize the nature of mental computations that support category decisions to evaluate two dominant, and opposing, models of categorization. We found that brain states during category decisions were significantly more consistent with latent model representations from exemplar [5] rather than prototype theory [10, 11]. Representations of individual experiences, not the abstraction of experiences, are critical for category decision making. Holding models accountable for behavior and neural implementation provides a means for advancing more complete descriptions of the algorithms of cognition. PMID:24094852
Multiangle Implementation of Atmospheric Correction (MAIAC): 2. Aerosol Algorithm
NASA Technical Reports Server (NTRS)
Lyapustin, A.; Wang, Y.; Laszlo, I.; Kahn, R.; Korkin, S.; Remer, L.; Levy, R.; Reid, J. S.
2011-01-01
An aerosol component of a new multiangle implementation of atmospheric correction (MAIAC) algorithm is presented. MAIAC is a generic algorithm developed for the Moderate Resolution Imaging Spectroradiometer (MODIS), which performs aerosol retrievals and atmospheric correction over both dark vegetated surfaces and bright deserts based on a time series analysis and image-based processing. The MAIAC look-up tables explicitly include surface bidirectional reflectance. The aerosol algorithm derives the spectral regression coefficient (SRC) relating surface bidirectional reflectance in the blue (0.47 micron) and shortwave infrared (2.1 micron) bands; this quantity is prescribed in the MODIS operational Dark Target algorithm based on a parameterized formula. The MAIAC aerosol products include aerosol optical thickness and a fine-mode fraction at resolution of 1 km. This high resolution, required in many applications such as air quality, brings new information about aerosol sources and, potentially, their strength. AERONET validation shows that the MAIAC and MOD04 algorithms have similar accuracy over dark and vegetated surfaces and that MAIAC generally improves accuracy over brighter surfaces due to the SRC retrieval and explicit bidirectional reflectance factor characterization, as demonstrated for several U.S. West Coast AERONET sites. Due to its generic nature and developed angular correction, MAIAC performs aerosol retrievals over bright deserts, as demonstrated for the Solar Village Aerosol Robotic Network (AERONET) site in Saudi Arabia.
A bioinspired collision detection algorithm for VLSI implementation
NASA Astrophysics Data System (ADS)
Cuadri, J.; Linan, G.; Stafford, R.; Keil, M. S.; Roca, E.
2005-06-01
In this paper a bioinspired algorithm for collision detection is proposed, based on previous models of the locust (Locusta migratoria) visual system reported by F.C. Rind and her group, in the University of Newcastle-upon-Tyne. The algorithm is suitable for VLSI implementation in standard CMOS technologies as a system-on-chip for automotive applications. The working principle of the algorithm is to process a video stream that represents the current scenario, and to fire an alarm whenever an object approaches on a collision course. Moreover, it establishes a scale of warning states, from no danger to collision alarm, depending on the activity detected in the current scenario. In the worst case, the minimum time before collision at which the model fires the collision alarm is 40 msec (1 frame before, at 25 frames per second). Since the average time to successfully fire an airbag system is 2 msec, even in the worst case, this algorithm would be very helpful to more efficiently arm the airbag system, or even take some kind of collision avoidance countermeasures. Furthermore, two additional modules have been included: a "Topological Feature Estimator" and an "Attention Focusing Algorithm". The former takes into account the shape of the approaching object to decide whether it is a person, a road line or a car. This helps to take more adequate countermeasures and to filter false alarms. The latter centres the processing power into the most active zones of the input frame, thus saving memory and processing time resources.
Experience with a Genetic Algorithm Implemented on a Multiprocessor Computer
NASA Technical Reports Server (NTRS)
Plassman, Gerald E.; Sobieszczanski-Sobieski, Jaroslaw
2000-01-01
Numerical experiments were conducted to find out the extent to which a Genetic Algorithm (GA) may benefit from a multiprocessor implementation, considering, on one hand, that analyses of individual designs in a population are independent of each other so that they may be executed concurrently on separate processors, and, on the other hand, that there are some operations in a GA that cannot be so distributed. The algorithm experimented with was based on a gaussian distribution rather than bit exchange in the GA reproductive mechanism, and the test case was a hub frame structure of up to 1080 design variables. The experimentation engaging up to 128 processors confirmed expectations of radical elapsed time reductions comparing to a conventional single processor implementation. It also demonstrated that the time spent in the non-distributable parts of the algorithm and the attendant cross-processor communication may have a very detrimental effect on the efficient utilization of the multiprocessor machine and on the number of processors that can be used effectively in a concurrent manner. Three techniques were devised and tested to mitigate that effect, resulting in efficiency increasing to exceed 99 percent.
Implementation of several mathematical algorithms to breast tissue density classification
NASA Astrophysics Data System (ADS)
Quintana, C.; Redondo, M.; Tirao, G.
2014-02-01
The accuracy of mammographic abnormality detection methods is strongly dependent on breast tissue characteristics, where a dense breast tissue can hide lesions causing cancer to be detected at later stages. In addition, breast tissue density is widely accepted to be an important risk indicator for the development of breast cancer. This paper presents the implementation and the performance of different mathematical algorithms designed to standardize the categorization of mammographic images, according to the American College of Radiology classifications. These mathematical techniques are based on intrinsic properties calculations and on comparison with an ideal homogeneous image (joint entropy, mutual information, normalized cross correlation and index Q) as categorization parameters. The algorithms evaluation was performed on 100 cases of the mammographic data sets provided by the Ministerio de Salud de la Provincia de Córdoba, Argentina—Programa de Prevención del Cáncer de Mama (Department of Public Health, Córdoba, Argentina, Breast Cancer Prevention Program). The obtained breast classifications were compared with the expert medical diagnostics, showing a good performance. The implemented algorithms revealed a high potentiality to classify breasts into tissue density categories.
Developing and Implementing the Data Mining Algorithms in RAVEN
Sen, Ramazan Sonat; Maljovec, Daniel Patrick; Alfonsi, Andrea; Rabiti, Cristian
2015-09-01
The RAVEN code is becoming a comprehensive tool to perform probabilistic risk assessment, uncertainty quantification, and verification and validation. The RAVEN code is being developed to support many programs and to provide a set of methodologies and algorithms for advanced analysis. Scientific computer codes can generate enormous amounts of data. To post-process and analyze such data might, in some cases, take longer than the initial software runtime. Data mining algorithms/methods help in recognizing and understanding patterns in the data, and thus discover knowledge in databases. The methodologies used in the dynamic probabilistic risk assessment or in uncertainty and error quantification analysis couple system/physics codes with simulation controller codes, such as RAVEN. RAVEN introduces both deterministic and stochastic elements into the simulation while the system/physics code model the dynamics deterministically. A typical analysis is performed by sampling values of a set of parameter values. A major challenge in using dynamic probabilistic risk assessment or uncertainty and error quantification analysis for a complex system is to analyze the large number of scenarios generated. Data mining techniques are typically used to better organize and understand data, i.e. recognizing patterns in the data. This report focuses on development and implementation of Application Programming Interfaces (APIs) for different data mining algorithms, and the application of these algorithms to different databases.
Algorithms and implementations of APT resonant control system
Wang, Yi-Ming; Regan, A.
1997-08-01
A digital signal processor is implemented to control resonant frequency of the RFQ prototype in APT/LEDA. Two schemes are implemented to calculate the resonant frequency of RFQ. One uses the measurement of the forward and reflected fields. The other uses the measurement of the forward and transmitted fields. The former is sensitive and accurate when the operation frequency is relatively far from the resonant frequency. The latter gives accurate results when the operation frequency is close to the resonant frequency. Linearized algorithms are derived to calculate the resonant frequency of the RFQ efficiently using a fixed-point DSP. The control frequency range is about 100kHz for 350MHz operation frequency. A frequency agile scheme is employed using a dual direct digital synthesizer to drive klystron at the cavity`s resonant frequency (not necessarily the required beam resonant frequency) in power-up mode to quickly the cavity to the desired resonant frequency. This paper will address the algorithm implementation, error analysis, as well as related hardware design issues.
Infrared Jitter Imaging Data Reduction: Algorithms and Implementation
NASA Astrophysics Data System (ADS)
Devillard, Nicolas
Jitter imaging (also known as microscanning) is probably one of the most efficient ways to perform astronomical observations in the infrared. It requires very efficient filtering and recentering methods to produce the best possible output from raw data. This paper discusses issues attached to Poisson offset generation, efficient infrared sky filtering, offset recovery between planes through cross-correlation and/or point pattern recognition techniques, plane shifting with subpixel resolution through various kernel-based interpolation schemes, and 3D filtering for plane accumulation. Several algorithms are described for each step, having in mind an automatic data processing in pipeline mode (i.e., without user interaction) as intended for the Very Large Telescope. Implementation of these algorithms in optimized ANSI C (the eclipse library) is also described here.
Kodiak: An Implementation Framework for Branch and Bound Algorithms
NASA Technical Reports Server (NTRS)
Smith, Andrew P.; Munoz, Cesar A.; Narkawicz, Anthony J.; Markevicius, Mantas
2015-01-01
Recursive branch and bound algorithms are often used to refine and isolate solutions to several classes of global optimization problems. A rigorous computation framework for the solution of systems of equations and inequalities involving nonlinear real arithmetic over hyper-rectangular variable and parameter domains is presented. It is derived from a generic branch and bound algorithm that has been formally verified, and utilizes self-validating enclosure methods, namely interval arithmetic and, for polynomials and rational functions, Bernstein expansion. Since bounds computed by these enclosure methods are sound, this approach may be used reliably in software verification tools. Advantage is taken of the partial derivatives of the constraint functions involved in the system, firstly to reduce the branching factor by the use of bisection heuristics and secondly to permit the computation of bifurcation sets for systems of ordinary differential equations. The associated software development, Kodiak, is presented, along with examples of three different branch and bound problem types it implements.
Neural network implementations of data association algorithms for sensor fusion
NASA Technical Reports Server (NTRS)
Brown, Donald E.; Pittard, Clarence L.; Martin, Worthy N.
1989-01-01
The paper is concerned with locating a time varying set of entities in a fixed field when the entities are sensed at discrete time instances. At a given time instant a collection of bivariate Gaussian sensor reports is produced, and these reports estimate the location of a subset of the entities present in the field. A database of reports is maintained, which ideally should contain one report for each entity sensed. Whenever a collection of sensor reports is received, the database must be updated to reflect the new information. This updating requires association processing between the database reports and the new sensor reports to determine which pairs of sensor and database reports correspond to the same entity. Algorithms for performing this association processing are presented. Neural network implementation of the algorithms, along with simulation results comparing the approaches are provided.
Neural network fusion capabilities for efficient implementation of tracking algorithms
NASA Astrophysics Data System (ADS)
Sundareshan, Malur K.; Amoozegar, Farid
1996-05-01
The ability to efficiently fuse information of different forms for facilitating intelligent decision-making is one of the major capabilities of trained multilayer neural networks that is being recognized int eh recent times. While development of innovative adaptive control algorithms for nonlinear dynamical plants which attempt to exploit these capabilities seems to be more popular, a corresponding development of nonlinear estimation algorithms using these approaches, particularly for application in target surveillance and guidance operations, has not received similar attention. In this paper we describe the capabilities and functionality of neural network algorithms for data fusion and implementation of nonlinear tracking filters. For a discussion of details and for serving as a vehicle for quantitative performance evaluations, the illustrative case of estimating the position and velocity of surveillance targets is considered. Efficient target tracking algorithms that can utilize data from a host of sensing modalities and are capable of reliably tracking even uncooperative targets executing fast and complex maneuvers are of interest in a number of applications. The primary motivation for employing neural networks in these applications comes form the efficiency with which more features extracted from different sensor measurements can be utilized as inputs for estimating target maneuvers. Such an approach results in an overall nonlinear tracking filter which has several advantages over the popular efforts at designing nonlinear estimation algorithms for tracking applications, the principle one being the reduction of mathematical and computational complexities. A system architecture that efficiently integrates the processing capabilities of a trained multilayer neural net with the tracking performance of a Kalman filter is described in this paper.
A pipelined FPGA implementation of an encryption algorithm based on genetic algorithm
NASA Astrophysics Data System (ADS)
Thirer, Nonel
2013-05-01
With the evolution of digital data storage and exchange, it is essential to protect the confidential information from every unauthorized access. High performance encryption algorithms were developed and implemented by software and hardware. Also many methods to attack the cipher text were developed. In the last years, the genetic algorithm has gained much interest in cryptanalysis of cipher texts and also in encryption ciphers. This paper analyses the possibility to use the genetic algorithm as a multiple key sequence generator for an AES (Advanced Encryption Standard) cryptographic system, and also to use a three stages pipeline (with four main blocks: Input data, AES Core, Key generator, Output data) to provide a fast encryption and storage/transmission of a large amount of data.
DSP Implementation of the Multiscale Retinex Image Enhancement Algorithm
NASA Technical Reports Server (NTRS)
Hines, Glenn D.; Rahman, Zia-ur; Jobson, Daniel J.; Woodell, Glenn A.
2004-01-01
The Retinex is a general-purpose image enhancement algorithm that is used to produce good visual representations of scenes. It performs a non-linear spatial/ spectral transform that synthesizes strong local contrast enhancement and color constancy. A real-time, video frame rate implementation of the Retinex is required to meet the needs of various potential users. Retinex processing contains a relatively large number of complex computations, thus to achieve real-time performance using current technologies requires specialized hardware and software. In this paper we discuss the design and development of a digital signal processor (DSP) implementation of the Retinex. The target processor is a Texas Instruments TMS320C6711 floating point DSP. NTSC video is captured using a dedicated frame grabber card, Retinex processed, and displayed on a standard monitor. We discuss the optimizations used to achieve real-time performance of the Retinex and also describe our future plans on using alternative architectures.
DSP Implementation of the Retinex Image Enhancement Algorithm
NASA Technical Reports Server (NTRS)
Hines, Glenn; Rahman, Zia-Ur; Jobson, Daniel; Woodell, Glenn
2004-01-01
The Retinex is a general-purpose image enhancement algorithm that is used to produce good visual representations of scenes. It performs a non-linear spatial/spectral transform that synthesizes strong local contrast enhancement and color constancy. A real-time, video frame rate implementation of the Retinex is required to meet the needs of various potential users. Retinex processing contains a relatively large number of complex computations, thus to achieve real-time performance using current technologies requires specialized hardware and software. In this paper we discuss the design and development of a digital signal processor (DSP) implementation of the Retinex. The target processor is a Texas Instruments TMS320C6711 floating point DSP. NTSC video is captured using a dedicated frame-grabber card, Retinex processed, and displayed on a standard monitor. We discuss the optimizations used to achieve real-time performance of the Retinex and also describe our future plans on using alternative architectures.
Purgatorio - A new implementation of the Inferno algorithm
Wilson, B; Sonnad, V; Sterne, P; Isaacs, W
2005-03-29
For astrophysical applications, as well as modeling laser-produced plasmas, there is a continual need for equation-of-state data over a wide domain of physical conditions. This paper presents algorithmic aspects for computing the Helmholtz free energy of plasma electrons for temperatures spanning from a few Kelvin to several KeV, and densities ranging from essentially isolated ion conditions to such large compressions that most bound orbitals become delocalized. The objective is high precision results in order to compute pressure and other thermodynamic quantities by numerical differentiation. This approach has the advantage that internal thermodynamic self-consistency is ensured, regardless of the specific physical model, but at the cost of very stringent numerical tolerances for each operation. The computational aspects we address in this paper are faced by any model that relies on input from the quantum mechanical spectrum of a spherically symmetric Hamiltonian operator. The particular physical model we employ is that of INFERNO; of a spherically averaged ion embedded in jellium. An overview of PURGATORIO, a new implementation of the INFERNO equation of state model, is presented. The new algorithm emphasizes a novel decimation scheme for automatically resolving the structure of the continuum density of states, circumventing limitations of the pseudo-R matrix algorithm previously utilized.
Neuromorphic implementations of neurobiological learning algorithms for spiking neural networks.
Walter, Florian; Röhrbein, Florian; Knoll, Alois
2015-12-01
The application of biologically inspired methods in design and control has a long tradition in robotics. Unlike previous approaches in this direction, the emerging field of neurorobotics not only mimics biological mechanisms at a relatively high level of abstraction but employs highly realistic simulations of actual biological nervous systems. Even today, carrying out these simulations efficiently at appropriate timescales is challenging. Neuromorphic chip designs specially tailored to this task therefore offer an interesting perspective for neurorobotics. Unlike Von Neumann CPUs, these chips cannot be simply programmed with a standard programming language. Like real brains, their functionality is determined by the structure of neural connectivity and synaptic efficacies. Enabling higher cognitive functions for neurorobotics consequently requires the application of neurobiological learning algorithms to adjust synaptic weights in a biologically plausible way. In this paper, we therefore investigate how to program neuromorphic chips by means of learning. First, we provide an overview over selected neuromorphic chip designs and analyze them in terms of neural computation, communication systems and software infrastructure. On the theoretical side, we review neurobiological learning techniques. Based on this overview, we then examine on-die implementations of these learning algorithms on the considered neuromorphic chips. A final discussion puts the findings of this work into context and highlights how neuromorphic hardware can potentially advance the field of autonomous robot systems. The paper thus gives an in-depth overview of neuromorphic implementations of basic mechanisms of synaptic plasticity which are required to realize advanced cognitive capabilities with spiking neural networks. PMID:26422422
Cascade Error Projection: A Learning Algorithm for Hardware Implementation
NASA Technical Reports Server (NTRS)
Duong, Tuan A.; Daud, Taher
1996-01-01
In this paper, we workout a detailed mathematical analysis for a new learning algorithm termed Cascade Error Projection (CEP) and a general learning frame work. This frame work can be used to obtain the cascade correlation learning algorithm by choosing a particular set of parameters. Furthermore, CEP learning algorithm is operated only on one layer, whereas the other set of weights can be calculated deterministically. In association with the dynamical stepsize change concept to convert the weight update from infinite space into a finite space, the relation between the current stepsize and the previous energy level is also given and the estimation procedure for optimal stepsize is used for validation of our proposed technique. The weight values of zero are used for starting the learning for every layer, and a single hidden unit is applied instead of using a pool of candidate hidden units similar to cascade correlation scheme. Therefore, simplicity in hardware implementation is also obtained. Furthermore, this analysis allows us to select from other methods (such as the conjugate gradient descent or the Newton's second order) one of which will be a good candidate for the learning technique. The choice of learning technique depends on the constraints of the problem (e.g., speed, performance, and hardware implementation); one technique may be more suitable than others. Moreover, for a discrete weight space, the theoretical analysis presents the capability of learning with limited weight quantization. Finally, 5- to 8-bit parity and chaotic time series prediction problems are investigated; the simulation results demonstrate that 4-bit or more weight quantization is sufficient for learning neural network using CEP. In addition, it is demonstrated that this technique is able to compensate for less bit weight resolution by incorporating additional hidden units. However, generation result may suffer somewhat with lower bit weight quantization.
All-Optical Implementation of the Ant Colony Optimization Algorithm
NASA Astrophysics Data System (ADS)
Hu, Wenchao; Wu, Kan; Shum, Perry Ping; Zheludev, Nikolay I.; Soci, Cesare
2016-05-01
We report all-optical implementation of the optimization algorithm for the famous “ant colony” problem. Ant colonies progressively optimize pathway to food discovered by one of the ants through identifying the discovered route with volatile chemicals (pheromones) secreted on the way back from the food deposit. Mathematically this is an important example of graph optimization problem with dynamically changing parameters. Using an optical network with nonlinear waveguides to represent the graph and a feedback loop, we experimentally show that photons traveling through the network behave like ants that dynamically modify the environment to find the shortest pathway to any chosen point in the graph. This proof-of-principle demonstration illustrates how transient nonlinearity in the optical system can be exploited to tackle complex optimization problems directly, on the hardware level, which may be used for self-routing of optical signals in transparent communication networks and energy flow in photonic systems.
All-Optical Implementation of the Ant Colony Optimization Algorithm.
Hu, Wenchao; Wu, Kan; Shum, Perry Ping; Zheludev, Nikolay I; Soci, Cesare
2016-01-01
We report all-optical implementation of the optimization algorithm for the famous "ant colony" problem. Ant colonies progressively optimize pathway to food discovered by one of the ants through identifying the discovered route with volatile chemicals (pheromones) secreted on the way back from the food deposit. Mathematically this is an important example of graph optimization problem with dynamically changing parameters. Using an optical network with nonlinear waveguides to represent the graph and a feedback loop, we experimentally show that photons traveling through the network behave like ants that dynamically modify the environment to find the shortest pathway to any chosen point in the graph. This proof-of-principle demonstration illustrates how transient nonlinearity in the optical system can be exploited to tackle complex optimization problems directly, on the hardware level, which may be used for self-routing of optical signals in transparent communication networks and energy flow in photonic systems. PMID:27222098
All-Optical Implementation of the Ant Colony Optimization Algorithm
Hu, Wenchao; Wu, Kan; Shum, Perry Ping; Zheludev, Nikolay I.; Soci, Cesare
2016-01-01
We report all-optical implementation of the optimization algorithm for the famous “ant colony” problem. Ant colonies progressively optimize pathway to food discovered by one of the ants through identifying the discovered route with volatile chemicals (pheromones) secreted on the way back from the food deposit. Mathematically this is an important example of graph optimization problem with dynamically changing parameters. Using an optical network with nonlinear waveguides to represent the graph and a feedback loop, we experimentally show that photons traveling through the network behave like ants that dynamically modify the environment to find the shortest pathway to any chosen point in the graph. This proof-of-principle demonstration illustrates how transient nonlinearity in the optical system can be exploited to tackle complex optimization problems directly, on the hardware level, which may be used for self-routing of optical signals in transparent communication networks and energy flow in photonic systems. PMID:27222098
An overview of SuperLU: Algorithms, implementation, and userinterface
Li, Xiaoye S.
2003-09-30
We give an overview of the algorithms, design philosophy,and implementation techniques in the software SuperLU, for solving sparseunsymmetric linear systems. In particular, we highlight the differencesbetween the sequential SuperLU (including its multithreaded extension)and parallel SuperLU_DIST. These include the numerical pivoting strategy,the ordering strategy for preserving sparsity, the ordering in which theupdating tasks are performed, the numerical kernel, and theparallelization strategy. Because of the scalability concern, theparallel code is drastically different from the sequential one. Wedescribe the user interfaces ofthe libraries, and illustrate how to usethe libraries most efficiently depending on some matrix characteristics.Finally, we give some examples of how the solver has been used inlarge-scale scientific applications, and the performance.
FPGA Implementation of Generalized Hebbian Algorithm for Texture Classification
Lin, Shiow-Jyu; Hwang, Wen-Jyi; Lee, Wei-Hao
2012-01-01
This paper presents a novel hardware architecture for principal component analysis. The architecture is based on the Generalized Hebbian Algorithm (GHA) because of its simplicity and effectiveness. The architecture is separated into three portions: the weight vector updating unit, the principal computation unit and the memory unit. In the weight vector updating unit, the computation of different synaptic weight vectors shares the same circuit for reducing the area costs. To show the effectiveness of the circuit, a texture classification system based on the proposed architecture is physically implemented by Field Programmable Gate Array (FPGA). It is embedded in a System-On-Programmable-Chip (SOPC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient design for attaining both high speed performance and low area costs. PMID:22778640
An implementation of continuous genetic algorithm in parameter estimation of predator-prey model
NASA Astrophysics Data System (ADS)
Windarto
2016-03-01
Genetic algorithm is an optimization method based on the principles of genetics and natural selection in life organisms. The main components of this algorithm are chromosomes population (individuals population), parent selection, crossover to produce new offspring, and random mutation. In this paper, continuous genetic algorithm was implemented to estimate parameters in a predator-prey model of Lotka-Volterra type. For simplicity, all genetic algorithm parameters (selection rate and mutation rate) are set to be constant along implementation of the algorithm. It was found that by selecting suitable mutation rate, the algorithms can estimate these parameters well.
An Object-Oriented Collection of Minimum Degree Algorithms: Design, Implementation, and Experiences
NASA Technical Reports Server (NTRS)
Kumfert, Gary; Pothen, Alex
1999-01-01
The multiple minimum degree (MMD) algorithm and its variants have enjoyed 20+ years of research and progress in generating fill-reducing orderings for sparse, symmetric positive definite matrices. Although conceptually simple, efficient implementations of these algorithms are deceptively complex and highly specialized. In this case study, we present an object-oriented library that implements several recent minimum degree-like algorithms. We discuss how object-oriented design forces us to decompose these algorithms in a different manner than earlier codes and demonstrate how this impacts the flexibility and efficiency of our C++ implementation. We compare the performance of our code against other implementations in C or Fortran.
NASA Astrophysics Data System (ADS)
Klimesh, M.; Stanton, V.; Watola, D.
2000-10-01
We describe a hardware implementation of a state-of-the-art lossless image compression algorithm. The algorithm is based on the LOCO-I (low complexity lossless compression for images) algorithm developed by Weinberger, Seroussi, and Sapiro, with modifications to lower the implementation complexity. In this setup, the compression itself is performed entirely in hardware using a field programmable gate array and a small amount of random access memory. The compression speed achieved is 1.33 Mpixels/second. Our algorithm yields about 15 percent better compression than the Rice algorithm.
Improvement and implementation for Canny edge detection algorithm
NASA Astrophysics Data System (ADS)
Yang, Tao; Qiu, Yue-hong
2015-07-01
Edge detection is necessary for image segmentation and pattern recognition. In this paper, an improved Canny edge detection approach is proposed due to the defect of traditional algorithm. A modified bilateral filter with a compensation function based on pixel intensity similarity judgment was used to smooth image instead of Gaussian filter, which could preserve edge feature and remove noise effectively. In order to solve the problems of sensitivity to the noise in gradient calculating, the algorithm used 4 directions gradient templates. Finally, Otsu algorithm adaptively obtain the dual-threshold. All of the algorithm simulated with OpenCV 2.4.0 library in the environments of vs2010, and through the experimental analysis, the improved algorithm has been proved to detect edge details more effectively and with more adaptability.
A modified density-based clustering algorithm and its implementation
NASA Astrophysics Data System (ADS)
Ban, Zhihua; Liu, Jianguo; Yuan, Lulu; Yang, Hua
2015-12-01
This paper presents an improved density-based clustering algorithm based on the paper of clustering by fast search and find of density peaks. A distance threshold is introduced for the purpose of economizing memory. In order to reduce the probability that two points share the same density value, similarity is utilized to define proximity measure. We have tested the modified algorithm on a large data set, several small data sets and shape data sets. It turns out that the proposed algorithm can obtain acceptable results and can be applied more wildly.
Algorithm Implementation for a Prototype Time-Encoded Signature Detector
Mercier, Theresa M.; Runkle, Robert C.; Stephens, Daniel L.; Hyronimus, Brian J.; Morris, Scott J.; Seifert, Allen; Wyatt, Cory R.
2007-12-31
The authors constructed a prototype Time-Encoded Signature (TES) system, complete with automated detection algorithms, useful for the detection of point-like, gamma-ray sources in search applications where detectors observe large variability in background count rates beyond statistical (Poisson) noise. The person-carried TES instrument consists of two Cesium Iodide scintillators placed on opposite sides of a Tungsten shield. This geometry mitigates systematic background variation, and induces a unique signature upon encountering point-like sources. This manuscript focuses on the development of detection algorithms that both identify point-source signatures and are computationally simple. The latter constraint derives from the instrument’s mobile (and thus low power) operation. The authors evaluated algorithms on both simulated and field data. The results of this analysis demonstrate the ability to detect sources at a wide range of source-detector distances using computationally simple algorithms.
An Improved QRS Wave Group Detection Algorithm and Matlab Implementation
NASA Astrophysics Data System (ADS)
Zhang, Hongjun
This paper presents an algorithm using Matlab software to detect QRS wave group of MIT-BIH ECG database. First of all the noise in ECG be Butterworth filtered, and then analysis the ECG signal based on wavelet transform to detect the parameters of the principle of singularity, more accurate detection of the QRS wave group was achieved.
Implementations of back propagation algorithm in ecosystems applications
NASA Astrophysics Data System (ADS)
Ali, Khalda F.; Sulaiman, Riza; Elamir, Amir Mohamed
2015-05-01
Artificial Neural Networks (ANNs) have been applied to an increasing number of real world problems of considerable complexity. Their most important advantage is in solving problems which are too complex for conventional technologies, that do not have an algorithmic solutions or their algorithmic Solutions is too complex to be found. In general, because of their abstraction from the biological brain, ANNs are developed from concept that evolved in the late twentieth century neuro-physiological experiments on the cells of the human brain to overcome the perceived inadequacies with conventional ecological data analysis methods. ANNs have gained increasing attention in ecosystems applications, because of ANN's capacity to detect patterns in data through non-linear relationships, this characteristic confers them a superior predictive ability. In this research, ANNs is applied in an ecological system analysis. The neural networks use the well known Back Propagation (BP) Algorithm with the Delta Rule for adaptation of the system. The Back Propagation (BP) training Algorithm is an effective analytical method for adaptation of the ecosystems applications, the main reason because of their capacity to detect patterns in data through non-linear relationships. This characteristic confers them a superior predicting ability. The BP algorithm uses supervised learning, which means that we provide the algorithm with examples of the inputs and outputs we want the network to compute, and then the error is calculated. The idea of the back propagation algorithm is to reduce this error, until the ANNs learns the training data. The training begins with random weights, and the goal is to adjust them so that the error will be minimal. This research evaluated the use of artificial neural networks (ANNs) techniques in an ecological system analysis and modeling. The experimental results from this research demonstrate that an artificial neural network system can be trained to act as an expert
A block-wise approximate parallel implementation for ART algorithm on CUDA-enabled GPU.
Fan, Zhongyin; Xie, Yaoqin
2015-01-01
Computed tomography (CT) has been widely used to acquire volumetric anatomical information in the diagnosis and treatment of illnesses in many clinics. However, the ART algorithm for reconstruction from under-sampled and noisy projection is still time-consuming. It is the goal of our work to improve a block-wise approximate parallel implementation for the ART algorithm on CUDA-enabled GPU to make the ART algorithm applicable to the clinical environment. The resulting method has several compelling features: (1) the rays are allotted into blocks, making the rays in the same block parallel; (2) GPU implementation caters to the actual industrial and medical application demand. We test the algorithm on a digital shepp-logan phantom, and the results indicate that our method is more efficient than the existing CPU implementation. The high computation efficiency achieved in our algorithm makes it possible for clinicians to obtain real-time 3D images. PMID:26405857
An implementation of the look-ahead Lanczos algorithm for non-Hermitian matrices, part 2
NASA Technical Reports Server (NTRS)
Freund, Roland W.; Nachtigal, Noel M.
1990-01-01
It is shown how the look-ahead Lanczos process (combined with a quasi-minimal residual QMR) approach) can be used to develop a robust black box solver for large sparse non-Hermitian linear systems. Details of an implementation of the resulting QMR algorithm are presented. It is demonstrated that the QMR method is closely related to the biconjugate gradient (BCG) algorithm; however, unlike BCG, the QMR algorithm has smooth convergence curves and good numerical properties. We report numerical experiments with our implementation of the look-ahead Lanczos algorithm, both for eigenvalue problem and linear systems. Also, program listings of FORTRAN implementations of the look-ahead algorithm and the QMR method are included.
A Fast Implementation of the ISODATA Clustering Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2005-01-01
Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
Comparative study of fusion algorithms and implementation of new efficient solution
NASA Astrophysics Data System (ADS)
Besrour, Amine; Snoussi, Hichem; Siala, Mohamed; Abdelkefi, Fatma
2014-05-01
High Dynamic Range (HDR) imaging has been the subject of significant researches over the past years, the goal of acquiring the best cinema-quality HDR images of fast-moving scenes using an efficient merging algorithm has not yet been achieved. In fact, through the years, many efficient algorithms have been implemented and developed. However, they are not yet efficient since they don't treat all the situations and they have not enough speed to ensure fast HDR image reconstitution. In this paper, we will present a full comparative analyze and study of the available fusion algorithms. Also, we will implement our personal algorithm which can be more optimized and faster than the existed ones. We will also present our investigated algorithm that has the advantage to be more optimized than the existing ones. This merging algorithm is related to our hardware solution allowing us to obtain four pictures with different exposures.
A simple implementation of the Viterbi algorithm on the Motorola DSP56001
NASA Technical Reports Server (NTRS)
Messer, Dion D.; Park, Sangil
1990-01-01
As systems designers design communication systems with digital rather than analog components to reduce noise and increase channel capacity, they must have the ability to perform traditional communication algorithms digitally. The use of trellis coded modulation as well as the extensive use of convolutional encoding for error detection and correction requires an efficient digital implementation of the Viterbi Algorithm for real time demodulation and decoding. Digital signal processors are now fast enough to implement Viterbi decoding in conjunction with the normal receiver/transmitter functions for lower speed channels on a single chip as well as performing fast decoding for higher speed channels, if the algorithm is implemented efficiently. The purpose of this paper is to identify a good way to implement the Viterbi Algorithm (VA) on the Motorola DSP56001, balancing performance considerations with speed and memory efficiency.
A Novel Implementation of Efficient Algorithms for Quantum Circuit Synthesis
NASA Astrophysics Data System (ADS)
Zeller, Luke
In this project, we design and develop a computer program to effectively approximate arbitrary quantum gates using the discrete set of Clifford Gates together with the T gate (π/8 gate). Employing recent results from Mosca et. al. and Giles and Selinger, we implement a decomposition scheme that outputs a sequence of Clifford, T, and Tt gates that approximate the input to within a specified error range ɛ. Specifically, the given gate is first rounded to an element of Z[1/2, i] with a precision determined by ɛ, and then exact synthesis is employed to produce the resulting gate. It is known that this procedure is optimal in approximating an arbitrary single qubit gate. Our program, written in Matlab and Python, can complete both approximate and exact synthesis of qubits. It can be used to assist in the experimental implementation of an arbitrary fault-tolerant single qubit gate, for which direct implementation isn't feasible.
Implementation of Real-Time Feedback Flow Control Algorithms on a Canonical Testbed
NASA Technical Reports Server (NTRS)
Tian, Ye; Song, Qi; Cattafesta, Louis
2005-01-01
This report summarizes the activities on "Implementation of Real-Time Feedback Flow Control Algorithms on a Canonical Testbed." The work summarized consists primarily of two parts. The first part summarizes our previous work and the extensions to adaptive ID and control algorithms. The second part concentrates on the validation of adaptive algorithms by applying them to a vibration beam test bed. Extensions to flow control problems are discussed.
Investigations into Implementation of an Iterative Feedback Tuning Algorithm into Microcontroller
NASA Astrophysics Data System (ADS)
Himunzowa, Grayson
In this paper, implementation of an Iterative Feedback Tuning (IFT) and Myopic Unfalsified Control (MUC) algorithms into a microcontroller is investigated. First step taken was to search for a suitable hardware to accommodate these complex algorithms. The Motorola DSP56F807C and ARM7024 microcontrollers were selected for use in the research. The algorithms were coded in the C language of the respective microcontrollers and were tested by simulation of the DC Motor models obtained from step response of the motor.
NASA Technical Reports Server (NTRS)
Delaat, J. C.; Merrill, W. C.
1983-01-01
A sensor failure detection, isolation, and accommodation algorithm was developed which incorporates analytic sensor redundancy through software. This algorithm was implemented in a high level language on a microprocessor based controls computer. Parallel processing and state-of-the-art 16-bit microprocessors are used along with efficient programming practices to achieve real-time operation.
FPGA implementation of vision algorithms for small autonomous robots
NASA Astrophysics Data System (ADS)
Anderson, J. D.; Lee, D. J.; Archibald, J. K.
2005-10-01
The use of on-board vision with small autonomous robots has been made possible by the advances in the field of Field Programmable Gate Array (FPGA) technology. By connecting a CMOS camera to an FPGA board, on-board vision has been used to reduce the computation time inherent in vision algorithms. The FPGA board allows the user to create custom hardware in a faster, safer, and more easily verifiable manner that decreases the computation time and allows the vision to be done in real-time. Real-time vision tasks for small autonomous robots include object tracking, obstacle detection and avoidance, and path planning. Competitions were created to demonstrate that our algorithms work with our small autonomous vehicles in dealing with these problems. These competitions include Mouse-Trapped-in-a-Box, where the robot has to detect the edges of a box that it is trapped in and move towards them without touching them; Obstacle Avoidance, where an obstacle is placed at any arbitrary point in front of the robot and the robot has to navigate itself around the obstacle; Canyon Following, where the robot has to move to the center of a canyon and follow the canyon walls trying to stay in the center; the Grand Challenge, where the robot had to navigate a hallway and return to its original position in a given amount of time; and Stereo Vision, where a separate robot had to catch tennis balls launched from an air powered cannon. Teams competed on each of these competitions that were designed for a graduate-level robotic vision class, and each team had to develop their own algorithm and hardware components. This paper discusses one team's approach to each of these problems.
Evaluation of a segmentation algorithm designed for an FPGA implementation
NASA Astrophysics Data System (ADS)
Schwenk, Kurt; Schönermark, Maria; Huber, Felix
2013-10-01
The present work has to be seen in the context of real-time on-board image evaluation of optical satellite data. With on board image evaluation more useful data can be acquired, the time to get requested information can be decreased and new real-time applications are possible. Because of the relative high processing power in comparison to the low power consumption, Field Programmable Gate Array (FPGA) technology has been chosen as an adequate hardware platform for image processing tasks. One fundamental part for image evaluation is image segmentation. It is a basic tool to extract spatial image information which is very important for many applications such as object detection. Therefore a special segmentation algorithm using the advantages of FPGA technology has been developed. The aim of this work is the evaluation of this algorithm. Segmentation evaluation is a difficult task. The most common way for evaluating the performance of a segmentation method is still subjective evaluation, in which human experts determine the quality of a segmentation. This way is not in compliance with our needs. The evaluation process has to provide a reasonable quality assessment, should be objective, easy to interpret and simple to execute. To reach these requirements a so called Segmentation Accuracy Equality norm (SA EQ) was created, which compares the difference of two segmentation results. It can be shown that this norm is capable as a first quality measurement. Due to its objectivity and simplicity the algorithm has been tested on a specially chosen synthetic test model. In this work the most important results of the quality assessment will be presented.
Computer implementation of the Kerov-Kirillov-Reshetikhin algorithm
NASA Astrophysics Data System (ADS)
Jakubczyk, P.; Wal, A.; Jakubczyk, D.; Lulek, T.
2012-06-01
The Kerov-Kirillov-Reshetikhin (KKR) algorithm establishes a bijection between semistandard Weyl tableaux of the shape λ and weight μ and rigged string configurations of type (λ,μ). This algorithm can be applied to Heisenberg magnetic chains and their generalisations by use of Schur-Weyl duality, and gives a way to classify all eigenstates of the chain in a methodological manner. Program summaryProgram title: KKR algorithm Catalogue identifier: AELQ_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AELQ_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 974 No. of bytes in distributed program, including test data, etc.: 11 118 Distribution format: tar.gz Programming language: Maple Computer: PC Operating system: Windows, Linux RAM: 1 GB Classification: 2.7, 4.2 Nature of problem: The method of classification of all eigenstates and eigenvalues is given for the generalised model of the one-dimensional Heisenberg magnet consisting of N nodes. Solution method: We use a prolongation of the RSK bijection [1] to the set of all eigenstates of an integrable model, in terms of rigged string configurations. More specifically, one selects from the irreducible basis b of the Schur-Weyl duality all highest weight states |λty> with respect to the unitary group U(n) and generates a rigged string configurations |ν→,{J}>=KKR(|λty>) for each y∈SYT(λ). Running time: The running time is about 1 ms.
Motion Cueing Algorithm Development: New Motion Cueing Program Implementation and Tuning
NASA Technical Reports Server (NTRS)
Houck, Jacob A. (Technical Monitor); Telban, Robert J.; Cardullo, Frank M.; Kelly, Lon C.
2005-01-01
A computer program has been developed for the purpose of driving the NASA Langley Research Center Visual Motion Simulator (VMS). This program includes two new motion cueing algorithms, the optimal algorithm and the nonlinear algorithm. A general description of the program is given along with a description and flowcharts for each cueing algorithm, and also descriptions and flowcharts for subroutines used with the algorithms. Common block variable listings and a program listing are also provided. The new cueing algorithms have a nonlinear gain algorithm implemented that scales each aircraft degree-of-freedom input with a third-order polynomial. A description of the nonlinear gain algorithm is given along with past tuning experience and procedures for tuning the gain coefficient sets for each degree-of-freedom to produce the desired piloted performance. This algorithm tuning will be needed when the nonlinear motion cueing algorithm is implemented on a new motion system in the Cockpit Motion Facility (CMF) at the NASA Langley Research Center.
A Linac Simulation Code for Macro-Particles Tracking and Steering Algorithm Implementation
sun, yipeng
2012-05-03
In this paper, a linac simulation code written in Fortran90 is presented and several simulation examples are given. This code is optimized to implement linac alignment and steering algorithms, and evaluate the accelerator errors such as RF phase and acceleration gradient, quadrupole and BPM misalignment. It can track a single particle or a bunch of particles through normal linear accelerator elements such as quadrupole, RF cavity, dipole corrector and drift space. One-to-one steering algorithm and a global alignment (steering) algorithm are implemented in this code.
Implementation and performance of a domain decomposition algorithm in Sisal
DeBoni, T.; Feo, J.; Rodrigue, G.; Muller, J.
1993-09-23
Sisal is a general-purpose functional language that hides the complexity of parallel processing, expedites parallel program development, and guarantees determinacy. Parallelism and management of concurrent tasks are realized automatically by the compiler and runtime system. Spatial domain decomposition is a widely-used method that focuses computational resources on the most active, or important, areas of a domain. Many complex programming issues are introduced in paralleling this method including: dynamic spatial refinement, dynamic grid partitioning and fusion, task distribution, data distribution, and load balancing. In this paper, we describe a spatial domain decomposition algorithm programmed in Sisal. We explain the compilation process, and present the execution performance of the resultant code on two different multiprocessor systems: a multiprocessor vector supercomputer, and cache-coherent scalar multiprocessor.
Implementation of jump-diffusion algorithms for understanding FLIR scenes
NASA Astrophysics Data System (ADS)
Lanterman, Aaron D.; Miller, Michael I.; Snyder, Donald L.
1995-07-01
Our pattern theoretic approach to the automated understanding of forward-looking infrared (FLIR) images brings the traditionally separate endeavors of detection, tracking, and recognition together into a unified jump-diffusion process. New objects are detected and object types are recognized through discrete jump moves. Between jumps, the location and orientation of objects are estimated via continuous diffusions. An hypothesized scene, simulated from the emissive characteristics of the hypothesized scene elements, is compared with the collected data by a likelihood function based on sensor statistics. This likelihood is combined with a prior distribution defined over the set of possible scenes to form a posterior distribution. The jump-diffusion process empirically generates the posterior distribution. Both the diffusion and jump operations involve the simulation of a scene produced by a hypothesized configuration. Scene simulation is most effectively accomplished by pipelined rendering engines such as silicon graphics. We demonstrate the execution of our algorithm on a silicon graphics onyx/reality engine.
Implementation and evaluation of ILLIAC 4 algorithms for multispectral image processing
NASA Technical Reports Server (NTRS)
Swain, P. H.
1974-01-01
Data concerning a multidisciplinary and multi-organizational effort to implement multispectral data analysis algorithms on a revolutionary computer, the Illiac 4, are reported. The effectiveness and efficiency of implementing the digital multispectral data analysis techniques for producing useful land use classifications from satellite collected data were demonstrated.
NASA Technical Reports Server (NTRS)
Delaat, J. C.
1984-01-01
An advanced, sensor failure detection, isolation, and accomodation algorithm has been developed by NASA for the F100 turbofan engine. The algorithm takes advantage of the analytical redundancy of the sensors to improve the reliability of the sensor set. The method requires the controls computer, to determine when a sensor failure has occurred without the help of redundant hardware sensors in the control system. The controls computer provides an estimate of the correct value of the output of the failed sensor. The algorithm has been programmed in FORTRAN using a real-time microprocessor-based controls computer. A detailed description of the algorithm and its implementation on a microprocessor is given.
On Distribution Reduction and Algorithm Implementation in Inconsistent Ordered Information Systems
Zhang, Yanqin
2014-01-01
As one part of our work in ordered information systems, distribution reduction is studied in inconsistent ordered information systems (OISs). Some important properties on distribution reduction are studied and discussed. The dominance matrix is restated for reduction acquisition in dominance relations based information systems. Matrix algorithm for distribution reduction acquisition is stepped. And program is implemented by the algorithm. The approach provides an effective tool for the theoretical research and the applications for ordered information systems in practices. For more detailed and valid illustrations, cases are employed to explain and verify the algorithm and the program which shows the effectiveness of the algorithm in complicated information systems. PMID:25258721
de Paula, Lauro C. M.; Soares, Anderson S.; de Lima, Telma W.; Delbem, Alexandre C. B.; Coelho, Clarimar J.; Filho, Arlindo R. G.
2014-01-01
Several variable selection algorithms in multivariate calibration can be accelerated using Graphics Processing Units (GPU). Among these algorithms, the Firefly Algorithm (FA) is a recent proposed metaheuristic that may be used for variable selection. This paper presents a GPU-based FA (FA-MLR) with multiobjective formulation for variable selection in multivariate calibration problems and compares it with some traditional sequential algorithms in the literature. The advantage of the proposed implementation is demonstrated in an example involving a relatively large number of variables. The results showed that the FA-MLR, in comparison with the traditional algorithms is a more suitable choice and a relevant contribution for the variable selection problem. Additionally, the results also demonstrated that the FA-MLR performed in a GPU can be five times faster than its sequential implementation. PMID:25493625
Implementation of software-based sensor linearization algorithms on low-cost microcontrollers.
Erdem, Hamit
2010-10-01
Nonlinear sensors and microcontrollers are used in many embedded system designs. As the input-output characteristic of most sensors is nonlinear in nature, obtaining data from a nonlinear sensor by using an integer microcontroller has always been a design challenge. This paper discusses the implementation of six software-based sensor linearization algorithms for low-cost microcontrollers. The comparative study of the linearization algorithms is performed by using a nonlinear optical distance-measuring sensor. The performance of the algorithms is examined with respect to memory space usage, linearization accuracy and algorithm execution time. The implementation and comparison results can be used for selection of a linearization algorithm based on the sensor transfer function, expected linearization accuracy and microcontroller capacity. PMID:20466366
Implementation on a nonlinear concrete cracking algorithm in NASTRAN
NASA Technical Reports Server (NTRS)
Herting, D. N.; Herendeen, D. L.; Hoesly, R. L.; Chang, H.
1976-01-01
A computer code for the analysis of reinforced concrete structures was developed using NASTRAN as a basis. Nonlinear iteration procedures were developed for obtaining solutions with a wide variety of loading sequences. A direct access file system was used to save results at each load step to restart within the solution module for further analysis. A multi-nested looping capability was implemented to control the iterations and change the loads. The basis for the analysis is a set of mutli-layer plate elements which allow local definition of materials and cracking properties.
Farber, R.M.; Lapedes, A.S.; Rico-Martinez, R.; Kevrekidis, I.G.
1993-06-01
Time-delay mappings constructed using neural networks have proven successful performing nonlinear system identification; however, because of their discrete nature, their use in bifurcation analysis of continuous-tune systems is limited. This shortcoming can be avoided by embedding the neural networks in a training algorithm that mimics a numerical integrator. Both explicit and implicit integrators can be used. The former case is based on repeated evaluations of the network in a feedforward implementation; the latter relies on a recurrent network implementation. Here the algorithms and their implementation on parallel machines (SIMD and MIMD architectures) are discussed.
Farber, R.M.; Lapedes, A.S. ); Rico-Martinez, R.; Kevrekidis, I.G. . Dept. of Chemical Engineering)
1993-01-01
Time-delay mappings constructed using neural networks have proven successful performing nonlinear system identification; however, because of their discrete nature, their use in bifurcation analysis of continuous-tune systems is limited. This shortcoming can be avoided by embedding the neural networks in a training algorithm that mimics a numerical integrator. Both explicit and implicit integrators can be used. The former case is based on repeated evaluations of the network in a feedforward implementation; the latter relies on a recurrent network implementation. Here the algorithms and their implementation on parallel machines (SIMD and MIMD architectures) are discussed.
NASA Astrophysics Data System (ADS)
Jin, Minglei; Jin, Weiqi; Li, Yiyang; Li, Shuo
2015-08-01
In this paper, we propose a novel scene-based non-uniformity correction algorithm for infrared image processing-temporal high-pass non-uniformity correction algorithm based on grayscale mapping (THP and GM). The main sources of non-uniformity are: (1) detector fabrication inaccuracies; (2) non-linearity and variations in the read-out electronics and (3) optical path effects. The non-uniformity will be reduced by non-uniformity correction (NUC) algorithms. The NUC algorithms are often divided into calibration-based non-uniformity correction (CBNUC) algorithms and scene-based non-uniformity correction (SBNUC) algorithms. As non-uniformity drifts temporally, CBNUC algorithms must be repeated by inserting a uniform radiation source which SBNUC algorithms do not need into the view, so the SBNUC algorithm becomes an essential part of infrared imaging system. The SBNUC algorithms' poor robustness often leads two defects: artifacts and over-correction, meanwhile due to complicated calculation process and large storage consumption, hardware implementation of the SBNUC algorithms is difficult, especially in Field Programmable Gate Array (FPGA) platform. The THP and GM algorithm proposed in this paper can eliminate the non-uniformity without causing defects. The hardware implementation of the algorithm only based on FPGA has two advantages: (1) low resources consumption, and (2) small hardware delay: less than 20 lines, it can be transplanted to a variety of infrared detectors equipped with FPGA image processing module, it can reduce the stripe non-uniformity and the ripple non-uniformity.
Implementation of a partitioned algorithm for simulation of large CSI problems
NASA Technical Reports Server (NTRS)
Alvin, Kenneth F.; Park, K. C.
1991-01-01
The implementation of a partitioned numerical algorithm for determining the dynamic response of coupled structure/controller/estimator finite-dimensional systems is reviewed. The partitioned approach leads to a set of coupled first and second-order linear differential equations which are numerically integrated with extrapolation and implicit step methods. The present software implementation, ACSIS, utilizes parallel processing techniques at various levels to optimize performance on a shared-memory concurrent/vector processing system. A general procedure for the design of controller and filter gains is also implemented, which utilizes the vibration characteristics of the structure to be solved. Also presented are: example problems; a user's guide to the software; the procedures and algorithm scripts; a stability analysis for the algorithm; and the source code for the parallel implementation.
NASA Astrophysics Data System (ADS)
Xu, Dexiang
This dissertation presents a novel method of designing finite word length Finite Impulse Response (FIR) digital filters using a Real Parameter Parallel Genetic Algorithm (RPPGA). This algorithm is derived from basic Genetic Algorithms which are inspired by natural genetics principles. Both experimental results and theoretical studies in this work reveal that the RPPGA is a suitable method for determining the optimal or near optimal discrete coefficients of finite word length FIR digital filters. Performance of RPPGA is evaluated by comparing specifications of filters designed by other methods with filters designed by RPPGA. The parallel and spatial structures of the algorithm result in faster and more robust optimization than basic genetic algorithms. A filter designed by RPPGA is implemented in hardware to attenuate high frequency noise in a data acquisition system for collecting seismic signals. These studies may lead to more applications of the Real Parameter Parallel Genetic Algorithms in Electrical Engineering.
NASA Astrophysics Data System (ADS)
Xiong, Yuting; Gu, Zhaojun; Jin, Wei
In this paper, a novel practical algorithmic solution for automatic discovering the physical topology of switched Ethernet was proposed. Our algorithm collects standard SNMP MIB information that is widely supported in modern IP networks and then builds the physical topology of the active network. We described the relative definitions, system model and proved the correctness of the algorithm. Practically, the algorithm was implemented in our visualization network monitoring system. We also presented the main steps of the algorithm, core codes and running results on the lab network. The experimental results clearly validate our approach, demonstrating that our algorithm is simple and effective which can discover the accurate up-to-date physical network topology.
Efficient implementation of Jacobi algorithms and Jacobi sets on distributed memory architectures
NASA Astrophysics Data System (ADS)
Eberlein, P. J.; Park, Haesun
1990-04-01
One-sided methods for implementing Jacobi diagonalization algorithms have been recently proposed for both distributed memory and vector machines. These methods are naturally well suited to distributed memory and vector architectures because of their inherent parallelism and their abundance of vector operations. Also, one-sided methods require substantially less message passing than the two-sided methods, and thus can achieve higher efficiency. We describe in detail the use of the one-sided Jacobi rotation as opposed to the rotation used in the ``Hestenes'' algorithm; we perceive this difference to have been widely misunderstood. Furthermore the one-sided algorithm generalizes to other problems such as the nonsymmetric eigenvalue problem while the Hestenes algorithm does not. We discuss two new implementations for Jacobi sets for a ring connected array of processors and show their isomorphism to the round-robin ordering. Moreover, we show that two implementations produce Jacobi sets in identical orders up to a relabeling. These orderings are optimal in the sense that they complete each sweep in a minimum number of stages with minimal communication. We present implementation results of one-sided Jacobi algorithms using these orderings on the NCUBE/seven hypercube as well as the Intel iPSC/2 hypercube. Finally, we mention how other orderings, and can be, implemented. The number of nonisomorphic Jacobi sets has recently been shown to become infinite with increasing n. The work of this author was supported by National Science Foundation Grant CCR-8813493.
Implementing embedded artificial intelligence rules within algorithmic programming languages
NASA Technical Reports Server (NTRS)
Feyock, Stefan
1988-01-01
Most integrations of artificial intelligence (AI) capabilities with non-AI (usually FORTRAN-based) application programs require the latter to execute separately to run as a subprogram or, at best, as a coroutine, of the AI system. In many cases, this organization is unacceptable; instead, the requirement is for an AI facility that runs in embedded mode; i.e., is called as subprogram by the application program. The design and implementation of a Prolog-based AI capability that can be invoked in embedded mode are described. The significance of this system is twofold: Provision of Prolog-based symbol-manipulation and deduction facilities makes a powerful symbolic reasoning mechanism available to applications programs written in non-AI languages. The power of the deductive and non-procedural descriptive capabilities of Prolog, which allow the user to describe the problem to be solved, rather than the solution, is to a large extent vitiated by the absence of the standard control structures provided by other languages. Embedding invocations of Prolog rule bases in programs written in non-AI languages makes it possible to put Prolog calls inside DO loops and similar control constructs. The resulting merger of non-AI and AI languages thus results in a symbiotic system in which the advantages of both programming systems are retained, and their deficiencies largely remedied.
Clinical implementation of a neonatal seizure detection algorithm
Temko, Andriy; Marnane, William; Boylan, Geraldine; Lightbody, Gordon
2015-01-01
Technologies for automated detection of neonatal seizures are gradually moving towards cot-side implementation. The aim of this paper is to present different ways to visualize the output of a neonatal seizure detection system and analyse their influence on performance in a clinical environment. Three different ways to visualize the detector output are considered: a binary output, a probabilistic trace, and a spatio-temporal colormap of seizure observability. As an alternative to visual aids, audified neonatal EEG is also considered. Additionally, a survey on the usefulness and accuracy of the presented methods has been performed among clinical personnel. The main advantages and disadvantages of the presented methods are discussed. The connection between information visualization and different methods to compute conventional metrics is established. The results of the visualization methods along with the system validation results indicate that the developed neonatal seizure detector with its current level of performance would unambiguously be of benefit to clinicians as a decision support system. The results of the survey suggest that a suitable way to visualize the output of neonatal seizure detection systems in a clinical environment is a combination of a binary output and a probabilistic trace. The main healthcare benefits of the tool are outlined. The decision support system with the chosen visualization interface is currently undergoing pre-market European multi-centre clinical investigation to support its regulatory approval and clinical adoption. PMID:25892834
Quantum computation: algorithms and implementation in quantum dot devices
NASA Astrophysics Data System (ADS)
Gamble, John King
In this thesis, we explore several aspects of both the software and hardware of quantum computation. First, we examine the computational power of multi-particle quantum random walks in terms of distinguishing mathematical graphs. We study both interacting and non-interacting multi-particle walks on strongly regular graphs, proving some limitations on distinguishing powers and presenting extensive numerical evidence indicative of interactions providing more distinguishing power. We then study the recently proposed adiabatic quantum algorithm for Google PageRank, and show that it exhibits power-law scaling for realistic WWW-like graphs. Turning to hardware, we next analyze the thermal physics of two nearby 2D electron gas (2DEG), and show that an analogue of the Coulomb drag effect exists for heat transfer. In some distance and temperature, this heat transfer is more significant than phonon dissipation channels. After that, we study the dephasing of two-electron states in a single silicon quantum dot. Specifically, we consider dephasing due to the electron-phonon coupling and charge noise, separately treating orbital and valley excitations. In an ideal system, dephasing due to charge noise is strongly suppressed due to a vanishing dipole moment. However, introduction of disorder or anharmonicity leads to large effective dipole moments, and hence possibly strong dephasing. Building on this work, we next consider more realistic systems, including structural disorder systems. We present experiment and theory, which demonstrate energy levels that vary with quantum dot translation, implying a structurally disordered system. Finally, we turn to the issues of valley mixing and valley-orbit hybridization, which occurs due to atomic-scale disorder at quantum well interfaces. We develop a new theoretical approach to study these effects, which we name the disorder-expansion technique. We demonstrate that this method successfully reproduces atomistic tight-binding techniques
NASA Astrophysics Data System (ADS)
Trigueros-Espinosa, Blas; Vélez-Reyes, Miguel; Santiago-Santiago, Nayda G.; Rosario-Torres, Samuel
2011-06-01
Hyperspectral sensors can collect hundreds of images taken at different narrow and contiguously spaced spectral bands. This high-resolution spectral information can be used to identify materials and objects within the field of view of the sensor by their spectral signature, but this process may be computationally intensive due to the large data sizes generated by the hyperspectral sensors, typically hundreds of megabytes. This can be an important limitation for some applications where the detection process must be performed in real time (surveillance, explosive detection, etc.). In this work, we developed a parallel implementation of three state-ofthe- art target detection algorithms (RX algorithm, matched filter and adaptive matched subspace detector) using a graphics processing unit (GPU) based on the NVIDIA® CUDA™ architecture. In addition, a multi-core CPUbased implementation of each algorithm was developed to be used as a baseline for the speedups estimation. We evaluated the performance of the GPU-based implementations using an NVIDIA ® Tesla® C1060 GPU card, and the detection accuracy of the implemented algorithms was evaluated using a set of phantom images simulating traces of different materials on clothing. We achieved a maximum speedup in the GPU implementations of around 20x over a multicore CPU-based implementation, which suggests that applications for real-time detection of targets in HSI can greatly benefit from the performance of GPUs as processing hardware.
Efficient implementation of Jacobi algorithms and Jacobi sets on distributed memory architectures
Eberlein, P.J. ); Park, H. )
1990-04-01
One-sided methods for implementing Jacobi diagonalization algorithms have been recently proposed for both distributed memory and vector machines. These methods are naturally well suited to distributed memory and vector architectures because of their inherent parallelism and their abundance of vector operations. Also, one-sided methods require substantially less message passing than the two-sided methods, and thus can achieve higher efficiency. The authors describe in detail the use of the one-sided Jacobi rotation as opposed to the rotation used in the Hestenes algorithm; they perceive the difference to have been widely misunderstood. Furthermore, the one-sided algorithm generalizes to other problems such as the nonsymmetric eigenvalue problem while the Hestenes algorithm does not. The authors discuss two new implementations for Jacobi sets for a ring connected array of processors and show their isomorphism to the round-robin ordering.
NASA Astrophysics Data System (ADS)
Kershaw, James; Hamlyn, Garry
1994-06-01
This paper describes the implementation of algorithms for automatically extracting and matching features in stereo pairs of images. The implementation has been designed to be as modular as possible to allow different algorithms for each stage in the matching process to be combined in the most appropriate manner for each particular problem. The modules have been implemented in the AVS environment but are designed to be portable to any platform. This work has been undertaken as part of task DEF 93/1 63 'Intelligence Analysis of Imagery', and forms part of ITD's contribution to the Visual Processing research program in the Centre for Sensor System and Information Processing. A major aim of both the task and the research program is to produce software to assist intelligence analysts in extracting three dimensional shape from imagery: the algorithms and software described here will form the first part of a module for automatically extracting depth information from stereo image pairs.
An implementation of the look-ahead Lanczos algorithm for non-Hermitian matrices, part 1
NASA Technical Reports Server (NTRS)
Freund, Roland W.; Gutknecht, Martin H.; Nachtigal, Noel M.
1990-01-01
The nonsymmetric Lanczos method can be used to compute eigenvalues of large sparse non-Hermitian matrices or to solve large sparse non-Hermitian linear systems. However, the original Lanczos algorithm is susceptible to possible breakdowns and potential instabilities. We present an implementation of a look-ahead version of the Lanczos algorithm which overcomes these problems by skipping over those steps in which a breakdown or near-breakdown would occur in the standard process. The proposed algorithm can handle look-ahead steps of any length and is not restricted to steps of length 2, as earlier implementations are. Also, our implementation has the feature that it requires roughly the same number of inner products as the standard Lanczos process without look-ahead.
Star-field identification algorithm. [for implementation on CCD-based imaging camera
NASA Technical Reports Server (NTRS)
Scholl, M. S.
1993-01-01
A description of a new star-field identification algorithm that is suitable for implementation on CCD-based imaging cameras is presented. The minimum identifiable star pattern element consists of an oriented star triplet defined by three stars, their celestial coordinates, and their visual magnitudes. The algorithm incorporates tolerance to faulty input data, errors in the reference catalog, and instrument-induced systematic errors.
NASA Technical Reports Server (NTRS)
Delaat, J. C.; Merrill, W. C.
1984-01-01
A sensor failure detection, isolation, and accommodation algorithm was developed which incorporates analytic sensor redundancy through software. This algorithm was implemented in a high level language on a microprocessor based controls computer. Parallel processing and state-of-the-art 16-bit microprocessors are used along with efficient programming practices to achieve real-time operation. Previously announced in STAR as N84-13140
A data-parallel algorithm for three-dimensional Delaunay triangulation and its implementation
Teng, Y.A.; Sullivan, F.; Beichl, I.; Puppo, E.
1993-12-31
In this paper, the authors present a parallel algorithm for constructing the Delaunay triangulation of a set of vertices in three-dimensional space. The algorithm achieves a high degree of parallelism by starting the construction from every vertex and expanding over all open faces thereafter. In the expansion of open faces, the search is made faster by using a bucketing technique. The algorithm is designed under a data-parallel paradigm. It uses segmented list structures and virtual processing for load-balancing. As a result, the algorithm achieves a fast running time and good scalability over a wide range of problem sizes and machine sizes. They also incorporate a topological check to eliminate inconsistencies due to degeneracies and numerical error. The algorithm is implemented on Connection Machines CM-2 and CM-5, and experimental results are presented.
NASA Astrophysics Data System (ADS)
Lu, Dawei; Zhu, Jing; Zou, Ping; Peng, Xinhua; Yu, Yihua; Zhang, Shanmin; Chen, Qun; Du, Jiangfeng
2010-02-01
An important quantum search algorithm based on the quantum random walk performs an oracle search on a database of N items with O(phN) calls, yielding a speedup similar to the Grover quantum search algorithm. The algorithm was implemented on a quantum information processor of three-qubit liquid-crystal nuclear magnetic resonance (NMR) in the case of finding 1 out of 4, and the diagonal elements’ tomography of all the final density matrices was completed with comprehensible one-dimensional NMR spectra. The experimental results agree well with the theoretical predictions.
Lu Dawei; Peng Xinhua; Du Jiangfeng; Zhu Jing; Zou Ping; Yu Yihua; Zhang Shanmin; Chen Qun
2010-02-15
An important quantum search algorithm based on the quantum random walk performs an oracle search on a database of N items with O({radical}(phN)) calls, yielding a speedup similar to the Grover quantum search algorithm. The algorithm was implemented on a quantum information processor of three-qubit liquid-crystal nuclear magnetic resonance (NMR) in the case of finding 1 out of 4, and the diagonal elements' tomography of all the final density matrices was completed with comprehensible one-dimensional NMR spectra. The experimental results agree well with the theoretical predictions.
Subspace scheduling and parallel implementation of non-systolic regular iterative algorithms
Roychowdhury, V.P.; Kailath, T.
1989-01-01
The study of Regular Iterative Algorithms (RIAs) was introduced in a seminal paper by Karp, Miller, and Winograd in 1967. In more recent years, the study of systolic architectures has led to a renewed interest in this class of algorithms, and the class of algorithms implementable on systolic arrays (as commonly understood) has been identified as a precise subclass of RIAs include matrix pivoting algorithms and certain forms of numerically stable two-dimensional filtering algorithms. It has been shown that the so-called hyperplanar scheduling for systolic algorithms can no longer be used to schedule and implement non-systolic RIAs. Based on the analysis of a so-called computability tree we generalize the concept of hyperplanar scheduling and determine linear subspaces in the index space of a given RIA such that all variables lying on the same subspace can be scheduled at the same time. This subspace scheduling technique is shown to be asymptotically optimal, and formal procedures are developed for designing processor arrays that will be compatible with our scheduling schemes. Explicit formulas for the schedule of a given variable are determined whenever possible; subspace scheduling is also applied to obtain lower dimensional processor arrays for systolic algorithms.
Demonstration of a small programmable quantum computer with atomic qubits.
Debnath, S; Linke, N M; Figgatt, C; Landsman, K A; Wright, K; Monroe, C
2016-08-01
Quantum computers can solve certain problems more efficiently than any possible conventional computer. Small quantum algorithms have been demonstrated on multiple quantum computing platforms, many specifically tailored in hardware to implement a particular algorithm or execute a limited number of computational paths. Here we demonstrate a five-qubit trapped-ion quantum computer that can be programmed in software to implement arbitrary quantum algorithms by executing any sequence of universal quantum logic gates. We compile algorithms into a fully connected set of gate operations that are native to the hardware and have a mean fidelity of 98 per cent. Reconfiguring these gate sequences provides the flexibility to implement a variety of algorithms without altering the hardware. As examples, we implement the Deutsch-Jozsa and Bernstein-Vazirani algorithms with average success rates of 95 and 90 per cent, respectively. We also perform a coherent quantum Fourier transform on five trapped-ion qubits for phase estimation and period finding with average fidelities of 62 and 84 per cent, respectively. This small quantum computer can be scaled to larger numbers of qubits within a single register, and can be further expanded by connecting several such modules through ion shuttling or photonic quantum channels. PMID:27488798
Implementation of C* Boundary Conditions in the Hybrid Monte Carlo Algorithm
NASA Astrophysics Data System (ADS)
Carmona, José Manuel; D'elia, Massimo; Di Giacomo, Adriano; Lucini, Biagio
In the study of QCD dynamics, C* boundary conditions are physically relevant in certain cases. In this paper, we study the implementation of these boundary conditions in the lattice formulation of full QCD with staggered fermions. In particular, we show that the usual even-odd partition trick to avoid the redoubling of the fermion matrix is still valid in this case. We give an explicit implementation of these boundary conditions for the Hybrid Monte Carlo algorithm.
A fast implementation of the incremental backprojection algorithms for parallel beam geometries
Chen, C.M.; Wang, C.Y.; Cho, Z.H.
1996-12-01
Filtered-backprojection algorithms are the most widely used approaches for reconstruction of computed tomographic (CT) images, such as X-ray CT and positron emission tomographic (PET) images. The Incremental backprojection algorithm is a fast backprojection approach based on restructuring the Shepp and Logan algorithm. By exploiting interdependency (position and values) of adjacent pixels, the Incremental algorithm requires only O(N) and O(N{sup 2}) multiplications in contrast to O(N{sup 2}) and O(N{sup 3}) multiplications for the Shepp and Logan algorithm in two-dimensional (2-D) and three-dimensional (3-D) backprojections, respectively, for each view, where N is the size of the image in each dimension. In addition, it may reduce the number of additions for each pixel computation. The improvement achieved by the Incremental algorithm in practice was not, however, as significant as expected. One of the main reasons is due to inevitably visiting pixels outside the beam in the searching flow scheme originally developed for the Incremental algorithm. To optimize implementation of the Incremental algorithm, an efficient scheme, namely, coded searching flow scheme, is proposed in this paper to minimize the overhead caused by searching for all pixels in a beam. The key idea of this scheme is to encode the searching flow for all pixels inside each beam. While backprojecting, all pixels may be visited without any overhead due to using the coded searching flow as the a priori information. The proposed coded searching flow scheme has been implemented on a Sun Sparc 10 and a Sun Sparc 20 workstations. The implementation results show that the proposed scheme is 1.45--2.0 times faster than the original searching flow scheme for most cases tested.
Linear array implementation of the EM algorithm for PET image reconstruction
Rajan, K.; Patnaik, L.M.; Ramakrishna, J.
1995-08-01
The PET image reconstruction based on the EM algorithm has several attractive advantages over the conventional convolution back projection algorithms. However, the PET image reconstruction based on the EM algorithm is computationally burdensome for today`s single processor systems. In addition, a large memory is required for the storage of the image, projection data, and the probability matrix. Since the computations are easily divided into tasks executable in parallel, multiprocessor configurations are the ideal choice for fast execution of the EM algorithms. In tis study, the authors attempt to overcome these two problems by parallelizing the EM algorithm on a multiprocessor systems. The parallel EM algorithm on a linear array topology using the commercially available fast floating point digital signal processor (DSP) chips as the processing elements (PE`s) has been implemented. The performance of the EM algorithm on a 386/387 machine, IBM 6000 RISC workstation, and on the linear array system is discussed and compared. The results show that the computational speed performance of a linear array using 8 DSP chips as PE`s executing the EM image reconstruction algorithm is about 15.5 times better than that of the IBM 6000 RISC workstation. The novelty of the scheme is its simplicity. The linear array topology is expandable with a larger number of PE`s. The architecture is not dependant on the DSP chip chosen, and the substitution of the latest DSP chip is straightforward and could yield better speed performance.
NASA Astrophysics Data System (ADS)
Obara, Lukasz; Żarnecki, Aleksander Filip
2015-09-01
Pi of the Sky is a system of wide field-of-view robotic telescopes, which search for short timescale astrophysical phenomena, especially for prompt optical GRB emission. The system was designed for autonomous operation, monitoring a large fraction of the sky with 12m-13m range and time resolution of the order of 1 - 100 seconds. LUIZA is a dedicated framework developed for efficient off-line processing of the Pi of the Sky data, implemented in C++. The photometric algorithm based on ASAS photometry was implemented in LUIZA and compared with the algorithm based on the pixel cluster reconstruction and simple aperture photometry algorithm. Optimized photometry algorithms were then applied to the sample of test images, which were modified to include different patterns of variability of the stars (training sample). Different statistical estimators are considered for developing the general variable star identification algorithm. The algorithm will then be used to search for short-period variable stars in the real data.
Algorithmic improvements to the real-time implementation of a synthetic aperture sonar beam former
NASA Astrophysics Data System (ADS)
Freeman, Douglas K.
1997-07-01
Coastal Systems Station has translated its synthetic aperture sonar beamformer from linear processing to parallel processing. The initial implementation included many linear processes delegated to individual processors and neglected algorithmic refinements available to parallel processing. The steps taken to achieve increased computational speed for real-time beam forming are presented.
An implementation of a data-transmission pipelining algorithm on Imote2 platforms
NASA Astrophysics Data System (ADS)
Li, Xu; Dorvash, Siavash; Cheng, Liang; Pakzad, Shamim
2011-04-01
Over the past several years, wireless network systems and sensing technologies have been developed significantly. This has resulted in the broad application of wireless sensor networks (WSNs) in many engineering fields and in particular structural health monitoring (SHM). The movement of traditional SHM toward the new generation of SHM, which utilizes WSNs, relies on the advantages of this new approach such as relatively low costs, ease of implementation and the capability of onboard data processing and management. In the particular case of long span bridge monitoring, a WSN should be capable of transmitting commands and measurement data over long network geometry in a reliable manner. While using single-hop data transmission in such geometry requires a long radio range and consequently a high level of power supply, multi-hop communication may offer an effective and reliable way for data transmissions across the network. Using a multi-hop communication protocol, the network relays data from a remote node to the base station via intermediary nodes. We have proposed a data-transmission pipelining algorithm to enable an effective use of the available bandwidth and minimize the energy consumption and the delay performance by the multi-hop communication protocol. This paper focuses on the implementation aspect of the pipelining algorithm on Imote2 platforms for SHM applications, describes its interaction with underlying routing protocols, and presents the solutions to various implementation issues of the proposed pipelining algorithm. Finally, the performance of the algorithm is evaluated based on the results of an experimental implementation.
Implementation and evaluation of the new wind algorithm in NASA's 50 MHz doppler radar wind profiler
NASA Technical Reports Server (NTRS)
Taylor, Gregory E.; Manobianco, John T.; Schumann, Robin S.; Wheeler, Mark M.; Yersavich, Ann M.
1993-01-01
The purpose of this report is to document the Applied Meteorology Unit's implementation and evaluation of the wind algorithm developed by Marshall Space Flight Center (MSFC) on the data analysis processor (DAP) of NASA's 50 MHz doppler radar wind profiler (DRWP). The report also includes a summary of the 50 MHz DRWP characteristics and performance and a proposed concept of operations for the DRWP.
Zhang, Aizhen; Wen, Ning; Nurushev, Teamour; Burmeister, Jay; Chetty, Indrin J
2013-01-01
A commercial electron Monte Carlo (eMC) dose calculation algorithm has become available in Eclipse treatment planning system. The purpose of this work was to evaluate the eMC algorithm and investigate the clinical implementation of this system. The beam modeling of the eMC algorithm was performed for beam energies of 6, 9, 12, 16, and 20 MeV for a Varian Trilogy and all available applicator sizes in the Eclipse treatment planning system. The accuracy of the eMC algorithm was evaluated in a homogeneous water phantom, solid water phantoms containing lung and bone materials, and an anthropomorphic phantom. In addition, dose calculation accuracy was compared between pencil beam (PB) and eMC algorithms in the same treatment planning system for heterogeneous phantoms. The overall agreement between eMC calculations and measurements was within 3%/2 mm, while the PB algorithm had large errors (up to 25%) in predicting dose distributions in the presence of inhomogeneities such as bone and lung. The clinical implementation of the eMC algorithm was investigated by performing treatment planning for 15 patients with lesions in the head and neck, breast, chest wall, and sternum. The dose distributions were calculated using PB and eMC algorithms with no smoothing and all three levels of 3D Gaussian smoothing for comparison. Based on a routine electron beam therapy prescription method, the number of eMC calculated monitor units (MUs) was found to increase with increased 3D Gaussian smoothing levels. 3D Gaussian smoothing greatly improved the visual usability of dose distributions and produced better target coverage. Differences of calculated MUs and dose distributions between eMC and PB algorithms could be significant when oblique beam incidence, surface irregularities, and heterogeneous tissues were present in the treatment plans. In our patient cases, monitor unit differences of up to 7% were observed between PB and eMC algorithms. Monitor unit calculations were also preformed
Towards the Implementation of an Autonomous Camera Algorithm on the da Vinci Platform.
Eslamian, Shahab; Reisner, Luke A; King, Brady W; Pandya, Abhilash K
2016-01-01
Camera positioning is critical for all telerobotic surgical systems. Inadequate visualization of the remote site can lead to serious errors that can jeopardize the patient. An autonomous camera algorithm has been developed on a medical robot (da Vinci) simulator. It is found to be robust in key scenarios of operation. This system behaves with predictable and expected actions for the camera arm with respect to the tool positions. The implementation of this system is described herein. The simulation closely models the methodology needed to implement autonomous camera control in a real hardware system. The camera control algorithm follows three rules: (1) keep the view centered on the tools, (2) keep the zoom level optimized such that the tools never leave the field of view, and (3) avoid unnecessary movement of the camera that may distract/disorient the surgeon. Our future work will apply this algorithm to the real da Vinci hardware. PMID:27046563
Design and Implementation of Hybrid CORDIC Algorithm Based on Phase Rotation Estimation for NCO
Zhang, Chaozhu; Han, Jinan; Li, Ke
2014-01-01
The numerical controlled oscillator has wide application in radar, digital receiver, and software radio system. Firstly, this paper introduces the traditional CORDIC algorithm. Then in order to improve computing speed and save resources, this paper proposes a kind of hybrid CORDIC algorithm based on phase rotation estimation applied in numerical controlled oscillator (NCO). Through estimating the direction of part phase rotation, the algorithm reduces part phase rotation and add-subtract unit, so that it decreases delay. Furthermore, the paper simulates and implements the numerical controlled oscillator by Quartus II software and Modelsim software. Finally, simulation results indicate that the improvement over traditional CORDIC algorithm is achieved in terms of ease of computation, resource utilization, and computing speed/delay while maintaining the precision. It is suitable for high speed and precision digital modulation and demodulation. PMID:25110750
Design and implementation of hybrid CORDIC algorithm based on phase rotation estimation for NCO.
Zhang, Chaozhu; Han, Jinan; Li, Ke
2014-01-01
The numerical controlled oscillator has wide application in radar, digital receiver, and software radio system. Firstly, this paper introduces the traditional CORDIC algorithm. Then in order to improve computing speed and save resources, this paper proposes a kind of hybrid CORDIC algorithm based on phase rotation estimation applied in numerical controlled oscillator (NCO). Through estimating the direction of part phase rotation, the algorithm reduces part phase rotation and add-subtract unit, so that it decreases delay. Furthermore, the paper simulates and implements the numerical controlled oscillator by Quartus II software and Modelsim software. Finally, simulation results indicate that the improvement over traditional CORDIC algorithm is achieved in terms of ease of computation, resource utilization, and computing speed/delay while maintaining the precision. It is suitable for high speed and precision digital modulation and demodulation. PMID:25110750
Kamath, Jayesh; Wakai, Sara; Zhang, Wanli; Kesten, Karen; Shelton, Deborah; Trestman, Robert
2016-08-01
Use of medication algorithms in the correctional setting may facilitate clinical decision making, improve consistency of care, and reduce polypharmacy. The objective of the present study was to evaluate effectiveness of algorithm (Texas Implementation of Medication Algorithm [TIMA])-driven treatment of bipolar disorder (BD) compared with Treatment as Usual (TAU) in the correctional environment. A total of 61 women inmates with BD were randomized to TIMA (n = 30) or TAU (n = 31) and treated over a 12-week period. The outcome measures included measures of BD symptoms, comorbid symptomatology, quality of life, and psychotropic medication utilization. In comparison with TAU, TIMA-driven treatment reduced polypharmacy, decreased overall psychotropic medication utilization, and significantly decreased use of specific classes of psychotropic medication (antipsychotics and antidepressants). This pilot study confirmed the feasibility and benefits of algorithm-driven treatment of BD in the correctional setting, primarily by enhancing appropriate use of evidence-based treatment. PMID:25829456
Quantum Algorithm for Universal Implementation of the Projective Measurement of Energy
NASA Astrophysics Data System (ADS)
Nakayama, Shojun; Soeda, Akihito; Murao, Mio
2015-05-01
A projective measurement of energy (PME) on a quantum system is a quantum measurement determined by the Hamiltonian of the system. PME protocols exist when the Hamiltonian is given in advance. Unknown Hamiltonians can be identified by quantum tomography, but the time cost to achieve a given accuracy increases exponentially with the size of the quantum system. In this Letter, we improve the time cost by adapting quantum phase estimation, an algorithm designed for computational problems, to measurements on physical systems. We present a PME protocol without quantum tomography for Hamiltonians whose dimension and energy scale are given but which are otherwise unknown. Our protocol implements a PME to arbitrary accuracy without any dimension dependence on its time cost. We also show that another computational quantum algorithm may be used for efficient estimation of the energy scale. These algorithms show that computational quantum algorithms, with suitable modifications, have applications beyond their original context.
Implementation of the Iterative Proportion Fitting Algorithm for Geostatistical Facies Modeling
Li Yupeng Deutsch, Clayton V.
2012-06-15
In geostatistics, most stochastic algorithm for simulation of categorical variables such as facies or rock types require a conditional probability distribution. The multivariate probability distribution of all the grouped locations including the unsampled location permits calculation of the conditional probability directly based on its definition. In this article, the iterative proportion fitting (IPF) algorithm is implemented to infer this multivariate probability. Using the IPF algorithm, the multivariate probability is obtained by iterative modification to an initial estimated multivariate probability using lower order bivariate probabilities as constraints. The imposed bivariate marginal probabilities are inferred from profiles along drill holes or wells. In the IPF process, a sparse matrix is used to calculate the marginal probabilities from the multivariate probability, which makes the iterative fitting more tractable and practical. This algorithm can be extended to higher order marginal probability constraints as used in multiple point statistics. The theoretical framework is developed and illustrated with estimation and simulation example.
Design and Implementation of Broadcast Algorithms for Extreme-Scale Systems
Shamis, Pavel; Graham, Richard L; Gorentla Venkata, Manjunath; Ladd, Joshua
2011-01-01
The scalability and performance of collective communication operations limit the scalability and performance of many scientific applications. This paper presents two new blocking and nonblocking Broadcast algorithms for communicators with arbitrary communication topology, and studies their performance. These algorithms benefit from increased concurrency and a reduced memory footprint, making them suitable for use on large-scale systems. Measuring small, medium, and large data Broadcasts on a Cray-XT5, using 24,576 MPI processes, the Cheetah algorithms outperform the native MPI on that system by 51%, 69%, and 9%, respectively, at the same process count. These results demonstrate an algorithmic approach to the implementation of the important class of collective communications, which is high performing, scalable, and also uses resources in a scalable manner.
A complete implementation of the conjugate gradient algorithm on a reconfigurable supercomputer
Dubois, David H; Dubois, Andrew J; Connor, Carolyn M; Boorman, Thomas M; Poole, Stephen W
2008-01-01
The conjugate gradient is a prominent iterative method for solving systems of sparse linear equations. Large-scale scientific applications often utilize a conjugate gradient solver at their computational core. In this paper we present a field programmable gate array (FPGA) based implementation of a double precision, non-preconditioned, conjugate gradient solver for fmite-element or finite-difference methods. OUf work utilizes the SRC Computers, Inc. MAPStation hardware platform along with the 'Carte' software programming environment to ease the programming workload when working with the hybrid (CPUIFPGA) environment. The implementation is designed to handle large sparse matrices of up to order N x N where N <= 116,394, with up to 7 non-zero, 64-bit elements per sparse row. This implementation utilizes an optimized sparse matrix-vector multiply operation which is critical for obtaining high performance. Direct parallel implementations of loop unrolling and loop fusion are utilized to extract performance from the various vector/matrix operations. Rather than utilize the FPGA devices as function off-load accelerators, our implementation uses the FPGAs to implement the core conjugate gradient algorithm. Measured run-time performance data is presented comparing the FPGA implementation to a software-only version showing that the FPGA can outperform processors running up to 30x the clock rate. In conclusion we take a look at the new SRC-7 system and estimate the performance of this algorithm on that architecture.
On recursive least-squares filtering algorithms and implementations. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Hsieh, Shih-Fu
1990-01-01
In many real-time signal processing applications, fast and numerically stable algorithms for solving least-squares problems are necessary and important. In particular, under non-stationary conditions, these algorithms must be able to adapt themselves to reflect the changes in the system and take appropriate adjustments to achieve optimum performances. Among existing algorithms, the QR-decomposition (QRD)-based recursive least-squares (RLS) methods have been shown to be useful and effective for adaptive signal processing. In order to increase the speed of processing and achieve high throughput rate, many algorithms are being vectorized and/or pipelined to facilitate high degrees of parallelism. A time-recursive formulation of RLS filtering employing block QRD will be considered first. Several methods, including a new non-continuous windowing scheme based on selectively rejecting contaminated data, were investigated for adaptive processing. Based on systolic triarrays, many other forms of systolic arrays are shown to be capable of implementing different algorithms. Various updating and downdating systolic algorithms and architectures for RLS filtering are examined and compared in details, which include Householder reflector, Gram-Schmidt procedure, and Givens rotation. A unified approach encompassing existing square-root-free algorithms is also proposed. For the sinusoidal spectrum estimation problem, a judicious method of separating the noise from the signal is of great interest. Various truncated QR methods are proposed for this purpose and compared to the truncated SVD method. Computer simulations provided for detailed comparisons show the effectiveness of these methods. This thesis deals with fundamental issues of numerical stability, computational efficiency, adaptivity, and VLSI implementation for the RLS filtering problems. In all, various new and modified algorithms and architectures are proposed and analyzed; the significance of any of the new method depends
Further optimization of SeDDaRA blind image deconvolution algorithm and its DSP implementation
NASA Astrophysics Data System (ADS)
Wen, Bo; Zhang, Qiheng; Zhang, Jianlin
2011-11-01
Efficient algorithm for blind image deconvolution and its high-speed implementation is of great value in practice. Further optimization of SeDDaRA is developed, from algorithm structure to numerical calculation methods. The main optimization covers that, the structure's modularization for good implementation feasibility, reducing the data computation and dependency of 2D-FFT/IFFT, and acceleration of power operation by segmented look-up table. Then the Fast SeDDaRA is proposed and specialized for low complexity. As the final implementation, a hardware system of image restoration is conducted by using the multi-DSP parallel processing. Experimental results show that, the processing time and memory demand of Fast SeDDaRA decreases 50% at least; the data throughput of image restoration system is over 7.8Msps. The optimization is proved efficient and feasible, and the Fast SeDDaRA is able to support the real-time application.
A dual-processor multi-frequency implementation of the FINDS algorithm
NASA Technical Reports Server (NTRS)
Godiwala, Pankaj M.; Caglayan, Alper K.
1987-01-01
This report presents a parallel processing implementation of the FINDS (Fault Inferring Nonlinear Detection System) algorithm on a dual processor configured target flight computer. First, a filter initialization scheme is presented which allows the no-fail filter (NFF) states to be initialized using the first iteration of the flight data. A modified failure isolation strategy, compatible with the new failure detection strategy reported earlier, is discussed and the performance of the new FDI algorithm is analyzed using flight recorded data from the NASA ATOPS B-737 aircraft in a Microwave Landing System (MLS) environment. The results show that low level MLS, IMU, and IAS sensor failures are detected and isolated instantaneously, while accelerometer and rate gyro failures continue to take comparatively longer to detect and isolate. The parallel implementation is accomplished by partitioning the FINDS algorithm into two parts: one based on the translational dynamics and the other based on the rotational kinematics. Finally, a multi-rate implementation of the algorithm is presented yielding significantly low execution times with acceptable estimation and FDI performance.
Madduri, Kamesh; Ediger, David; Jiang, Karl; Bader, David A.; Chavarría-Miranda, Daniel
2009-05-29
We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in the HPCS SSCA#2 Graph Analysis benchmark, which has been extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the ThreadStorm processor, and a single-socket Sun multicore server with the UltraSparc T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.
The design and hardware implementation of a low-power real-time seizure detection algorithm
NASA Astrophysics Data System (ADS)
Raghunathan, Shriram; Gupta, Sumeet K.; Ward, Matthew P.; Worth, Robert M.; Roy, Kaushik; Irazoqui, Pedro P.
2009-10-01
Epilepsy affects more than 1% of the world's population. Responsive neurostimulation is emerging as an alternative therapy for the 30% of the epileptic patient population that does not benefit from pharmacological treatment. Efficient seizure detection algorithms will enable closed-loop epilepsy prostheses by stimulating the epileptogenic focus within an early onset window. Critically, this is expected to reduce neuronal desensitization over time and lead to longer-term device efficacy. This work presents a novel event-based seizure detection algorithm along with a low-power digital circuit implementation. Hippocampal depth-electrode recordings from six kainate-treated rats are used to validate the algorithm and hardware performance in this preliminary study. The design process illustrates crucial trade-offs in translating mathematical models into hardware implementations and validates statistical optimizations made with empirical data analyses on results obtained using a real-time functioning hardware prototype. Using quantitatively predicted thresholds from the depth-electrode recordings, the auto-updating algorithm performs with an average sensitivity and selectivity of 95.3 ± 0.02% and 88.9 ± 0.01% (mean ± SEα = 0.05), respectively, on untrained data with a detection delay of 8.5 s [5.97, 11.04] from electrographic onset. The hardware implementation is shown feasible using CMOS circuits consuming under 350 nW of power from a 250 mV supply voltage from simulations on the MIT 180 nm SOI process.
Parallel Implementations Of The Nelder-Mead Simplex Algorithm For Unconstrained Optimization
NASA Astrophysics Data System (ADS)
Dennis, J. E.; Torczon, Virginia
1988-04-01
We are interested in implementing direct search methods on parallel computers to solve the unconstrained minimization problem: Given a function f : IRn --? IR find an x E En that minimizes 1 (x). Our preliminary work has focused on the Nelder-Mead simplex algorithm. The origin of the algorithm can be found in a 1962 paper by Spendley, Hext and Himsworth;1 Nelder and Meade proposed an adaptive version which proved to be much more robust in practice. Dennis and Woods3 give a clear presentation of the standard Nelder-Mead simplex algorithm; Woods4 includes a more complete discussion of implementation details as well as some preliminary convergence results. Since descriptions of the standard Nelder-Mead simplex algorithm appear in Nelder and Mead,2 Dennis and Woods,3 and Woods,4 we will limit our introductory discussion to the advantages and disadvantages of the algorithm, as well as some of the features which make it so popular. We then outline the approaches we have taken and discuss our preliminary results. We conclude with a discussion of future research and some observations about our findings.
Efficient parallel implementation of active appearance model fitting algorithm on GPU.
Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou
2014-01-01
The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures. PMID:24723812
Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU
Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou
2014-01-01
The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures. PMID:24723812
Rios, A. B.; Valda, A.; Somacal, H.
2007-10-26
Usually tomographic procedure requires a set of projections around the object under study and a mathematical processing of such projections through reconstruction algorithms. An accurate reconstruction requires a proper number of projections (angular sampling) and a proper number of elements in each projection (linear sampling). However in several practical cases it is not possible to fulfill these conditions leading to the so-called problem of few projections. In this case, iterative reconstruction algorithms are more suitable than analytic ones. In this work we present a program written in C++ that provides an environment for two iterative algorithm implementations, one algebraic and the other statistical. The software allows the user a full definition of the acquisition and reconstruction geometries used for the reconstruction algorithms but also to perform projection and backprojection operations. A set of analysis tools was implemented for the characterization of the convergence process. We analyze the performance of the algorithms on numerical phantoms and present the reconstruction of experimental data with few projections coming from transmission X-ray and micro PIXE (Particle-Induced X-Ray Emission) images.
Implementation and analysis of a Navier-Stokes algorithm on parallel computers
NASA Technical Reports Server (NTRS)
Fatoohi, Raad A.; Grosch, Chester E.
1988-01-01
The results of the implementation of a Navier-Stokes algorithm on three parallel/vector computers are presented. The object of this research is to determine how well, or poorly, a single numerical algorithm would map onto three different architectures. The algorithm is a compact difference scheme for the solution of the incompressible, two-dimensional, time-dependent Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are the following: the MPP, an SIMD machine with 16K bit serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
Software algorithm and hardware design for real-time implementation of new spectral estimator
2014-01-01
Background Real-time spectral analyzers can be difficult to implement for PC computer-based systems because of the potential for high computational cost, and algorithm complexity. In this work a new spectral estimator (NSE) is developed for real-time analysis, and compared with the discrete Fourier transform (DFT). Method Clinical data in the form of 216 fractionated atrial electrogram sequences were used as inputs. The sample rate for acquisition was 977 Hz, or approximately 1 millisecond between digital samples. Real-time NSE power spectra were generated for 16,384 consecutive data points. The same data sequences were used for spectral calculation using a radix-2 implementation of the DFT. The NSE algorithm was also developed for implementation as a real-time spectral analyzer electronic circuit board. Results The average interval for a single real-time spectral calculation in software was 3.29 μs for NSE versus 504.5 μs for DFT. Thus for real-time spectral analysis, the NSE algorithm is approximately 150× faster than the DFT. Over a 1 millisecond sampling period, the NSE algorithm had the capability to spectrally analyze a maximum of 303 data channels, while the DFT algorithm could only analyze a single channel. Moreover, for the 8 second sequences, the NSE spectral resolution in the 3-12 Hz range was 0.037 Hz while the DFT spectral resolution was only 0.122 Hz. The NSE was also found to be implementable as a standalone spectral analyzer board using approximately 26 integrated circuits at a cost of approximately $500. The software files used for analysis are included as a supplement, please see the Additional files 1 and 2. Conclusions The NSE real-time algorithm has low computational cost and complexity, and is implementable in both software and hardware for 1 millisecond updates of multichannel spectra. The algorithm may be helpful to guide radiofrequency catheter ablation in real time. PMID:24886214
Implementation of an algorithm for cylindrical object identification using range data
NASA Technical Reports Server (NTRS)
Bozeman, Sylvia T.; Martin, Benjamin J.
1989-01-01
One of the problems in 3-D object identification and localization is addressed. In robotic and navigation applications the vision system must be able to distinguish cylindrical or spherical objects as well as those of other geometric shapes. An algorithm was developed to identify cylindrical objects in an image when range data is used. The algorithm incorporates the Hough transform for line detection using edge points which emerge from a Sobel mask. Slices of the data are examined to locate arcs of circles using the normal equations of an over-determined linear system. Current efforts are devoted to testing the computer implementation of the algorithm. Refinements are expected to continue in order to accommodate cylinders in various positions. A technique is sought which is robust in the presence of noise and partial occlusions.
An implementation of the look-ahead Lanczos algorithm for non-Hermitian matrices
NASA Technical Reports Server (NTRS)
Freund, Roland W.; Gutknecht, Martin H.; Nachtigal, Noel M.
1991-01-01
The nonsymmetric Lanczos method can be used to compute eigenvalues of large sparse non-Hermitian matrices or to solve large sparse non-Hermitian linear systems. However, the original Lanczos algorithm is susceptible to possible breakdowns and potential instabilities. An implementation is presented of a look-ahead version of the Lanczos algorithm that, except for the very special situation of an incurable breakdown, overcomes these problems by skipping over those steps in which a breakdown or near-breakdown would occur in the standard process. The proposed algorithm can handle look-ahead steps of any length and requires the same number of matrix-vector products and inner products as the standard Lanczos process without look-ahead.
NASA Astrophysics Data System (ADS)
Chung, Vera Y. Y.; Bergmann, Neil W.
1998-12-01
This paper presents how to implement the block-matching motion estimation algorithm efficiently by Field Programmable Gate Arrays (FPGAs) based Custom Computer Machine (CCM) for video compression. The SPACE2 Custom Computer board consists of up to eight Xilinx XC6216 fine- grain, sea-of-gate FPGA chips. The results show that two Xilinx XC6216 FPGA can perform at 960 MOPs, hence the real- time full-search motion estimation encoder can be easily implemented by our SPACE2 CCM system.
NASA Technical Reports Server (NTRS)
Jaggi, S.; Quattrochi, Dale A.; Lam, Nina S.-N.
1993-01-01
Fractal geometry is increasingly becoming a useful tool for modeling natural phenomena. As an alternative to Euclidean concepts, fractals allow for a more accurate representation of the nature of complexity in natural boundaries and surfaces. The purpose of this paper is to introduce and implement three algorithms in C code for deriving fractal measurement from remotely sensed data. These three methods are: the line-divider method, the variogram method, and the triangular prism method. Remote-sensing data acquired by NASA's Calibrated Airborne Multispectral Scanner (CAMS) are used to compute the fractal dimension using each of the three methods. These data were obtained as a 30 m pixel spatial resolution over a portion of western Puerto Rico in January 1990. A description of the three methods, their implementation in PC-compatible environment, and some results of applying these algorithms to remotely sensed image data are presented.
An Optional Threshold with Svm Cloud Detection Algorithm and Dsp Implementation
NASA Astrophysics Data System (ADS)
Zhou, Guoqing; Zhou, Xiang; Yue, Tao; Liu, Yilong
2016-06-01
This paper presents a method which combines the traditional threshold method and SVM method, to detect the cloud of Landsat-8 images. The proposed method is implemented using DSP for real-time cloud detection. The DSP platform connects with emulator and personal computer. The threshold method is firstly utilized to obtain a coarse cloud detection result, and then the SVM classifier is used to obtain high accuracy of cloud detection. More than 200 cloudy images from Lansat-8 were experimented to test the proposed method. Comparing the proposed method with SVM method, it is demonstrated that the cloud detection accuracy of each image using the proposed algorithm is higher than those of SVM algorithm. The results of the experiment demonstrate that the implementation of the proposed method on DSP can effectively realize the real-time cloud detection accurately.
NASA Technical Reports Server (NTRS)
Tilton, James C.; Plaza, Antonio J. (Editor); Chang, Chein-I. (Editor)
2008-01-01
The hierarchical image segmentation algorithm (referred to as HSEG) is a hybrid of hierarchical step-wise optimization (HSWO) and constrained spectral clustering that produces a hierarchical set of image segmentations. HSWO is an iterative approach to region grooving segmentation in which the optimal image segmentation is found at N(sub R) regions, given a segmentation at N(sub R+1) regions. HSEG's addition of constrained spectral clustering makes it a computationally intensive algorithm, for all but, the smallest of images. To counteract this, a computationally efficient recursive approximation of HSEG (called RHSEG) has been devised. Further improvements in processing speed are obtained through a parallel implementation of RHSEG. This chapter describes this parallel implementation and demonstrates its computational efficiency on a Landsat Thematic Mapper test scene.
NASA Astrophysics Data System (ADS)
Barney, D.; Bialas, W.; Kokkas, P.; Manthos, N.; Reynaud, S.; Sidiropoulos, G.; Vichoudis, P.
2007-03-01
The CMS Endcap Preshower (ES) sub-detector comprises 4288 silicon sensors, each containing 32 strips. The data are transferred from the detector to the counting room via 1208 optical fibres running at 800Mbps. Each fibre carries data from two, three or four sensors. For the readout of the Preshower, a VME-based system, the Endcap Preshower Data Concentrator Card (ES-DCC), is currently under development. The main objective of each readout board is to acquire on-detector data from up to 36 optical links, perform on-line data reduction via zero suppression and pass the concentrated data to the CMS event builder. This document presents the conceptual design of the Reduction Algorithms as well as their implementation in the ES-DCC FPGAs. These algorithms, as implemented in the ES-DCC, result in a data-reduction factor of 20.
Design and Implementation of IIR Algorithms for Control of Longitudinal Coupled-Bunch Instabilities
Teytelman, Dmitry
2000-05-16
The recent installation of third-harmonic RF cavities at the Advanced Light Source has raised instability growth rates, and also caused tune shifts (coherent and incoherent) of more than an octave over the required range of beam currents and energies. The larger growth rates and tune shifts have rendered control by the original bandpass FIR feedback algorithms unreliable. In this paper the authors describe an implementation of an IIR feedback algorithm with more exible response tailoring. A cascade of up to 6 second-order IIR sections (12 poles and 12 zeros) was implemented in the DSPs of the longitudinal feedback system. Filter design has been formulated as an optimization problem and solved using constrained optimization methods. These IIR filters provided 2.4 times the control bandwidth as compared to the original FIR designs. Here the authors demonstrate the performance of the designed filters using transient diagnostic measurements from ALS and DAPNE.
NASA Astrophysics Data System (ADS)
Karabiber, Fethullah; Vecchio, Pietro; Grassi, Giuseppe
2011-12-01
The Bio-inspired (Bi-i) Cellular Vision System is a computing platform consisting of sensing, array sensing-processing, and digital signal processing. The platform is based on the Cellular Neural/Nonlinear Network (CNN) paradigm. This article presents the implementation of a novel CNN-based segmentation algorithm onto the Bi-i system. Each part of the algorithm, along with the corresponding implementation on the hardware platform, is carefully described through the article. The experimental results, carried out for Foreman and Car-phone video sequences, highlight the feasibility of the approach, which provides a frame rate of about 26 frames/s. Comparisons with existing CNN-based methods show that the conceived approach is more accurate, thus representing a good trade-off between real-time requirements and accuracy.
A new morphological anomaly detection algorithm for hyperspectral images and its GPU implementation
NASA Astrophysics Data System (ADS)
Paz, Abel; Plaza, Antonio
2011-10-01
Anomaly detection is considered a very important task for hyperspectral data exploitation. It is now routinely applied in many application domains, including defence and intelligence, public safety, precision agriculture, geology, or forestry. Many of these applications require timely responses for swift decisions which depend upon high computing performance of algorithm analysis. However, with the recent explosion in the amount and dimensionality of hyperspectral imagery, this problem calls for the incorporation of parallel computing techniques. In the past, clusters of computers have offered an attractive solution for fast anomaly detection in hyperspectral data sets already transmitted to Earth. However, these systems are expensive and difficult to adapt to on-board data processing scenarios, in which low-weight and low-power integrated components are essential to reduce mission payload and obtain analysis results in (near) real-time, i.e., at the same time as the data is collected by the sensor. An exciting new development in the field of commodity computing is the emergence of commodity graphics processing units (GPUs), which can now bridge the gap towards on-board processing of remotely sensed hyperspectral data. In this paper, we develop a new morphological algorithm for anomaly detection in hyperspectral images along with an efficient GPU implementation of the algorithm. The algorithm is implemented on latest-generation GPU architectures, and evaluated with regards to other anomaly detection algorithms using hyperspectral data collected by NASA's Airborne Visible Infra-Red Imaging Spectrometer (AVIRIS) over the World Trade Center (WTC) in New York, five days after the terrorist attacks that collapsed the two main towers in the WTC complex. The proposed GPU implementation achieves real-time performance in the considered case study.
Sundareshan, Malur K; Bhattacharjee, Supratik; Inampudi, Radhika; Pang, Ho-Yuen
2002-12-10
Computational complexity is a major impediment to the real-time implementation of image restoration and superresolution algorithms in many applications. Although powerful restoration algorithms have been developed within the past few years utilizing sophisticated mathematical machinery (based on statistical optimization and convex set theory), these algorithms are typically iterative in nature and require a sufficient number of iterations to be executed to achieve the desired resolution improvement that may be needed to meaningfully perform postprocessing image exploitation tasks in practice. Additionally, recent technological breakthroughs have facilitated novel sensor designs (focal plane arrays, for instance) that make it possible to capture megapixel imagery data at video frame rates. A major challenge in the processing of these large-format images is to complete the execution of the image processing steps within the frame capture times and to keep up with the output rate of the sensor so that all data captured by the sensor can be efficiently utilized. Consequently, development of novel methods that facilitate real-time implementation of image restoration and superresolution algorithms is of significant practical interest and is the primary focus of this study. The key to designing computationally efficient processing schemes lies in strategically introducing appropriate preprocessing steps together with the superresolution iterations to tailor optimized overall processing sequences for imagery data of specific formats. For substantiating this assertion, three distinct methods for tailoring a preprocessing filter and integrating it with the superresolution processing steps are outlined. These methods consist of a region-of-interest extraction scheme, a background-detail separation procedure, and a scene-derived information extraction step for implementing a set-theoretic restoration of the image that is less demanding in computation compared with the
NASA Astrophysics Data System (ADS)
Sundareshan, Malur K.; Bhattacharjee, Supratik; Inampudi, Radhika; Pang, Ho-Yuen
2002-12-01
Computational complexity is a major impediment to the real-time implementation of image restoration and superresolution algorithms in many applications. Although powerful restoration algorithms have been developed within the past few years utilizing sophisticated mathematical machinery (based on statistical optimization and convex set theory), these algorithms are typically iterative in nature and require a sufficient number of iterations to be executed to achieve the desired resolution improvement that may be needed to meaningfully perform postprocessing image exploitation tasks in practice. Additionally, recent technological breakthroughs have facilitated novel sensor designs (focal plane arrays, for instance) that make it possible to capture megapixel imagery data at video frame rates. A major challenge in the processing of these large-format images is to complete the execution of the image processing steps within the frame capture times and to keep up with the output rate of the sensor so that all data captured by the sensor can be efficiently utilized. Consequently, development of novel methods that facilitate real-time implementation of image restoration and superresolution algorithms is of significant practical interest and is the primary focus of this study. The key to designing computationally efficient processing schemes lies in strategically introducing appropriate preprocessing steps together with the superresolution iterations to tailor optimized overall processing sequences for imagery data of specific formats. For substantiating this assertion, three distinct methods for tailoring a preprocessing filter and integrating it with the superresolution processing steps are outlined. These methods consist of a region-of-interest extraction scheme, a background-detail separation procedure, and a scene-derived information extraction step for implementing a set-theoretic restoration of the image that is less demanding in computation compared with the
Implementation of a block Lanczos algorithm for Eigenproblem solution of gyroscopic systems
NASA Technical Reports Server (NTRS)
Gupta, Kajal K.; Lawson, Charles L.
1987-01-01
The details of implementation of a general numerical procedure developed for the accurate and economical computation of natural frequencies and associated modes of any elastic structure rotating along an arbitrary axis are described. A block version of the Lanczos algorithm is derived for the solution that fully exploits associated matrix sparsity and employs only real numbers in all relevant computations. It is also capable of determining multiple roots and proves to be most efficient when compared to other, similar, exisiting techniques.
NASA Astrophysics Data System (ADS)
Besrour, Amine; Abdelkefi, Fatma; Siala, Mohamed; Snoussi, Hichem
2015-09-01
Nowadays, the high dynamic range (HDR) imaging represents the subject of the most researches. The major problem lies in the implementation of the best algorithm to acquire the best video quality. In fact, the major constraint is to conceive an optimal fusion which must meet the rapid movement of video frames. The implemented merging algorithms were not quick enough to reconstitute the HDR video. In this paper, we detail each of the previous existing works before detailing our algorithm and presenting results from the acquired HDR images, tone mapped with various techniques. Our proposed algorithm guarantees a more enhanced and faster solution compared to the existing ones. In fact, it has the ability to calculate the saturation matrix related to the saturation rate of the neighboring pixels. The computed coefficients are affected respectively to each picture from the tested ones. This analysis provides faster and efficient results in terms of quality and brightness. The originality of our work remains on its processing method including the pixels saturation in the totality of the captured pictures and their combination in order to obtain the best pictures illustrating all the possible details. These parameters are computed for each zone depending on the contrast and the luminosity of the current pixel and its neighboring. The final HDR image's coefficients are calculated dynamically ensuring the best image quality equilibrating the brightness and contrast values and making the perfect final image.
Implementation of a media synchronization algorithm for multistandard IP set-top box systems
NASA Astrophysics Data System (ADS)
Estévez, Esther; Samper, David; Pescador, Fernando; Juárez, Eduardo; Sanz, César
2009-05-01
Media synchronization at network context minimizes the effects of the network jitter and the skew between the emitter and receiver clocks. Theoretical algorithms cannot always be implemented on real systems for the architecture differences between a real and a theoretical system. In this paper an implementation for an intra-medium and an inter-media synchronization algorithm for a real multistandard IP set-top box is presented. For intra-medium synchronization, the proposed technique is based on controlling the receiver buffer. However for inter-media synchronization, the proposed technique is based on controlling the video playback according the Presentation Time Stamp (PTS) of the media units (audio and video). The proposed synchronizations algorithms has been integrated in an IP-STB and tested in a real environment using DVD movies and TV channels with excellent results. Those results show that the proposed algorithm can achieve media synchronization and meet the requirements of perceived quality of service (P-QoS).
Design And Implementation Of A Multi-Sensor Fusion Algorithm On A Hypercube Computer Architecture
NASA Astrophysics Data System (ADS)
Glover, Charles W.
1990-03-01
A multi-sensor integration (MSI) algorithm written for sequential single processor computer architecture has been transformed into a concurrent algorithm and implemented in parallel on a multi-processor hypercube computer architecture. This paper will present the philosophy and methodologies used in the decomposition of the sequential MSI algorithm, and its transformation into a parallel MSI algorithm. The parallel MSI algorithm was implemented on a NCUBETM hypercube computer. The performance of the parallel MSI algorithm has been measured and compared against its sequential counterpart by running test case scenarios through a simulation program. The simulation program allows the user to define the trajectories of all players in the scenario, and to pick the sensor suites of the players and their operating characteristics. For example, an air-to-air engagement scenario was used as one of the test cases. In this scenario, two friend aircrafts were being attacked by six foe aircraft in a pincer maneuver. Both the friend and foe aircrafts launch missiles at several different time points in the engagement. The sensor suites on each aircraft are dual mode RADAR, dual mode IRST, and ESM sensors. The modes of the sensors are switched as needed throughout the scenario. The RADAR sensor is used only intermittently, thus most of the MSI information is obtained from passive sensing. The maneuvers in this scenario caused aircraft and missile to constantly fly in and out of sensors field-of-view (F0V). This resulted in the MSI algorithm to constantly reacquire, initiate, and delete new tracks as it tracked all objects in the scenario. The objective was to determine performance of the parallel MSI algorithm in such a complex environment, and to determine how many multi-processors (nodes) of the hypercube could be effectively used by an aircraft in such an environment. For the scenario just discussed, a 4-node hypercube was found to be the optimal size and a factor two in speedup
Madduri, Kamesh; Ediger, David; Jiang, Karl; Bader, David A.; Chavarria-Miranda, Daniel
2009-02-15
We present a new lock-free parallel algorithm for computing betweenness centralityof massive small-world networks. With minor changes to the data structures, ouralgorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.
Implementation and evaluation of various demons deformable image registration algorithms on a GPU
NASA Astrophysics Data System (ADS)
Gu, Xuejun; Pan, Hubert; Liang, Yun; Castillo, Richard; Yang, Deshan; Choi, Dongju; Castillo, Edward; Majumdar, Amitava; Guerrero, Thomas; Jiang, Steve B.
2010-01-01
Online adaptive radiation therapy (ART) promises the ability to deliver an optimal treatment in response to daily patient anatomic variation. A major technical barrier for the clinical implementation of online ART is the requirement of rapid image segmentation. Deformable image registration (DIR) has been used as an automated segmentation method to transfer tumor/organ contours from the planning image to daily images. However, the current computational time of DIR is insufficient for online ART. In this work, this issue is addressed by using computer graphics processing units (GPUs). A gray-scale-based DIR algorithm called demons and five of its variants were implemented on GPUs using the compute unified device architecture (CUDA) programming environment. The spatial accuracy of these algorithms was evaluated over five sets of pulmonary 4D CT images with an average size of 256 × 256 × 100 and more than 1100 expert-determined landmark point pairs each. For all the testing scenarios presented in this paper, the GPU-based DIR computation required around 7 to 11 s to yield an average 3D error ranging from 1.5 to 1.8 mm. It is interesting to find out that the original passive force demons algorithms outperform subsequently proposed variants based on the combination of accuracy, efficiency and ease of implementation.
A digital architecture for support vector machines: theory, algorithm, and FPGA implementation.
Anguita, D; Boni, A; Ridella, S
2003-01-01
In this paper, we propose a digital architecture for support vector machine (SVM) learning and discuss its implementation on a field programmable gate array (FPGA). We analyze briefly the quantization effects on the performance of the SVM in classification problems to show its robustness, in the feedforward phase, respect to fixed-point math implementations; then, we address the problem of SVM learning. The architecture described here makes use of a new algorithm for SVM learning which is less sensitive to quantization errors respect to the solution appeared so far in the literature. The algorithm is composed of two parts: the first one exploits a recurrent network for finding the parameters of the SVM; the second one uses a bisection process for computing the threshold. The architecture implementing the algorithm is described in detail and mapped on a real current-generation FPGA (Xilinx Virtex II). Its effectiveness is then tested on a channel equalization problem, where real-time performances are of paramount importance. PMID:18244555
NASA Astrophysics Data System (ADS)
Karthik, Victor U.; Sivasuthan, Sivamayam; Hoole, Samuel Ratnajeevan H.
2014-02-01
The computational algorithms for device synthesis and nondestructive evaluation (NDE) are often the same. In both we have a goal - a particular field configuration yielding the design performance in synthesis or to match exterior measurements in NDE. The geometry of the design or the postulated interior defect is then computed. Several optimization methods are available for this. The most efficient like conjugate gradients are very complex to program for the required derivative information. The least efficient zeroth order algorithms like the genetic algorithm take much computational time but little programming effort. This paper reports launching a Genetic Algorithm kernel on thousands of compute unified device architecture (CUDA) threads exploiting the NVIDIA graphics processing unit (GPU) architecture. The efficiency of parallelization, although below that on shared memory supercomputer architectures, is quite effective in cutting down solution time into the realm of the practicable. We carry this further into multi-physics electro-heat problems where the parameters of description are in the electrical problem and the object function in the thermal problem. Indeed, this is where the derivative of the object function in the heat problem with respect to the parameters in the electrical problem is the most difficult to compute for gradient methods, and where the genetic algorithm is most easily implemented.
Implementation and evaluation of two helical CT reconstruction algorithms in CIVA
NASA Astrophysics Data System (ADS)
Banjak, H.; Costin, M.; Vienne, C.; Kaftandjian, V.
2016-02-01
The large majority of industrial CT systems reconstruct the 3D volume by using an acquisition on a circular trajec-tory. However, when inspecting long objects which are highly anisotropic, this scanning geometry creates severe artifacts in the reconstruction. For this reason, the use of an advanced CT scanning method like helical data acquisition is an efficient way to address this aspect known as the long-object problem. Recently, several analytically exact and quasi-exact inversion formulas for helical cone-beam reconstruction have been proposed. Among them, we identified two algorithms of interest for our case. These algorithms are exact and of filtered back-projection structure. In this work we implemented the filtered-backprojection (FBP) and backprojection-filtration (BPF) algorithms of Zou and Pan (2004). For performance evaluation, we present a numerical compari-son of the two selected algorithms with the helical FDK algorithm using both complete (noiseless and noisy) and truncated data generated by CIVA (the simulation platform for non-destructive testing techniques developed at CEA).
An implementation of the Gillespie algorithm for RNA kinetics with logarithmic time update
Dykeman, Eric C.
2015-01-01
In this paper I outline a fast method called KFOLD for implementing the Gillepie algorithm to stochastically sample the folding kinetics of an RNA molecule at single base-pair resolution. In the same fashion as the KINFOLD algorithm, which also uses the Gillespie algorithm to predict folding kinetics, KFOLD stochastically chooses a new RNA secondary structure state that is accessible from the current state by a single base-pair addition/deletion following the Gillespie procedure. However, unlike KINFOLD, the KFOLD algorithm utilizes the fact that many of the base-pair addition/deletion reactions and their corresponding rates do not change between each step in the algorithm. This allows KFOLD to achieve a substantial speed-up in the time required to compute a prediction of the folding pathway and, for a fixed number of base-pair moves, performs logarithmically with sequence size. This increase in speed opens up the possibility of studying the kinetics of much longer RNA sequences at single base-pair resolution while also allowing for the RNA folding statistics of smaller RNA sequences to be computed much more quickly. PMID:25990741
NASA Astrophysics Data System (ADS)
Liu, Qitao; Li, Yingchun; Sun, Huayan; Zhao, Yanzhong
2008-03-01
Laser active imaging system, which is of high resolution, anti-jamming and can be three-dimensional (3-D) imaging, has been used widely. But its imagery is usually affected by speckle noise which makes the grayscale of pixels change violently, hides the subtle details and makes the imaging resolution descend greatly. Removing speckle noise is one of the most difficult problems encountered in this system because of the poor statistical property of speckle. Based on the analysis of the statistical characteristic of speckle and morphological filtering algorithm, in this paper, an improved multistage morphological filtering algorithm is studied and implemented on TMS320C6416 DSP. The algorithm makes the morphological open-close and close-open transformation by using two different linear structure elements respectively, and then takes a weighted average over the above transformational results. The weighted coefficients are decided by the statistical characteristic of speckle. This algorithm is implemented on the TMS320C6416 DSPs after simulation on computer. The procedure of software design is fully presented. The methods are fully illustrated to achieve and optimize the algorithm in the research of the structural characteristic of TMS320C6416 DSP and feature of the algorithm. In order to fully benefit from such devices and increase the performance of the whole system, it is necessary to take a series of steps to optimize the DSP programs. This paper introduces some effective methods, including refining code structure, eliminating memory dependence, optimizing assembly code via linear assembly and so on, for TMS320C6x C language optimization and then offers the results of the application in a real-time implementation. The results of processing to the images blurred by speckle noise shows that the algorithm can not only effectively suppress speckle noise but also preserve the geometrical features of images. The results of the optimized code running on the DSP platform
Ariyawansa, K.A.
1991-04-01
A benchmark parallel implementation of the Van Slyke and Wets algorithm for stochastic linear programs, and the results of a carefully designed numerical experiment on the Sequent/Balance using the implementation are presented. An important use of this implementation is as a benchmark to assess the performance of approximation algorithms for stochastic linear programs. These approximation algorithms are best suited for implementation on parallel vector processes like the Alliant FX/8. Therefore, the performance of the benchmark implementation on the Alliant FX/8 is of interest. In this paper, we present results observed when a portion of the numerical experiment is performed on the Alliant FX/8. These results indicate that the implementation makes satisfactory use of the concurrency capabilities of the Alliant FX/8. They also indicate that the vectorization capabilities of the Alliant FX/8 are not satisfactorily utilized by the implementation. 9 refs., 9 tabs.
An implementation of differential search algorithm (DSA) for inversion of surface wave data
NASA Astrophysics Data System (ADS)
Song, Xianhai; Li, Lei; Zhang, Xueqiang; Shi, Xinchun; Huang, Jianquan; Cai, Jianchao; Jin, Si; Ding, Jianping
2014-12-01
Surface wave dispersion analysis is widely used in geophysics to infer near-surface shear (S)-wave velocity profiles for a wide variety of applications. However, inversion of surface wave data is challenging for most local-search methods due to its high nonlinearity and to its multimodality. In this work, we proposed and implemented a new Rayleigh wave dispersion curve inversion scheme based on differential search algorithm (DSA), one of recently developed swarm intelligence-based algorithms. DSA is inspired from seasonal migration behavior of species of the living beings throughout the year for solving highly nonlinear, multivariable, and multimodal optimization problems. The proposed inverse procedure is applied to nonlinear inversion of fundamental-mode Rayleigh wave dispersion curves for near-surface S-wave velocity profiles. To evaluate calculation efficiency and stability of DSA, four noise-free and four noisy synthetic data sets are firstly inverted. Then, the performance of DSA is compared with that of genetic algorithms (GA) by two noise-free synthetic data sets. Finally, a real-world example from a waste disposal site in NE Italy is inverted to examine the applicability and robustness of the proposed approach on surface wave data. Furthermore, the performance of DSA is compared against that of GA by real data to further evaluate scores of the inverse procedure described here. Simulation results from both synthetic and actual field data demonstrate that differential search algorithm (DSA) applied to nonlinear inversion of surface wave data should be considered good not only in terms of the accuracy but also in terms of the convergence speed. The great advantages of DSA are that the algorithm is simple, robust and easy to implement. Also there are fewer control parameters to tune.
Realization of a scalable coherent quantum Fourier transform
NASA Astrophysics Data System (ADS)
Debnath, Shantanu; Linke, Norbert; Figgatt, Caroline; Landsman, Kevin; Wright, Ken; Monroe, Chris
2016-05-01
The exponential speed-up in some quantum algorithms is a direct result of parallel function-evaluation paths that interfere through a quantum Fourier transform (QFT). We report the implementation of a fully coherent QFT on five trapped Yb+ atomic qubits using sequences of fundamental quantum logic gates. These modular gates can be used to program arbitrary sequences nearly independent of system size and distance between qubits. We use this capability to first perform a Deutsch-Jozsa algorithm where several instances of three-qubit balanced and constant functions are implemented and then examined using single qubit QFTs. Secondly, we apply a fully coherent five-qubit QFT as a part of a quantum phase estimation protocol. Here, the QFT operates on a five-qubit superposition state with a particular phase modulation of its coefficients and directly produces the corresponding phase to five-bit precision. Finally, we examine the performance of the QFT in the period finding problem in the context of Shor's factorization algorithm. This work is supported by the ARO with funding from the IARPA MQCO program and the AFOSR MURI on Quantum Measurement and Verification.
FPGA implementation of the hyperspectral Lossy Compression for Exomars (LCE) algorithm
NASA Astrophysics Data System (ADS)
García, Aday; Santos, L.; López, S.; Callicó, G. M.; López, J. F.; Sarmiento, R.
2014-10-01
The increase of data rates and data volumes in present remote sensing payload instruments, together with the restrictions imposed in the downlink connection requirements, represent at the same time a challenge and a must in the field of data and image compression. This is especially true for the case of hyperspectral images, in which both, reduction of spatial and spectral redundancy is mandatory. Recently the Consultative Committee for Space Data Systems (CCSDS) published the Lossless Multispectral and Hyperespectral Image Compression recommendation (CCSDS 123), a prediction-based technique resulted from the consensus of its members. Although this standard offers a good trade-off between coding performance and computational complexity, the appearance of future hyperspectral and ultraspectral sensors with vast amount of data imposes further efforts from the scientific community to ensure optimal transmission to ground stations based on greater compression rates. Furthermore, hardware implementations with specific features to deal with solar radiation problems play an important role in order to achieve real time applications. In this scenario, the Lossy Compression for Exomars (LCE) algorithm emerges as a good candidate to achieve these characteristics. Its good quality/compression ratio together with its low complexity facilitates the implementation in hardware platforms such as FPGAs or ASICs. In this work the authors present the implementation of the LCE algorithm into an antifuse-based FPGA and the optimizations carried out to obtain the RTL description code using CatapultC, a High Level Synthesis (HLS) Tool. Experimental results show an area occupancy of 75% in an RTAX2000 FPGA from Microsemi, with an operating frequency of 18 MHz. Additionally, the power budget obtained is presented giving an idea of the suitability of the proposed algorithm implementation for onboard compression applications.
Software for implementing trigger algorithms on the upgraded CMS Global Trigger System
NASA Astrophysics Data System (ADS)
Matsushita, Takashi; Arnold, Bernhard
2015-12-01
The Global Trigger is the final step of the CMS Level-1 Trigger and implements a trigger menu, a set of selection requirements applied to the final list of trigger objects. The conditions for trigger object selection, with possible topological requirements on multiobject triggers, are combined by simple combinatorial logic to form the algorithms. The LHC has resumed its operation in 2015, the collision-energy will be increased to 13 TeV with the luminosity expected to go up to 2x1034 cm-2s-1. The CMS Level-1 trigger system will be upgraded to improve its performance for selecting interesting physics events and to operate within the predefined data-acquisition rate in the challenging environment expected at LHC Run 2. The Global Trigger will be re-implemented on modern FPGAs on an Advanced Mezzanine Card in MicroTCA crate. The upgraded system will benefit from the ability to process complex algorithms with DSP slices and increased processing resources with optical links running at 10 Gbit/s, enabling more algorithms at a time than previously possible and allowing CMS to be more flexible in how it handles the trigger bandwidth. In order to handle the increased complexity of the trigger menu implemented on the upgraded Global Trigger, a set of new software has been developed. The software allows a physicist to define a menu with analysis-like triggers using intuitive user interface. The menu is then realised on FPGAs with further software processing, instantiating predefined firmware blocks. The design and implementation of the software for preparing a menu for the upgraded CMS Global Trigger system are presented.
An implementation of differential evolution algorithm for inversion of geoelectrical data
NASA Astrophysics Data System (ADS)
Balkaya, Çağlayan
2013-11-01
Differential evolution (DE), a population-based evolutionary algorithm (EA) has been implemented to invert self-potential (SP) and vertical electrical sounding (VES) data sets. The algorithm uses three operators including mutation, crossover and selection similar to genetic algorithm (GA). Mutation is the most important operator for the success of DE. Three commonly used mutation strategies including DE/best/1 (strategy 1), DE/rand/1 (strategy 2) and DE/rand-to-best/1 (strategy 3) were applied together with a binomial type crossover. Evolution cycle of DE was realized without boundary constraints. For the test studies performed with SP data, in addition to both noise-free and noisy synthetic data sets two field data sets observed over the sulfide ore body in the Malachite mine (Colorado) and over the ore bodies in the Neem-Ka Thana cooper belt (India) were considered. VES test studies were carried out using synthetically produced resistivity data representing a three-layered earth model and a field data set example from Gökçeada (Turkey), which displays a seawater infiltration problem. Mutation strategies mentioned above were also extensively tested on both synthetic and field data sets in consideration. Of these, strategy 1 was found to be the most effective strategy for the parameter estimation by providing less computational cost together with a good accuracy. The solutions obtained by DE for the synthetic cases of SP were quite consistent with particle swarm optimization (PSO) which is a more widely used population-based optimization algorithm than DE in geophysics. Estimated parameters of SP and VES data were also compared with those obtained from Metropolis-Hastings (M-H) sampling algorithm based on simulated annealing (SA) without cooling to clarify uncertainties in the solutions. Comparison to the M-H algorithm shows that DE performs a fast approximate posterior sampling for the case of low-dimensional inverse geophysical problems.
Implementation of Complex Signal Processing Algorithms for Position-Sensitive Microcalorimeters
NASA Technical Reports Server (NTRS)
Smith, Stephen J.
2008-01-01
We have recently reported on a theoretical digital signal-processing algorithm for improved energy and position resolution in position-sensitive, transition-edge sensor (POST) X-ray detectors [Smith et al., Nucl, lnstr and Meth. A 556 (2006) 2371. PoST's consists of one or more transition-edge sensors (TES's) on a large continuous or pixellated X-ray absorber and are under development as an alternative to arrays of single pixel TES's. PoST's provide a means to increase the field-of-view for the fewest number of read-out channels. In this contribution we extend the theoretical correlated energy position optimal filter (CEPOF) algorithm (originally developed for 2-TES continuous absorber PoST's) to investigate the practical implementation on multi-pixel single TES PoST's or Hydras. We use numerically simulated data for a nine absorber device, which includes realistic detector noise, to demonstrate an iterative scheme that enables convergence on the correct photon absorption position and energy without any a priori assumptions. The position sensitivity of the CEPOF implemented on simulated data agrees very well with the theoretically predicted resolution. We discuss practical issues such as the impact of random arrival phase of the measured data on the performance of the CEPOF. The CEPOF algorithm demonstrates that full-width-at- half-maximum energy resolution of < 8 eV coupled with position-sensitivity down to a few 100 eV should be achievable for a fully optimized device.
Implementation of parallel computational algorithms on a modified CORDIC arithmetic logic
Naseem, A.
1984-01-01
CORDIC (COordinate Rotation Digital Computer) is a powerful technique for evaluating trigonometric, hyperbolic, exponential and logarithmic functions and for performing a variety of plane coordinate transformations. Furthermore, the algorithm is also suitable for other computations such as multiplication, division, and the conversion between binary and mixed-radix number systems. The basis for the algorithm is coordinate rotation in a linear, circular, or hyperbolic coordinate system depending on which function is to be calculated. The algorithm involves iterative procedures that require only additions, shift operations, and recall of prestored constants. However, the iterative nature of the algorithm dictates hardware implementations that are highly sequential in nature, resulting in slow speed of processing. The growing need for processing at high speed has resulted in a constant push for the development of faster computing structures balanced against the constraint to minimize computational complexity. With the advent of VLSI, many processing elements can now be realized on a single chip, and a large collection of processors have therefore become economically feasible. So, with this possibility in mind, the CORDIC iteration equations were modified in order to eliminate their sequential nature and to incorporate more parallelism.
Fujita, Naotaka; Yoshihisa, Tomoki; Tsukamoto, Masahiko
2014-01-01
In recent years, sensors become popular and Home Energy Management System (HEMS) takes an important role in saving energy without decrease in QoL (Quality of Life). Currently, many rule-based HEMSs have been proposed and almost all of them assume “IF-THEN” rules. The Rete algorithm is a typical pattern matching algorithm for IF-THEN rules. Currently, we have proposed a rule-based Home Energy Management System (HEMS) using the Rete algorithm. In the proposed system, rules for managing energy are processed by smart taps in network, and the loads for processing rules and collecting data are distributed to smart taps. In addition, the number of processes and collecting data are reduced by processing rules based on the Rete algorithm. In this paper, we evaluated the proposed system by simulation. In the simulation environment, rules are processed by a smart tap that relates to the action part of each rule. In addition, we implemented the proposed system as HEMS using smart taps. PMID:25136672
Next Generation Aura-OMI SO2 Retrieval Algorithm: Introduction and Implementation Status
NASA Technical Reports Server (NTRS)
Li, Can; Joiner, Joanna; Krotkov, Nickolay A.; Bhartia, Pawan K.
2014-01-01
We introduce our next generation algorithm to retrieve SO2 using radiance measurements from the Aura Ozone Monitoring Instrument (OMI). We employ a principal component analysis technique to analyze OMI radiance spectral in 310.5-340 nm acquired over regions with no significant SO2. The resulting principal components (PCs) capture radiance variability caused by both physical processes (e.g., Rayleigh and Raman scattering, and ozone absorption) and measurement artifacts, enabling us to account for these various interferences in SO2 retrievals. By fitting these PCs along with SO2 Jacobians calculated with a radiative transfer model to OMI-measured radiance spectra, we directly estimate SO2 vertical column density in one step. As compared with the previous generation operational OMSO2 PBL (Planetary Boundary Layer) SO2 product, our new algorithm greatly reduces unphysical biases and decreases the noise by a factor of two, providing greater sensitivity to anthropogenic emissions. The new algorithm is fast, eliminates the need for instrument-specific radiance correction schemes, and can be easily adapted to other sensors. These attributes make it a promising technique for producing long-term, consistent SO2 records for air quality and climate research. We have operationally implemented this new algorithm on OMI SIPS for producing the new generation standard OMI SO2 products.
Kawakami, Tomoya; Fujita, Naotaka; Yoshihisa, Tomoki; Tsukamoto, Masahiko
2014-01-01
In recent years, sensors become popular and Home Energy Management System (HEMS) takes an important role in saving energy without decrease in QoL (Quality of Life). Currently, many rule-based HEMSs have been proposed and almost all of them assume "IF-THEN" rules. The Rete algorithm is a typical pattern matching algorithm for IF-THEN rules. Currently, we have proposed a rule-based Home Energy Management System (HEMS) using the Rete algorithm. In the proposed system, rules for managing energy are processed by smart taps in network, and the loads for processing rules and collecting data are distributed to smart taps. In addition, the number of processes and collecting data are reduced by processing rules based on the Rete algorithm. In this paper, we evaluated the proposed system by simulation. In the simulation environment, rules are processed by a smart tap that relates to the action part of each rule. In addition, we implemented the proposed system as HEMS using smart taps. PMID:25136672
Supercomputer implementation of finite element algorithms for high speed compressible flows
NASA Technical Reports Server (NTRS)
Thornton, E. A.; Ramakrishnan, R.
1986-01-01
Prediction of compressible flow phenomena using the finite element method is of recent origin and considerable interest. Two shock capturing finite element formulations for high speed compressible flows are described. A Taylor-Galerkin formulation uses a Taylor series expansion in time coupled with a Galerkin weighted residual statement. The Taylor-Galerkin algorithms use explicit artificial dissipation, and the performance of three dissipation models are compared. A Petrov-Galerkin algorithm has as its basis the concepts of streamline upwinding. Vectorization strategies are developed to implement the finite element formulations on the NASA Langley VPS-32. The vectorization scheme results in finite element programs that use vectors of length of the order of the number of nodes or elements. The use of the vectorization procedure speeds up processing rates by over two orders of magnitude. The Taylor-Galerkin and Petrov-Galerkin algorithms are evaluated for 2D inviscid flows on criteria such as solution accuracy, shock resolution, computational speed and storage requirements. The convergence rates for both algorithms are enhanced by local time-stepping schemes. Extension of the vectorization procedure for predicting 2D viscous and 3D inviscid flows are demonstrated. Conclusions are drawn regarding the applicability of the finite element procedures for realistic problems that require hundreds of thousands of nodes.
Movie approximation technique for the implementation of fast bandwidth-smoothing algorithms
NASA Astrophysics Data System (ADS)
Feng, Wu-chi; Lam, Chi C.; Liu, Ming
1997-12-01
Bandwidth smoothing algorithms can effectively reduce the network resource requirements for the delivery of compressed video streams. For stored video, a large number of bandwidth smoothing algorithms have been introduced that are optimal under certain constraints but require access to all the frame size data in order to achieve their optimal properties. This constraint, however, can be both resource and computationally expensive, especially for moderately priced set-top-boxes. In this paper, we introduce a movie approximation technique for the representation of the frame sizes of a video, reducing the complexity of the bandwidth smoothing algorithms and the amount of frame data that must be transmitted prior to the start of playback. Our results show that the proposed approximation technique can accurately approximate the frame data with a small number of piece-wise linear segments without affecting the performance measures that the bandwidth soothing algorithms are attempting to achieve by more than 1%. In addition, we show that implementations of this technique can speed up execution times by 100 to 400 times, allowing the bandwidth plan calculation times to be reduced to tens of milliseconds. Evaluation using a compressed full-length motion-JPEG video is provided.
The mGA1.0: A common LISP implementation of a messy genetic algorithm
NASA Technical Reports Server (NTRS)
Goldberg, David E.; Kerzic, Travis
1990-01-01
Genetic algorithms (GAs) are finding increased application in difficult search, optimization, and machine learning problems in science and engineering. Increasing demands are being placed on algorithm performance, and the remaining challenges of genetic algorithm theory and practice are becoming increasingly unavoidable. Perhaps the most difficult of these challenges is the so-called linkage problem. Messy GAs were created to overcome the linkage problem of simple genetic algorithms by combining variable-length strings, gene expression, messy operators, and a nonhomogeneous phasing of evolutionary processing. Results on a number of difficult deceptive test functions are encouraging with the mGA always finding global optima in a polynomial number of function evaluations. Theoretical and empirical studies are continuing, and a first version of a messy GA is ready for testing by others. A Common LISP implementation called mGA1.0 is documented and related to the basic principles and operators developed by Goldberg et. al. (1989, 1990). Although the code was prepared with care, it is not a general-purpose code, only a research version. Important data structures and global variations are described. Thereafter brief function descriptions are given, and sample input data are presented together with sample program output. A source listing with comments is also included.
SU-E-T-67: Clinical Implementation and Evaluation of the Acuros Dose Calculation Algorithm
Yan, C; Combine, T; Dickens, K; Wynn, R; Pavord, D; Huq, M
2014-06-01
Purpose: The main aim of the current study is to present a detailed description of the implementation of the Acuros XB Dose Calculation Algorithm, and subsequently evaluate its clinical impacts by comparing it with AAA algorithm. Methods: The source models for both Acuros XB and AAA were configured by importing the same measured beam data into Eclipse treatment planning system. Both algorithms were evaluated by comparing calculated dose with measured dose on a homogeneous water phantom for field sizes ranging from 6cm × 6cm to 40cm × 40cm. Central axis and off-axis points with different depths were chosen for the comparison. Similarly, wedge fields with wedge angles from 15 to 60 degree were used. In addition, variable field sizes for a heterogeneous phantom were used to evaluate the Acuros algorithm. Finally, both Acuros and AAA were tested on VMAT patient plans for various sites. Does distributions and calculation time were compared. Results: On average, computation time is reduced by at least 50% by Acuros XB compared with AAA on single fields and VMAT plans. When used for open 6MV photon beams on homogeneous water phantom, both Acuros XB and AAA calculated doses were within 1% of measurement. For 23 MV photon beams, the calculated doses were within 1.5% of measured doses for Acuros XB and 2% for AAA. When heterogeneous phantom was used, Acuros XB also improved on accuracy. Conclusion: Compared with AAA, Acuros XB can improve accuracy while significantly reduce computation time for VMAT plans.
A Survey on GPU-Based Implementation of Swarm Intelligence Algorithms.
Tan, Ying; Ding, Ke
2016-09-01
Inspired by the collective behavior of natural swarm, swarm intelligence algorithms (SIAs) have been developed and widely used for solving optimization problems. When applied to complex problems, a large number of fitness function evaluations are needed to obtain an acceptable solution. To tackle this vital issue, graphical processing units (GPUs) have been used to accelerate the optimization procedure of SIAs. Thanks to their inherent parallelism, SIAs are very suitable for parallel implementation under the GPU platform which have achieved a great success in recent years. This paper presents a comprehensive review of GPU-based parallel SIAs in accordance with a newly proposed taxonomy. Critical concerns for the efficient parallel implementation of SIAs are also described in detail. Moreover, novel criteria are also proposed to evaluate and compare the parallel implementation and algorithm performance universally. The rationality and practicability of the proposed optimization methodology and criteria are verified by careful case study. Finally, our opinions and perspectives on the trends and prospects on the relatively new research domain are also presented for future development. PMID:26571543
Implementation of a cone-beam backprojection algorithm on the cell broadband engine processor
NASA Astrophysics Data System (ADS)
Bockenbach, Olivier; Knaup, Michael; Kachelrieß, Marc
2007-03-01
Tomographic image reconstruction is computationally very demanding. In all cases the backprojection represents the performance bottleneck due to the high operational count and due to the high demand put on the memory subsystem. In the past, solving this problem has lead to the implementation of specific architectures, connecting Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs) to memory through dedicated high speed busses. More recently, there have also been attempt to use Graphic Processing Units (GPUs) to perform the backprojection step. Originally aimed at the gaming market, IBM, Toshiba and Sony have introduced the Cell Broadband Engine (CBE) processor, often considered as a multicomputer on a chip. Clocked at 3 GHz, the Cell allows for a theoretical performance of 192 GFlops and a peak data transfer rate over the internal bus of 200 GB/s. This performance indeed makes the Cell a very attractive architecture for implementing tomographic image reconstruction algorithms. In this study, we investigate the relative performance of a perspective backprojection algorithm when implemented on a standard PC and on the Cell processor. We compare these results to the performance achievable with FPGAs based boards and high end GPUs. The cone-beam backprojection performance was assessed by backprojecting a full circle scan of 512 projections of 1024x1024 pixels into a volume of size 512x512x512 voxels. It took 3.2 minutes on the PC (single CPU) and is as fast as 13.6 seconds on the Cell.
Zhou, Ning; Huang, Zhenyu; Tuffner, Francis K.; Jin, Shuangshuang; Lin, Jenglung; Hauer, Matthew L.
2010-02-28
Small signal stability problems are one of the major threats to grid stability and reliability. Prony analysis has been successfully applied on ringdown data to monitor electromechanical modes of a power system using phasor measurement unit (PMU) data. To facilitate an on-line application of mode estimation, this paper develops a recursive algorithm for implementing Prony analysis and proposed an oscillation detection method to detect ringdown data in real time. By automatically detecting ringdown data, the proposed method helps guarantee that Prony analysis is applied properly and timely on the ringdown data. Thus, the mode estimation results can be performed reliably and timely. The proposed method is tested using Monte Carlo simulations based on a 17-machine model and is shown to be able to properly identify the oscillation data for on-line application of Prony analysis. In addition, the proposed method is applied to field measurement data from WECC to show the performance of the proposed algorithm.
NASA Astrophysics Data System (ADS)
Bhole, Gaurav; Anjusha, V. S.; Mahesh, T. S.
2016-04-01
A robust control over quantum dynamics is of paramount importance for quantum technologies. Many of the existing control techniques are based on smooth Hamiltonian modulations involving repeated calculations of basic unitaries resulting in time complexities scaling rapidly with the length of the control sequence. Here we show that bang-bang controls need one-time calculation of basic unitaries and hence scale much more efficiently. By employing a global optimization routine such as the genetic algorithm, it is possible to synthesize not only highly intricate unitaries, but also certain nonunitary operations. We demonstrate the unitary control through the implementation of the optimal fixed-point quantum search algorithm in a three-qubit nuclear magnetic resonance (NMR) system. Moreover, by combining the bang-bang pulses with the crusher gradients, we also demonstrate nonunitary transformations of thermal equilibrium states into effective pure states in three- as well as five-qubit NMR systems.
Barazzetti, Luigi; Scaioni, Marco
2010-01-01
This paper presents the development and implementation of three image-based methods used to detect and measure the displacements of a vast number of points in the case of laboratory testing on construction materials. Starting from the needs of structural engineers, three ad hoc tools for crack measurement in fibre-reinforced specimens and 2D or 3D deformation analysis through digital images were implemented and tested. These tools make use of advanced image processing algorithms and can integrate or even substitute some traditional sensors employed today in most laboratories. In addition, the automation provided by the implemented software, the limited cost of the instruments and the possibility to operate with an indefinite number of points offer new and more extensive analysis in the field of material testing. Several comparisons with other traditional sensors widely adopted inside most laboratories were carried out in order to demonstrate the accuracy of the implemented software. Implementation details, simulations and real applications are reported and discussed in this paper. PMID:22163612
Morita, Kenji; Jitsev, Jenia; Morrison, Abigail
2016-09-15
Value-based action selection has been suggested to be realized in the corticostriatal local circuits through competition among neural populations. In this article, we review theoretical and experimental studies that have constructed and verified this notion, and provide new perspectives on how the local-circuit selection mechanisms implement reinforcement learning (RL) algorithms and computations beyond them. The striatal neurons are mostly inhibitory, and lateral inhibition among them has been classically proposed to realize "Winner-Take-All (WTA)" selection of the maximum-valued action (i.e., 'max' operation). Although this view has been challenged by the revealed weakness, sparseness, and asymmetry of lateral inhibition, which suggest more complex dynamics, WTA-like competition could still occur on short time scales. Unlike the striatal circuit, the cortical circuit contains recurrent excitation, which may enable retention or temporal integration of information and probabilistic "soft-max" selection. The striatal "max" circuit and the cortical "soft-max" circuit might co-implement an RL algorithm called Q-learning; the cortical circuit might also similarly serve for other algorithms such as SARSA. In these implementations, the cortical circuit presumably sustains activity representing the executed action, which negatively impacts dopamine neurons so that they can calculate reward-prediction-error. Regarding the suggested more complex dynamics of striatal, as well as cortical, circuits on long time scales, which could be viewed as a sequence of short WTA fragments, computational roles remain open: such a sequence might represent (1) sequential state-action-state transitions, constituting replay or simulation of the internal model, (2) a single state/action by the whole trajectory, or (3) probabilistic sampling of state/action. PMID:27173430
AVR microcontroller simulator for software implemented hardware fault tolerance algorithms research
NASA Astrophysics Data System (ADS)
Piotrowski, Adam; Tarnowski, Szymon; Napieralski, Andrzej
2008-01-01
Reliability of new, advanced electronic systems becomes a serious problem especially in places like accelerators and synchrotrons, where sophisticated digital devices operate closely to radiation sources. One of the possible solutions to harden the microprocessor-based system is a strict programming approach known as the Software Implemented Hardware Fault Tolerance. Unfortunately, in real environments it is not possible to perform precise and accurate tests of the new algorithms due to hardware limitation. This paper highlights the AVR-family microcontroller simulator project equipped with an appropriate monitoring and the SEU injection systems.
Implementation of a quantum adiabatic algorithm for factorization on two qudits
Zobov, V. E. Ermilov, A. S.
2012-06-15
Implementation of an adiabatic quantum algorithm for factorization on two qudits with the number of levels d{sub 1} and d{sub 2} is considered. A method is proposed for obtaining a time-dependent effective Hamiltonian by means of a sequence of rotation operators that are selective with respect to the transitions between neighboring levels of a qudit. A sequence of RF magnetic field pulses is obtained, and a factorization of the numbers 35, 21, and 15 is numerically simulated on two quadrupole nuclei with spins 3/2 (d{sub 1} = 4) and 1 (d{sub 2} = 3).
Parallel implementation of the time-evolving block decimation algorithm for the Bose-Hubbard model
NASA Astrophysics Data System (ADS)
Urbanek, Miroslav; Soldán, Pavel
2016-02-01
A system of ultracold atoms in an optical lattice represents a powerful experimental setup for testing the fundamentals of quantum mechanics. While its microscopic interaction mechanisms are well understood, the system behavior for a moderate number of particles is difficult to simulate due to a high dimension of its many-body space. This article presents TEBDOL, a parallel implementation of the time-evolving block decimation (TEBD) algorithm that can efficiently simulate time evolution of a one-dimensional chain of atoms in optical lattices. We investigate the parallelization strategy and the strong and weak scaling with the number of processes.
Implementation of a new iterative learning control algorithm on real data.
Zamanian, Hamed; Koohi, Ardavan
2016-02-01
In this paper, a newly presented approach is proposed for closed-loop automatic tuning of a proportional integral derivative (PID) controller based on iterative learning control (ILC) algorithm. A modified ILC scheme iteratively changes the control signal by adjusting it. Once a satisfactory performance is achieved, a linear compensator is identified in the ILC behavior using casual relationship between the closed loop signals. This compensator is approximated by a PD controller which is used to tune the original PID controller. Results of implementing this approach presented on the experimental data of Damavand tokamak and are consistent with simulation outcome. PMID:26931852
Implementation of a new iterative learning control algorithm on real data
NASA Astrophysics Data System (ADS)
Zamanian, Hamed; Koohi, Ardavan
2016-02-01
In this paper, a newly presented approach is proposed for closed-loop automatic tuning of a proportional integral derivative (PID) controller based on iterative learning control (ILC) algorithm. A modified ILC scheme iteratively changes the control signal by adjusting it. Once a satisfactory performance is achieved, a linear compensator is identified in the ILC behavior using casual relationship between the closed loop signals. This compensator is approximated by a PD controller which is used to tune the original PID controller. Results of implementing this approach presented on the experimental data of Damavand tokamak and are consistent with simulation outcome.
Implementation of QR-decomposition based on CORDIC for unitary MUSIC algorithm
NASA Astrophysics Data System (ADS)
Lounici, Merwan; Luan, Xiaoming; Saadi, Wahab
2013-07-01
The DOA (Direction Of Arrival) estimation with subspace methods such as MUSIC (MUltiple SIgnal Classification) and ESPRIT (Estimation of Signal Parameters via Rotational Invariance Technique) is based on an accurate estimation of the eigenvalues and eigenvectors of covariance matrix. QR decomposition is implemented with the Coordinate Rotation DIgital Computer (CORDIC) algorithm. QRD requires only additions and shifts [6], so it is faster and more regular than other methods. In this article the hardware architecture of an EVD (Eigen Value Decomposition) processor based on TSA (triangular systolic array) for QR decomposition is proposed. Using Xilinx System Generator (XSG), the design is implemented and the estimated logic device resource values are presented for different matrix sizes.
NASA Technical Reports Server (NTRS)
Doxley, Charles A.
2016-01-01
In the current world of applications that use reconfigurable technology implemented on field programmable gate arrays (FPGAs), there is a need for flexible architectures that can grow as the systems evolve. A project has limited resources and a fixed set of requirements that development efforts are tasked to meet. Designers must develop robust solutions that practically meet the current customer demands and also have the ability to grow for future performance. This paper describes the development of a high speed serial data streaming algorithm that allows for transmission of multiple data channels over a single serial link. The technique has the ability to change to meet new applications developed for future design considerations. This approach uses the Xilinx Serial RapidIO LOGICORE Solution to implement a flexible infrastructure to meet the current project requirements with the ability to adapt future system designs.
Implementation of a Point Algorithm for Real-Time Convex Optimization
NASA Technical Reports Server (NTRS)
Acikmese, Behcet; Motaghedi, Shui; Carson, John
2007-01-01
The primal-dual interior-point algorithm implemented in G-OPT is a relatively new and efficient way of solving convex optimization problems. Given a prescribed level of accuracy, the convergence to the optimal solution is guaranteed in a predetermined, finite number of iterations. G-OPT Version 1.0 is a flight software implementation written in C. Onboard application of the software enables autonomous, real-time guidance and control that explicitly incorporates mission constraints such as control authority (e.g. maximum thrust limits), hazard avoidance, and fuel limitations. This software can be used in planetary landing missions (Mars pinpoint landing and lunar landing), as well as in proximity operations around small celestial bodies (moons, asteroids, and comets). It also can be used in any spacecraft mission for thrust allocation in six-degrees-of-freedom control.
Multi-GPU implementation of a VMAT treatment plan optimization algorithm
Tian, Zhen E-mail: Xun.Jia@UTSouthwestern.edu Folkerts, Michael; Tan, Jun; Jia, Xun E-mail: Xun.Jia@UTSouthwestern.edu Jiang, Steve B. E-mail: Xun.Jia@UTSouthwestern.edu; Peng, Fei
2015-06-15
Purpose: Volumetric modulated arc therapy (VMAT) optimization is a computationally challenging problem due to its large data size, high degrees of freedom, and many hardware constraints. High-performance graphics processing units (GPUs) have been used to speed up the computations. However, GPU’s relatively small memory size cannot handle cases with a large dose-deposition coefficient (DDC) matrix in cases of, e.g., those with a large target size, multiple targets, multiple arcs, and/or small beamlet size. The main purpose of this paper is to report an implementation of a column-generation-based VMAT algorithm, previously developed in the authors’ group, on a multi-GPU platform to solve the memory limitation problem. While the column-generation-based VMAT algorithm has been previously developed, the GPU implementation details have not been reported. Hence, another purpose is to present detailed techniques employed for GPU implementation. The authors also would like to utilize this particular problem as an example problem to study the feasibility of using a multi-GPU platform to solve large-scale problems in medical physics. Methods: The column-generation approach generates VMAT apertures sequentially by solving a pricing problem (PP) and a master problem (MP) iteratively. In the authors’ method, the sparse DDC matrix is first stored on a CPU in coordinate list format (COO). On the GPU side, this matrix is split into four submatrices according to beam angles, which are stored on four GPUs in compressed sparse row format. Computation of beamlet price, the first step in PP, is accomplished using multi-GPUs. A fast inter-GPU data transfer scheme is accomplished using peer-to-peer access. The remaining steps of PP and MP problems are implemented on CPU or a single GPU due to their modest problem scale and computational loads. Barzilai and Borwein algorithm with a subspace step scheme is adopted here to solve the MP problem. A head and neck (H and N) cancer case is
The Sustainable Technology Division has recently completed an implementation of the U.S. EPA's Waste Reduction (WAR) Algorithm that can be directly accessed from a Cape-Open compliant process modeling environment. The WAR Algorithm add-in can be used in AmsterChem's COFE (Cape-Op...
McNunn, Gabriel S; Bryden, Kenneth M
2013-01-01
Tarjan's algorithm schedules the solution of systems of equations by noting the coupling and grouping between the equations. Simulating complex systems, e.g., advanced power plants, aerodynamic systems, or the multi-scale design of components, requires the linkage of large groups of coupled models. Currently, this is handled manually in systems modeling packages. That is, the analyst explicitly defines both the method and solution sequence necessary to couple the models. In small systems of models and equations this works well. However, as additional detail is needed across systems and across scales, the number of models grows rapidly. This precludes the manual assembly of large systems of federated models, particularly in systems composed of high fidelity models. This paper examines extending Tarjan's algorithm from sets of equations to sets of models. The proposed implementation of the algorithm is demonstrated using a small one-dimensional system of federated models representing the heat transfer and thermal stress in a gas turbine blade with thermal barrier coating. Enabling the rapid assembly and substitution of different models permits the rapid turnaround needed to support the “what-if” kinds of questions that arise in engineering design.
Parallel Implementation and Scaling of an Adaptive Mesh Discrete Ordinates Algorithm for Transport
Howell, L H
2004-11-29
Block-structured adaptive mesh refinement (AMR) uses a mesh structure built up out of locally-uniform rectangular grids. In the BoxLib parallel framework used by the Raptor code, each processor operates on one or more of these grids at each refinement level. The decomposition of the mesh into grids and the distribution of these grids among processors may change every few timesteps as a calculation proceeds. Finer grids use smaller timesteps than coarser grids, requiring additional work to keep the system synchronized and ensure conservation between different refinement levels. In a paper for NECDC 2002 I presented preliminary results on implementation of parallel transport sweeps on the AMR mesh, conjugate gradient acceleration, accuracy of the AMR solution, and scalar speedup of the AMR algorithm compared to a uniform fully-refined mesh. This paper continues with a more in-depth examination of the parallel scaling properties of the scheme, both in single-level and multi-level calculations. Both sweeping and setup costs are considered. The algorithm scales with acceptable performance to several hundred processors. Trends suggest, however, that this is the limit for efficient calculations with traditional transport sweeps, and that modifications to the sweep algorithm will be increasingly needed as job sizes in the thousands of processors become common.
NASA Astrophysics Data System (ADS)
Falivene, Oriol; Cabello, Patricia; Arbués, Pau; Muñoz, Josep Anton; Cabrera, Lluís
2009-08-01
Valid representations of geological heterogeneity are fundamental inputs for quantitative models used in managing subsurface activities. Consequently, the simulation of realistic facies distributions is a significant aim. Realistic facies distributions are typically obtained by pixel-based, object-based or process-based methods. This work presents a pixel-based geostatistical algorithm suitable for reproducing lateral gradual facies transitions (LGFT) between two adjacent sedimentary bodies. Lateral contact (i.e. interfingering) between distinct depositional facies is a widespread geometric relationship that occurs at different scales in any depositional system. The algorithm is based on the truncation of the sum of a linear expectation trend and a random Gaussian field, and can be conditioned to well data. The implementation introduced herein also includes subroutines to clean and geometrically characterize the obtained LGFT. The cleaned sedimentary body transition provides a more appropriate and realistic facies distribution for some depositional settings. The geometric measures of the LGFT yield an intuitive measure of the morphology of the sedimentary body boundary, which can be compared to analogue data. An example of a LGFT obtained by the algorithm presented herein is also flow simulated, quantitatively demonstrating the importance of realistically reproducing them in subsurface models, if further flow-related accurate predictions are to be made.
Santi, Peter Angelo; Cutler, Theresa Elizabeth; Favalli, Andrea; Koehler, Katrina Elizabeth; Henzl, Vladimir; Henzlova, Daniela; Parker, Robert Francis; Croft, Stephen
2015-12-01
In order to improve the accuracy and capabilities of neutron multiplicity counting, additional quantifiable information is needed in order to address the assumptions that are present in the point model. Extracting and utilizing higher order moments (Quads and Pents) from the neutron pulse train represents the most direct way of extracting additional information from the measurement data to allow for an improved determination of the physical properties of the item of interest. The extraction of higher order moments from a neutron pulse train required the development of advanced dead time correction algorithms which could correct for dead time effects in all of the measurement moments in a self-consistent manner. In addition, advanced analysis algorithms have been developed to address specific assumptions that are made within the current analysis model, namely that all neutrons are created at a single point within the item of interest, and that all neutrons that are produced within an item are created with the same energy distribution. This report will discuss the current status of implementation and initial testing of the advanced dead time correction and analysis algorithms that have been developed in an attempt to utilize higher order moments to improve the capabilities of correlated neutron measurement techniques.
Implementation of the FDK algorithm for cone-beam CT on the cell broadband engine architecture
NASA Astrophysics Data System (ADS)
Scherl, Holger; Koerner, Mario; Hofmann, Hannes; Eckert, Wieland; Kowarschik, Markus; Hornegger, Joachim
2007-03-01
In most of today's commercially available cone-beam CT scanners, the well known FDK method is used for solving the 3D reconstruction task. The computational complexity of this algorithm prohibits its use for many medical applications without hardware acceleration. The brand-new Cell Broadband Engine Architecture (CBEA) with its high level of parallelism is a cost-efficient processor for performing the FDK reconstruction according to the medical requirements. The programming scheme, however, is quite different to any standard personal computer hardware. In this paper, we present an innovative implementation of the most time-consuming parts of the FDK algorithm: filtering and back-projection. We also explain the required transformations to parallelize the algorithm for the CBEA. Our software framework allows to compute the filtering and back-projection in parallel, making it possible to do an on-the-fly-reconstruction. The achieved results demonstrate that a complete FDK reconstruction is computed with the CBEA in less than seven seconds for a standard clinical scenario. Given the fact that scan times are usually much higher, we conclude that reconstruction is finished right after the end of data acquisition. This enables us to present the reconstructed volume to the physician in real-time, immediately after the last projection image has been acquired by the scanning device.
Implementing Quantum Algorithms with Modular Gates in a Trapped Ion Chain
NASA Astrophysics Data System (ADS)
Figgatt, Caroline; Debnath, Shantanu; Linke, Norbert; Landsman, Kevin; Wright, Ken; Monroe, Chris
2016-05-01
We present experimental results on quantum algorithms performed using fully modular one- and two-qubit gates in a linear chain of 5 Yb + ions. This is accomplished through arbitrary qubit addressing and manipulation from stimulated Raman transitions driven by a beat note between counter-propagating beams from a pulsed laser. The Raman beam pairs consist of one global beam and a set of counter-propagating individual addressing beams, one for each ion. This provides arbitrary single-qubit rotations as well as arbitrary selection of ion pairs for a fully-connected system of two-qubit modular XX-entangling gates implemented using a pulse-segmentation scheme. We execute controlled-NOT gates with an average fidelity of 97.0% for all 10 possible pairs. Programming arbitrary sequences of gates allows us to construct any quantum algorithm, making this system a universal quantum computer. As an example, we present experimental results for the Bernstein-Vazirani algorithm using 4 control qubits and 1 ancilla, performed with concatenated gates that can be reconfigured to construct all 16 possible oracles, and obtain a process fidelity of 90.3%. This work is supported by the ARO with funding from the IARPA MQCO program and the AFOSR MURI on Quantum Measurement and Verification.
A Dynamic Era-Based Time-Symmetric Block Time-Step Algorithm with Parallel Implementations
NASA Astrophysics Data System (ADS)
Kaplan, Murat; Saygin, Hasan
2012-06-01
The time-symmetric block time-step (TSBTS) algorithm is a newly developed efficient scheme for N-body integrations. It is constructed on an era-based iteration. In this work, we re-designed the TSBTS integration scheme with a dynamically changing era size. A number of numerical tests were performed to show the importance of choosing the size of the era, especially for long-time integrations. Our second aim was to show that the TSBTS scheme is as suitable as previously known schemes for developing parallel N-body codes. In this work, we relied on a parallel scheme using the copy algorithm for the time-symmetric scheme. We implemented a hybrid of data and task parallelization for force calculation to handle load balancing problems that can appear in practice. Using the Plummer model initial conditions for different numbers of particles, we obtained the expected efficiency and speedup for a small number of particles. Although parallelization of the direct N-body codes is negatively affected by the communication/calculation ratios, we obtained good load-balanced results. Moreover, we were able to conserve the advantages of the algorithm (e.g., energy conservation for long-term simulations).
NASA Astrophysics Data System (ADS)
Sham Kumar, S.; C. V., Sreehari; Vivek, K.; T. V., Praveen; Moosad, K. P. B.; Rajesh, R.
2015-06-01
This paper discusses the detailed experimental investigations on the performance of interferometer based fiber optic hydrophones with different Phase Generated Carrier (PGC) demodulation algorithms for their interrogation. The study covers the effect on different parametric variations in the PGC implementations by comparison through Signal to Noise And Distortion (SINAD) and Total Harmonic Distortion (THD) analysis. This paper discusses experiments on most popular algorithms based on PGC like Arctangent, Differential Cross Multiplication (DCM) and Ameliorated PGC. A Distributed Feed-Back Fiber Lasers (DFB-FL) based fiber optic hydrophone, with Mach-Zehnder Interferometer having active phase modulator in reference arm and mechanism to cater polarization related intensity fading were used for the experiments. Experiments were carried out to study the effects of various parameters like the type and configuration of low pass filter, frequency of the modulation signal, frequency of acoustic signal etc. It is observed that all the three factors viz. the type of low pass filter, frequency of modulating and acoustic signal plays important role in retrieving the acoustic signal, based on the type of algorithms used and are discussed here.
Pre-Hardware Optimization and Implementation Of Fast Optics Closed Control Loop Algorithms
NASA Technical Reports Server (NTRS)
Kizhner, Semion; Lyon, Richard G.; Herman, Jay R.; Abuhassan, Nader
2004-01-01
One of the main heritage tools used in scientific and engineering data spectrum analysis is the Fourier Integral Transform and its high performance digital equivalent - the Fast Fourier Transform (FFT). The FFT is particularly useful in two-dimensional (2-D) image processing (FFT2) within optical systems control. However, timing constraints of a fast optics closed control loop would require a supercomputer to run the software implementation of the FFT2 and its inverse, as well as other image processing representative algorithm, such as numerical image folding and fringe feature extraction. A laboratory supercomputer is not always available even for ground operations and is not feasible for a night project. However, the computationally intensive algorithms still warrant alternative implementation using reconfigurable computing technologies (RC) such as Digital Signal Processors (DSP) and Field Programmable Gate Arrays (FPGA), which provide low cost compact super-computing capabilities. We present a new RC hardware implementation and utilization architecture that significantly reduces the computational complexity of a few basic image-processing algorithm, such as FFT2, image folding and phase diversity for the NASA Solar Viewing Interferometer Prototype (SVIP) using a cluster of DSPs and FPGAs. The DSP cluster utilization architecture also assures avoidance of a single point of failure, while using commercially available hardware. This, combined with the control algorithms pre-hardware optimization, or the first time allows construction of image-based 800 Hertz (Hz) optics closed control loops on-board a spacecraft, based on the SVIP ground instrument. That spacecraft is the proposed Earth Atmosphere Solar Occultation Imager (EASI) to study greenhouse gases CO2, C2H, H2O, O3, O2, N2O from Lagrange-2 point in space. This paper provides an advanced insight into a new type of science capabilities for future space exploration missions based on on-board image processing
NASA Technical Reports Server (NTRS)
Kizhner, Semion; Flatley, Thomas P.; Hestnes, Phyllis; Jentoft-Nilsen, Marit; Petrick, David J.; Day, John H. (Technical Monitor)
2001-01-01
Spacecraft telemetry rates have steadily increased over the last decade presenting a problem for real-time processing by ground facilities. This paper proposes a solution to a related problem for the Geostationary Operational Environmental Spacecraft (GOES-8) image processing application. Although large super-computer facilities are the obvious heritage solution, they are very costly, making it imperative to seek a feasible alternative engineering solution at a fraction of the cost. The solution is based on a Personal Computer (PC) platform and synergy of optimized software algorithms and re-configurable computing hardware technologies, such as Field Programmable Gate Arrays (FPGA) and Digital Signal Processing (DSP). It has been shown in [1] and [2] that this configuration can provide superior inexpensive performance for a chosen application on the ground station or on-board a spacecraft. However, since this technology is still maturing, intensive pre-hardware steps are necessary to achieve the benefits of hardware implementation. This paper describes these steps for the GOES-8 application, a software project developed using Interactive Data Language (IDL) (Trademark of Research Systems, Inc.) on a Workstation/UNIX platform. The solution involves converting the application to a PC/Windows/RC platform, selected mainly by the availability of low cost, adaptable high-speed RC hardware. In order for the hybrid system to run, the IDL software was modified to account for platform differences. It was interesting to examine the gains and losses in performance on the new platform, as well as unexpected observations before implementing hardware. After substantial pre-hardware optimization steps, the necessity of hardware implementation for bottleneck code in the PC environment became evident and solvable beginning with the methodology described in [1], [2], and implementing a novel methodology for this specific application [6]. The PC-RC interface bandwidth problem for the
A time-efficient algorithm for implementing the Catmull-Clark subdivision method
NASA Astrophysics Data System (ADS)
Ioannou, G.; Savva, A.; Stylianou, V.
2015-10-01
Splines are the most popular methods in Figure Modeling and CAGD (Computer Aided Geometric Design) in generating smooth surfaces from a number of control points. The control points define the shape of a figure and splines calculate the required number of points which when displayed on a computer screen the result is a smooth surface. However, spline methods are based on a rectangular topological structure of points, i.e., a two-dimensional table of vertices, and thus cannot generate complex figures, such as the human and animal bodies that their complex structure does not allow them to be defined by a regular rectangular grid. On the other hand surface subdivision methods, which are derived by splines, generate surfaces which are defined by an arbitrary topology of control points. This is the reason that during the last fifteen years subdivision methods have taken the lead over regular spline methods in all areas of modeling in both industry and research. The cost of executing computer software developed to read control points and calculate the surface is run-time, due to the fact that the surface-structure required for handling arbitrary topological grids is very complicate. There are many software programs that have been developed related to the implementation of subdivision surfaces however, not many algorithms are documented in the literature, to support developers for writing efficient code. This paper aims to assist programmers by presenting a time-efficient algorithm for implementing subdivision splines. The Catmull-Clark which is the most popular of the subdivision methods has been employed to illustrate the algorithm.
NASA Astrophysics Data System (ADS)
Bernabe, Sergio; Igual, Francisco D.; Botella, Guillermo; Prieto-Matias, Manuel; Plaza, Antonio
2015-10-01
In the last decade, the issue of endmember variability has received considerable attention, particularly when each pixel is modeled as a linear combination of endmembers or pure materials. As a result, several models and algorithms have been developed for considering the effect of endmember variability in spectral unmixing and possibly include multiple endmembers in the spectral unmixing stage. One of the most popular approach for this purpose is the multiple endmember spectral mixture analysis (MESMA) algorithm. The procedure executed by MESMA can be summarized as follows: (i) First, a standard linear spectral unmixing (LSU) or fully constrained linear spectral unmixing (FCLSU) algorithm is run in an iterative fashion; (ii) Then, we use different endmember combinations, randomly selected from a spectral library, to decompose each mixed pixel; (iii) Finally, the model with the best fit, i.e., with the lowest root mean square error (RMSE) in the reconstruction of the original pixel, is adopted. However, this procedure can be computationally very expensive due to the fact that several endmember combinations need to be tested and several abundance estimation steps need to be conducted, a fact that compromises the use of MESMA in applications under real-time constraints. In this paper we develop (for the first time in the literature) an efficient implementation of MESMA on different platforms using OpenCL, an open standard for parallel programing on heterogeneous systems. Our experiments have been conducted using a simulated data set and the clMAGMA mathematical library. This kind of implementations with the same descriptive language on different architectures are very important in order to actually calibrate the possibility of using heterogeneous platforms for efficient hyperspectral imaging processing in real remote sensing missions.
NASA Astrophysics Data System (ADS)
Paz, Abel; Plaza, Antonio
2010-08-01
Automatic target and anomaly detection are considered very important tasks for hyperspectral data exploitation. These techniques are now routinely applied in many application domains, including defence and intelligence, public safety, precision agriculture, geology, or forestry. Many of these applications require timely responses for swift decisions which depend upon high computing performance of algorithm analysis. However, with the recent explosion in the amount and dimensionality of hyperspectral imagery, this problem calls for the incorporation of parallel computing techniques. In the past, clusters of computers have offered an attractive solution for fast anomaly and target detection in hyperspectral data sets already transmitted to Earth. However, these systems are expensive and difficult to adapt to on-board data processing scenarios, in which low-weight and low-power integrated components are essential to reduce mission payload and obtain analysis results in (near) real-time, i.e., at the same time as the data is collected by the sensor. An exciting new development in the field of commodity computing is the emergence of commodity graphics processing units (GPUs), which can now bridge the gap towards on-board processing of remotely sensed hyperspectral data. In this paper, we describe several new GPU-based implementations of target and anomaly detection algorithms for hyperspectral data exploitation. The parallel algorithms are implemented on latest-generation Tesla C1060 GPU architectures, and quantitatively evaluated using hyperspectral data collected by NASA's AVIRIS system over the World Trade Center (WTC) in New York, five days after the terrorist attacks that collapsed the two main towers in the WTC complex.
2014-01-01
Background The huge quantity of data produced in Biomedical research needs sophisticated algorithmic methodologies for its storage, analysis, and processing. High Performance Computing (HPC) appears as a magic bullet in this challenge. However, several hard to solve parallelization and load balancing problems arise in this context. Here we discuss the HPC-oriented implementation of a general purpose learning algorithm, originally conceived for DNA analysis and recently extended to treat uncertainty on data (U-BRAIN). The U-BRAIN algorithm is a learning algorithm that finds a Boolean formula in disjunctive normal form (DNF), of approximately minimum complexity, that is consistent with a set of data (instances) which may have missing bits. The conjunctive terms of the formula are computed in an iterative way by identifying, from the given data, a family of sets of conditions that must be satisfied by all the positive instances and violated by all the negative ones; such conditions allow the computation of a set of coefficients (relevances) for each attribute (literal), that form a probability distribution, allowing the selection of the term literals. The great versatility that characterizes it, makes U-BRAIN applicable in many of the fields in which there are data to be analyzed. However the memory and the execution time required by the running are of O(n3) and of O(n5) order, respectively, and so, the algorithm is unaffordable for huge data sets. Results We find mathematical and programming solutions able to lead us towards the implementation of the algorithm U-BRAIN on parallel computers. First we give a Dynamic Programming model of the U-BRAIN algorithm, then we minimize the representation of the relevances. When the data are of great size we are forced to use the mass memory, and depending on where the data are actually stored, the access times can be quite different. According to the evaluation of algorithmic efficiency based on the Disk Model, in order to
Dongarra, J.J.; Hewitt, T.
1985-08-01
This note describes some experiments on simple, dense linear algebra algorithms. These experiments show that the CRAY X-MP is capable of small-grain multitasking arising from standard implementations of LU and Cholesky decomposition. The implementation described here provides the ''fastest'' execution rate for LU decomposition, 718 MFLOPS for a matrix of order 1000.
FPGA-based implementation for steganalysis: a JPEG-compatibility algorithm
NASA Astrophysics Data System (ADS)
Gutierrez-Fernandez, E.; Portela-García, M.; Lopez-Ongil, C.; Garcia-Valderas, M.
2013-05-01
Steganalysis is a process to detect hidden data in cover documents, like digital images, videos, audio files, etc. This is the inverse process of steganography, which is the used method to hide secret messages. The widely use of computers and network technologies make digital files very easy-to-use means for storing secret data or transmitting secret messages through the Internet. Depending on the cover medium used to embed the data, there are different steganalysis methods. In case of images, many of the steganalysis and steganographic methods are focused on JPEG image formats, since JPEG is one of the most common formats. One of the main important handicaps of steganalysis methods is the processing speed, since it is usually necessary to process huge amount of data or it can be necessary to process the on-going internet traffic in real-time. In this paper, a JPEG steganalysis system is implemented in an FPGA in order to speed-up the detection process with respect to software-based implementations and to increase the throughput. In particular, the implemented method is the JPEG-compatibility detection algorithm that is based on the fact that when a JPEG image is modified, the resulting image is incompatible with the JPEG compression process.
NASA Astrophysics Data System (ADS)
Tang, Yu-Hang; Karniadakis, George; Crunch Team
2014-03-01
We present a scalable dissipative particle dynamics simulation code, fully implemented on the Graphics Processing Units (GPUs) using a hybrid CUDA/MPI programming model, which achieves 10-30 times speedup on a single GPU over 16 CPU cores and almost linear weak scaling across a thousand nodes. A unified framework is developed within which the efficient generation of the neighbor list and maintaining particle data locality are addressed. Our algorithm generates strictly ordered neighbor lists in parallel, while the construction is deterministic and makes no use of atomic operations or sorting. Such neighbor list leads to optimal data loading efficiency when combined with a two-level particle reordering scheme. A faster in situ generation scheme for Gaussian random numbers is proposed using precomputed binary signatures. We designed custom transcendental functions that are fast and accurate for evaluating the pairwise interaction. Computer benchmarks demonstrate the speedup of our implementation over the CPU implementation as well as strong and weak scalability. A large-scale simulation of spontaneous vesicle formation consisting of 128 million particles was conducted to illustrate the practicality of our code in real-world applications. This work was supported by the new Department of Energy Collaboratory on Mathematics for Mesoscopic Modeling of Materials (CM4). Simulations were carried out at the Oak Ridge Leadership Computing Facility through the INCITE program under project BIP017.
FPGA implementation of 2-D discrete cosine transforms algorithm using systemC
NASA Astrophysics Data System (ADS)
Liu, Yifei; Ding, Mingyue
2007-12-01
Discrete Cosine Transform (DCT) is widely applied in image and video compression. This paper presented the software and hardware co-design method based on SystemC. As a case of study, a two dimension (2D) DCT Algorithm was implemented on Programmable Gate Arrays (FPGAs) chip. The short simulation time and verification process greatly increases the design efficiency of SystemC, making the product designed by SystemC more quickly into the market. The design effect using SystemC is compared between the expertise hardware designer and the software designer with little hardware knowledge. The result shows SystemC is an excellent and high efficiency hardware design method for an expertise hardware designer.
Dragonfly: an implementation of the expand–maximize–compress algorithm for single-particle imaging1
Ayyer, Kartik; Lan, Ti-Yen; Elser, Veit; Loh, N. Duane
2016-01-01
Single-particle imaging (SPI) with X-ray free-electron lasers has the potential to change fundamentally how biomacromolecules are imaged. The structure would be derived from millions of diffraction patterns, each from a different copy of the macromolecule before it is torn apart by radiation damage. The challenges posed by the resultant data stream are staggering: millions of incomplete, noisy and un-oriented patterns have to be computationally assembled into a three-dimensional intensity map and then phase reconstructed. In this paper, the Dragonfly software package is described, based on a parallel implementation of the expand–maximize–compress reconstruction algorithm that is well suited for this task. Auxiliary modules to simulate SPI data streams are also included to assess the feasibility of proposed SPI experiments at the Linac Coherent Light Source, Stanford, California, USA. PMID:27504078
Recursive multiport schemes for implementing quantum algorithms with photonic integrated circuits
NASA Astrophysics Data System (ADS)
Tabia, Gelo Noel M.
2016-01-01
We present recursive multiport schemes for implementing quantum Fourier transforms and the inversion step in Grover's algorithm on an integrated linear optics device. In particular, each scheme shows how to execute a quantum operation on 2 d modes using a pair of circuits for the same operation on d modes. The circuits operate on path-encoded qudits and realize d -dimensional unitary transformations on these states using linear optical networks with O (d2) optical elements. To evaluate the schemes against realistic errors, we ran simulations of proof-of-principle experiments using a simple fabrication model of silicon-based photonic integrated devices that employ directional couplers and thermo-optic modulators for beam splitters and phase shifters, respectively. We find that high-fidelity performance is achievable with our multiport circuits for 2-qubit and 3-qubit quantum Fourier transforms, and for quantum search on four-item and eight-item databases.
NASA Astrophysics Data System (ADS)
Fenner, Jonathan W.; Simon, Solomon H.; Eden, Dayton D.
1994-07-01
As new IR focal plane array (IRFPA) technologies become available, improved methods for coping with array errors must be developed. Traditional methods of nonuniformity correction using simple calibration mode are not adequate to compensate for the inherent nonuniformity and 1/f noise in some arrays. In an effort to compensate for nonuniformity in a HgCdTe IRFPA, and to reduce the effects of 1/f noise over a time interval, a new dynamic neural network (NN) based algorithm was implemented. The algorithm compensates for nonuniformities, and corrects for 1/f noise. A gradient descent algorithm is used with nearest neighbor feedback for training, creating a dynamic model of the IRFPA's gains and offsets, then updating and correcting them continuously. Improvements to the NN include implementation on a IBM 486 computer system, and a close examination of simulated scenes to test the algorithms boundaries. Preliminary designs for a real-time hardware prototype have been developed as well. Simulations were implemented to test the algorithm's ability to correct under a variety of conditions. A wide range of background noise, 1/f noise, object intensities, and background intensities were used. Results indicate that this algorithm can correct efficiently down to the background noise. Our conclusions are that NN based adaptive algorithms will supplement the effectiveness of IRFPA's.
NASA Astrophysics Data System (ADS)
Yu, Biin; Peng, Xiang; Tian, Jindong; Niu, Hanben
2006-01-01
A cascaded iterative angular spectrum approach (CIASA) based on the methodology of virtual optics is presented for optical security applications. The technique encodes the target image into two different phase only masks (POM) using a concept of free-space angular spectrum propagation. The two phase-masks are designed and located in any two arbitrary planes interrelated through the free space propagation domain in order to implement the optical encryption or authenticity verification. And both phase masks can serve as enciphered texts. Compared with previous methods, the proposed algorithm employs an improved searching strategy: modifying the phase-distributions of both masks synchronously as well as enlarging the searching space. And with such a scheme, we make use of a high performance floating-point Digital Signal Processor (DSP) to accomplish a design of multiple-locks and multiple-keys optical image encryption system. An evaluation of the system performance is made and it is shown that the algorithm results in much faster convergence and better image quality for the recovered image. And two masks and system parameters can be used to design keys for image encryption, therefore the decrypted image can be obtained only when all these keys are under authorization. This key-assignment strategy may reduce the risk of being intruded and show a high security level. These characters may introduce a high level security that makes the encrypted image more difficult to be decrypted by an unauthorized person.