Implementing the Deutsch-Jozsa algorithm with macroscopic ensembles
NASA Astrophysics Data System (ADS)
Semenenko, Henry; Byrnes, Tim
2016-05-01
Quantum computing implementations under consideration today typically deal with systems with microscopic degrees of freedom such as photons, ions, cold atoms, and superconducting circuits. The quantum information is stored typically in low-dimensional Hilbert spaces such as qubits, as quantum effects are strongest in such systems. It has, however, been demonstrated that quantum effects can be observed in mesoscopic and macroscopic systems, such as nanomechanical systems and gas ensembles. While few-qubit quantum information demonstrations have been performed with such macroscopic systems, a quantum algorithm showing exponential speedup over classical algorithms is yet to be shown. Here, we show that the Deutsch-Jozsa algorithm can be implemented with macroscopic ensembles. The encoding that we use avoids the detrimental effects of decoherence that normally plagues macroscopic implementations. We discuss two mapping procedures which can be chosen depending upon the constraints of the oracle and the experiment. Both methods have an exponential speedup over the classical case, and only require control of the ensembles at the level of the total spin of the ensembles. It is shown that both approaches reproduce the qubit Deutsch-Jozsa algorithm, and are robust under decoherence.
Quantum computation with classical light: Implementation of the Deutsch-Jozsa algorithm
NASA Astrophysics Data System (ADS)
Perez-Garcia, Benjamin; McLaren, Melanie; Goyal, Sandeep K.; Hernandez-Aranda, Raul I.; Forbes, Andrew; Konrad, Thomas
2016-05-01
We propose an optical implementation of the Deutsch-Jozsa Algorithm using classical light in a binary decision-tree scheme. Our approach uses a ring cavity and linear optical devices in order to efficiently query the oracle functional values. In addition, we take advantage of the intrinsic Fourier transforming properties of a lens to read out whether the function given by the oracle is balanced or constant.
Scheme for implementing the Deutsch-Jozsa algorithm in cavity QED
Zheng Shibiao
2004-09-01
We propose a scheme for realizing the Deutsch-Jozsa algorithm in cavity QED. The scheme is based on the resonant interaction of atoms with a cavity mode. The required experimental techniques are within the scope of what can be obtained in the microwave cavity QED setup. The experimental implementation of the scheme would be an important step toward more complex quantum computation in cavity QED.
Implementing Deutsch-Jozsa algorithm using light shifts and atomic ensembles
Dasgupta, Shubhrangshu; Biswas, Asoka; Agarwal, G.S.
2005-01-01
We present an optical scheme to implement the Deutsch-Jozsa algorithm using ac Stark shifts. The scheme uses an atomic ensemble consisting of four-level atoms interacting dispersively with a field. This leads to a Hamiltonian in the atom-field basis which is quite suitable for quantum computation. We show how one can implement the algorithm by performing proper one- and two-qubit operations. We emphasize that in our model the decoherence is expected to be minimal due to our usage of atomic ground states and freely propagating photon.
Kessel, Alexander R.; Yakovleva, Natalia M.
2002-12-01
Schemes of experimental realization of the main two-qubit processors for quantum computers and the Deutsch-Jozsa algorithm are derived in virtual spin representation. The results are applicable for every four quantum states allowing the required properties for quantum processor implementation if for qubit encoding, virtual spin representation is used. A four-dimensional Hilbert space of nuclear spin 3/2 is considered in detail for this aim.
Shi, Fazhan; Rong, Xing; Xu, Nanyang; Wang, Ya; Wu, Jie; Chong, Bo; Peng, Xinhua; Kniepert, Juliane; Schoenfeld, Rolf-Simon; Harneit, Wolfgang; Feng, Mang; Du, Jiangfeng
2010-07-23
The nitrogen-vacancy defect center (N-V center) is a promising candidate for quantum information processing due to the possibility of coherent manipulation of individual spins in the absence of the cryogenic requirement. We report a room-temperature implementation of the Deutsch-Jozsa algorithm by encoding both a qubit and an auxiliary state in the electron spin of a single N-V center. By thus exploiting the specific S=1 character of the spin system, we demonstrate how even scarce quantum resources can be used for test-bed experiments on the way towards a large-scale quantum computing architecture. PMID:20867828
Unifying parameter estimation and the Deutsch-Jozsa algorithm for continuous variables
Zwierz, Marcin; Perez-Delgado, Carlos A.; Kok, Pieter
2010-10-15
We reveal a close relationship between quantum metrology and the Deutsch-Jozsa algorithm on continuous-variable quantum systems. We develop a general procedure, characterized by two parameters, that unifies parameter estimation and the Deutsch-Jozsa algorithm. Depending on which parameter we keep constant, the procedure implements either the parameter-estimation protocol or the Deutsch-Jozsa algorithm. The parameter-estimation part of the procedure attains the Heisenberg limit and is therefore optimal. Due to the use of approximate normalizable continuous-variable eigenstates, the Deutsch-Jozsa algorithm is probabilistic. The procedure estimates a value of an unknown parameter and solves the Deutsch-Jozsa problem without the use of any entanglement.
Collins, David
2010-05-15
A general framework for regarding oracle-assisted quantum algorithms as tools for discriminating among unitary transformations is described. This framework is applied to the Deutsch-Jozsa problem and all possible quantum algorithms which solve the problem with certainty using oracle unitaries in a particular form are derived. It is also used to show that any quantum algorithm that solves the Deutsch-Jozsa problem starting with a quantum system in a particular class of initial, thermal equilibrium-based states of the type encountered in solution-state NMR can only succeed with greater probability than a classical algorithm when the problem size n exceeds {approx}10{sup 5}.
Using the Deutsch-Jozsa algorithm to partition arrays
NASA Astrophysics Data System (ADS)
Lipovaca, Samir
2010-03-01
Using the Deutsch-Jozsa algorithm, we will develop a method for solving a class of problems in which we need to determine parts of an array and then apply a specified function to each independent part. Since present quantum computers are not robust enough for code writing and execution, we will build a model of a vector quantum computer that implements the Deutsch-Jozsa algorithm from a machine language view using the APL2 programming language. The core of the method is an operator (DJBOX) which allows evaluation of an arbitrary function f by the Deutsch-Jozsa algorithm. Two key functions of the method are GET/PARTITION and CALC/WITH/PARTITIONS. The GET/PARTITION function determines parts of an array based on the function f. The CALC/WITH/PARTITIONS function determines parts of an array based on the function f and then applies another function to each independent part. We will imagine the method is implemented on the above vector quantum computer. We will show that the method can be successfully executed.
Experimental demonstration of the Deutsch-Jozsa algorithm in homonuclear multispin systems
Wu Zhen; Luo Jun; Feng Mang; Li Jun; Zheng Wenqiang; Peng Xinhua
2011-10-15
Despite early experimental tests of the Deutsch-Jozsa (DJ) algorithm, there have been only a very few nontrivial balanced functions tested for register number n>3. In this paper, we experimentally demonstrate the DJ algorithm in four- and five-qubit homonuclear spin systems by the nuclear-magnetic-resonance technique, by which we encode the one function evaluation into a long shaped pulse with the application of the gradient ascent algorithm. Our work, dramatically reducing the accumulated errors due to gate imperfections and relaxation, demonstrates a better implementation of the DJ algorithm.
Experimental realization of the Deutsch-Jozsa algorithm with a six-qubit cluster state
Vallone, Giuseppe; Donati, Gaia; Bruno, Natalia; Chiuri, Andrea; Mataloni, Paolo
2010-05-15
We describe an experimental realization of the Deutsch-Jozsa quantum algorithm to evaluate the properties of a two-bit Boolean function in the framework of one-way quantum computation. For this purpose, a two-photon six-qubit cluster state was engineered. Its peculiar topological structure is the basis of the original measurement pattern allowing the algorithm realization. The good agreement of the experimental results with the theoretical predictions, obtained at {approx}1 kHz success rate, demonstrates the correct implementation of the algorithm.
Tame, M. S.; Kim, M. S.
2010-09-15
We show that fundamental versions of the Deutsch-Jozsa and Bernstein-Vazirani quantum algorithms can be performed using a small entangled cluster state resource of only six qubits. We then investigate the minimal resource states needed to demonstrate general n-qubit versions and a scalable method to produce them. For this purpose, we propose a versatile photonic on-chip setup.
NMR tomography of the three-qubit Deutsch-Jozsa algorithm
Mangold, Oliver; Heidebrecht, Andreas; Mehring, Michael
2004-10-01
The optimized version of the Deutsch-Jozsa algorithm proposed by Collins et al. was implemented using the three {sup 19}F nuclear spins of 2,3,4-trifluoroaniline as qubits. To emulate the behavior of pure quantum-mechanical states pseudopure states of the ensemble were prepared prior to execution of the algorithm. Full tomography of the density matrix was employed to obtain detailed information about initial, intermediate, and final states. Information, thus obtained, was applied to optimize the pulse sequences used. It is shown that substantial improvement of the fidelity of the preparation may be achieved by compensating the effects caused by the different relaxation behavior of the different substates of the density matrix. All manipulations of the quantum states were performed under the conditions of unresolved spin-spin interactions.
Coherence as a resource in decision problems: The Deutsch-Jozsa algorithm and a variation
NASA Astrophysics Data System (ADS)
Hillery, Mark
2016-01-01
That superpositions of states can be useful for performing tasks in quantum systems has been known since the early days of quantum information, but only recently has a quantitative theory of quantum coherence been proposed. Here we apply that theory to an analysis of the Deutsch-Jozsa algorithm, which depends on quantum coherence for its operation. The Deutsch-Jozsa algorithm solves a decision problem, and we focus on a probabilistic version of that problem, comparing probability of being correct for both classical and quantum procedures. In addition, we study a related decision problem in which the quantum procedure has one-sided error while the classical procedure has two-sided error. The role of coherence on the quantum success probabilities in both of these problems is examined.
Anderson, Brandon M.; Collins, David
2005-10-15
We compare the failure probabilities of ensemble implementations of quantum algorithms which use pseudopure initial states, quantified by their polarization, to those of competing classical probabilistic algorithms. Specifically we consider a class algorithms which require only one bit to output the solution to problems. For large ensemble sizes, we present a general scheme to determine a critical polarization beneath which the quantum algorithm fails with greater probability than its classical competitor. We apply this to the Deutsch-Jozsa algorithm and show that the critical polarization is 86.6%.
Optical simulation of quantum algorithms using programmable liquid-crystal displays
Puentes, Graciana; La Mela, Cecilia; Ledesma, Silvia; Iemmi, Claudio; Paz, Juan Pablo; Saraceno, Marcos
2004-04-01
We present a scheme to perform an all optical simulation of quantum algorithms and maps. The main components are lenses to efficiently implement the Fourier transform and programmable liquid-crystal displays to introduce space dependent phase changes on a classical optical beam. We show how to simulate Deutsch-Jozsa and Grover's quantum algorithms using essentially the same optical array programmed in two different ways.
Multipartite entanglement in quantum algorithms
Bruss, D.; Macchiavello, C.
2011-05-15
We investigate the entanglement features of the quantum states employed in quantum algorithms. In particular, we analyze the multipartite entanglement properties in the Deutsch-Jozsa, Grover, and Simon algorithms. Our results show that for these algorithms most instances involve multipartite entanglement.
Realization of Simple Quantum Algorithms with Circuit Quantum Electrodynamics
NASA Astrophysics Data System (ADS)
Dicarlo, Leonardo
2010-03-01
Superconducting circuits have made considerable progress in the requirements of quantum coherence, universal gate operations and qubit readout necessary to realize a quantum computer. However, simultaneously meeting these requirements makes the solid-state realization of few-qubit processors, as previously implemented in nuclear magnetic resonance, ion-trap and optical systems, an exciting challenge. We present the realization of a two-qubit superconducting processor based on circuit quantum electrodynamics (cQED), and report progress by the Yale cQED team towards a four-qubit upgrade. The architecture employs a microwave transmission-line cavity as a quantum bus coupling multiple transmon qubits. Unitary control is achieved by concatenation of high-fidelity single-qubit rotations induced via resonant microwave tones, and multi-qubit adiabatic phase gates realized by local flux control of qubit frequencies. Qubit readout uses the cavity as a quadratic detector, such that a single, calibrated measurement channel gives direct access to multi-qubit correlations. We present generation of Bell states; entanglement quantification by strong violation of Clauser-Horne-Shimony-Holt inequalities; and implementations of the Grover search and Deutsch-Jozsa algorithms. We report experimental progress in extending adiabatic phase gates and joint readout to four qubits, and improving qubit coherence on the road to realizing more complex quantum algorithms. Research done in collaboration with J. M. Chow, J. M. Gambetta, Lev S. Bishop, B. R. Johnson, D. I. Schuster, A. Nunnenkamp, J. Majer, A. Blais, L. Frunzio, M. H. Devoret, S. M. Girvin, and R. J. Schoelkopf.
Systolic algorithms and their implementation
Kung, H.T.
1984-01-01
Very high performance computer systems must rely heavily on parallelism since there are severe physical and technological limits on the ultimate speed of any single processor. The systolic array concept developed in the last several years allows effective use of a very large number of processors in parallel. This article illustrates the basic ideas by reviewing a systolic array design for matrix triangularization and describing its use in the on-the-fly updating of Cholesky decomposition of covariance matrices-a crucial computation in adaptive signal processing. Following this are discussions on issues related to the hardware implementation of systolic algorithms in general, and some guidelines for designing systolic algorithms that will be convenient for implementation. 33 references.
Linear Bregman algorithm implemented in parallel GPU
NASA Astrophysics Data System (ADS)
Li, Pengyan; Ke, Jue; Sui, Dong; Wei, Ping
2015-08-01
At present, most compressed sensing (CS) algorithms have poor converging speed, thus are difficult to run on PC. To deal with this issue, we use a parallel GPU, to implement a broadly used compressed sensing algorithm, the Linear Bregman algorithm. Linear iterative Bregman algorithm is a reconstruction algorithm proposed by Osher and Cai. Compared with other CS reconstruction algorithms, the linear Bregman algorithm only involves the vector and matrix multiplication and thresholding operation, and is simpler and more efficient for programming. We use C as a development language and adopt CUDA (Compute Unified Device Architecture) as parallel computing architectures. In this paper, we compared the parallel Bregman algorithm with traditional CPU realized Bregaman algorithm. In addition, we also compared the parallel Bregman algorithm with other CS reconstruction algorithms, such as OMP and TwIST algorithms. Compared with these two algorithms, the result of this paper shows that, the parallel Bregman algorithm needs shorter time, and thus is more convenient for real-time object reconstruction, which is important to people's fast growing demand to information technology.
Java implementation of Class Association Rule algorithms
Energy Science and Technology Software Center (ESTSC)
2007-08-30
Java implementation of three Class Association Rule mining algorithms, NETCAR, CARapriori, and clustering based rule mining. NETCAR algorithm is a novel algorithm developed by Makio Tamura. The algorithm is discussed in a paper: UCRL-JRNL-232466-DRAFT, and would be published in a peer review scientific journal. The software is used to extract combinations of genes relevant with a phenotype from a phylogenetic profile and a phenotype profile. The phylogenetic profiles is represented by a binary matrix andmore » a phenotype profile is represented by a binary vector. The present application of this software will be in genome analysis, however, it could be applied more generally.« less
Java implementation of Class Association Rule algorithms
Tamura, Makio
2007-08-30
Java implementation of three Class Association Rule mining algorithms, NETCAR, CARapriori, and clustering based rule mining. NETCAR algorithm is a novel algorithm developed by Makio Tamura. The algorithm is discussed in a paper: UCRL-JRNL-232466-DRAFT, and would be published in a peer review scientific journal. The software is used to extract combinations of genes relevant with a phenotype from a phylogenetic profile and a phenotype profile. The phylogenetic profiles is represented by a binary matrix and a phenotype profile is represented by a binary vector. The present application of this software will be in genome analysis, however, it could be applied more generally.
Implementation of the phase gradient algorithm
Wahl, D.E.; Eichel, P.H.; Jakowatz, C.V. Jr.
1990-01-01
The recently introduced Phase Gradient Autofocus (PGA) algorithm is a non-parametric autofocus technique which has been shown to be quite effective for phase correction of Synthetic Aperture Radar (SAR) imagery. This paper will show that this powerful algorithm can be executed at near real-time speeds and also be implemented in a relatively small piece of hardware. A brief review of the PGA will be presented along with an overview of some critical implementation considerations. In addition, a demonstration of the PGA algorithm running on a 7 in. {times} 10 in. printed circuit board containing a TMS320C30 digital signal processing (DSP) chip will be given. With this system, using only the 20 range bins which contain the brightest points in the image, the algorithm can correct a badly degraded 256 {times} 256 image in as little as 3 seconds. Using all range bins, the algorithm can correct the image in 9 seconds. 4 refs., 2 figs.
Implementing Shor's algorithm on Josephson charge qubits
Vartiainen, Juha J.; Salomaa, Martti M.; Niskanen, Antti O.; Nakahara, Mikio
2004-07-01
We investigate the physical implementation of Shor's factorization algorithm on a Josephson charge qubit register. While we pursue a universal method to factor a composite integer of any size, the scheme is demonstrated for the number 21. We consider both the physical and algorithmic requirements for an optimal implementation when only a small number of qubits are available. These aspects of quantum computation are usually the topics of separate research communities; we present a unifying discussion of both of these fundamental features bridging Shor's algorithm to its physical realization using Josephson junction qubits. In order to meet the stringent requirements set by a short decoherence time, we accelerate the algorithm by decomposing the quantum circuit into tailored two- and three-qubit gates and we find their physical realizations through numerical optimization.
Deterministic quantum computation with one photonic qubit
NASA Astrophysics Data System (ADS)
Hor-Meyll, M.; Tasca, D. S.; Walborn, S. P.; Ribeiro, P. H. Souto; Santos, M. M.; Duzzioni, E. I.
2015-07-01
We show that deterministic quantum computing with one qubit (DQC1) can be experimentally implemented with a spatial light modulator, using the polarization and the transverse spatial degrees of freedom of light. The scheme allows the computation of the trace of a high-dimension matrix, being limited by the resolution of the modulator panel and the technical imperfections. In order to illustrate the method, we compute the normalized trace of unitary matrices and implement the Deutsch-Jozsa algorithm. The largest matrix that can be manipulated with our setup is 1080 ×1920 , which is able to represent a system with approximately 21 qubits.
CORDIC algorithms for SVM FPGA implementation
NASA Astrophysics Data System (ADS)
Gimeno Sarciada, Jesús; Lamel Rivera, Horacio; Jiménez, Matías
2010-04-01
Support Vector Machines are currently one of the best classification algorithms used in a wide number of applications. The ability to extract a classification function from a limited number of learning examples keeping in the structural risk low has demonstrated to be a clear alternative to other neural networks. However, the calculations involved in computing the kernel and the repetition of the process for all support vectors in the classification problem are certainly intensive, requiring time or power consumption in order to function correctly. This problem could be a drawback in certain applications with limited resources or time. Therefore simple algorithms circumventing this problem are needed. In this paper we analyze an FPGA implementation of a SVM which uses a CORDIC algorithm for simplifying the calculation of as specific kernel greatly reducing the time and hardware requirements needed for the classification, allowing for powerful in-field portable applications. The algorithm is and its calculation capabilities are shown. The full SVM classifier using this algorithm is implemented in an FPGA and its in-field use assessed for high speed low power classification.
Automating parallel implementation of neural learning algorithms.
Rana, O F
2000-06-01
Neural learning algorithms generally involve a number of identical processing units, which are fully or partially connected, and involve an update function, such as a ramp, a sigmoid or a Gaussian function for instance. Some variations also exist, where units can be heterogeneous, or where an alternative update technique is employed, such as a pulse stream generator. Associated with connections are numerical values that must be adjusted using a learning rule, and and dictated by parameters that are learning rule specific, such as momentum, a learning rate, a temperature, amongst others. Usually, neural learning algorithms involve local updates, and a global interaction between units is often discouraged, except in instances where units are fully connected, or involve synchronous updates. In all of these instances, concurrency within a neural algorithm cannot be fully exploited without a suitable implementation strategy. A design scheme is described for translating a neural learning algorithm from inception to implementation on a parallel machine using PVM or MPI libraries, or onto programmable logic such as FPGAs. A designer must first describe the algorithm using a specialised Neural Language, from which a Petri net (PN) model is constructed automatically for verification, and building a performance model. The PN model can be used to study issues such as synchronisation points, resource sharing and concurrency within a learning rule. Specialised constructs are provided to enable a designer to express various aspects of a learning rule, such as the number and connectivity of neural nodes, the interconnection strategies, and information flows required by the learning algorithm. A scheduling and mapping strategy is then used to translate this PN model onto a multiprocessor template. We demonstrate our technique using a Kohonen and backpropagation learning rules, implemented on a loosely coupled workstation cluster, and a dedicated parallel machine, with PVM libraries
Terascale spectral element algorithms and implementations.
Fischer, P. F.; Tufo, H. M.
1999-08-17
We describe the development and implementation of an efficient spectral element code for multimillion gridpoint simulations of incompressible flows in general two- and three-dimensional domains. We review basic and recently developed algorithmic underpinnings that have resulted in good parallel and vector performance on a broad range of architectures, including the terascale computing systems now coming online at the DOE labs. Sustained performance of 219 GFLOPS has been recently achieved on 2048 nodes of the Intel ASCI-Red machine at Sandia.
Parallel Implementation of Katsevich's FBP Algorithm
Guo, Xiaohu; Kong, Qiang; Zhou, Tie; Jiang, Ming
2006-01-01
For spiral cone-beam CT, parallel computing is an effective approach to resolving the problem of heavy computation burden. It is well known that the major computation time is spent in the backprojection step for either filtered-backprojection (FBP) or backprojected-filtration (BPF) algorithms. By the cone-beam cover method [1], the backprojection procedure is driven by cone-beam projections, and every cone-beam projection can be backprojected independently. Basing on this fact, we develop a parallel implementation of Katsevich's FBP algorithm. We do all the numerical experiments on a Linux cluster. In one typical experiment, the sequential reconstruction time is 781.3 seconds, while the parallel reconstruction time is 25.7 seconds with 32 processors. PMID:23165019
A Fast Implementation of the ISOCLUS Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2003-01-01
Unsupervised clustering is a fundamental building block in numerous image processing applications. One of the most popular and widely used clustering schemes for remote sensing applications is the ISOCLUS algorithm, which is based on the ISODATA method. The algorithm is given a set of n data points in d-dimensional space, an integer k indicating the initial number of clusters, and a number of additional parameters. The general goal is to compute the coordinates of a set of cluster centers in d-space, such that those centers minimize the mean squared distance from each data point to its nearest center. This clustering algorithm is similar to another well-known clustering method, called k-means. One significant feature of ISOCLUS over k-means is that the actual number of clusters reported might be fewer or more than the number supplied as part of the input. The algorithm uses different heuristics to determine whether to merge lor split clusters. As ISOCLUS can run very slowly, particularly on large data sets, there has been a growing .interest in the remote sensing community in computing it efficiently. We have developed a faster implementation of the ISOCLUS algorithm. Our improvement is based on a recent acceleration to the k-means algorithm of Kanungo, et al. They showed that, by using a kd-tree data structure for storing the data, it is possible to reduce the running time of k-means. We have adapted this method for the ISOCLUS algorithm, and we show that it is possible to achieve essentially the same results as ISOCLUS on large data sets, but with significantly lower running times. This adaptation involves computing a number of cluster statistics that are needed for ISOCLUS but not for k-means. Both the k-means and ISOCLUS algorithms are based on iterative schemes, in which nearest neighbors are calculated until some convergence criterion is satisfied. Each iteration requires that the nearest center for each data point be computed. Naively, this requires O
Quantum adiabatic algorithm for factorization and its experimental implementation.
Peng, Xinhua; Liao, Zeyang; Xu, Nanyang; Qin, Gan; Zhou, Xianyi; Suter, Dieter; Du, Jiangfeng
2008-11-28
We propose an adiabatic quantum algorithm capable of factorizing numbers, using fewer qubits than Shor's algorithm. We implement the algorithm in a NMR quantum information processor and experimentally factorize the number 21. In the range that our classical computer could simulate, the quantum adiabatic algorithm works well, providing evidence that the running time of this algorithm scales polynomially with the problem size. PMID:19113467
Implementation of a watershed algorithm on FPGAs
NASA Astrophysics Data System (ADS)
Zahirazami, Shahram; Akil, Mohamed
1998-10-01
In this article we present an implementation of a watershed algorithm on a multi-FPGA architecture. This implementation is based on an hierarchical FIFO. A separate FIFO for each gray level. The gray scale value of a pixel is taken for the altitude of the point. In this way we look at the image as a relief. We proceed by a flooding step. It's like as we immerse the relief in a lake. The water begins to come up and when the water of two different catchment basins reach each other, we will construct a separator or a `Watershed'. This approach is data dependent, hence the process time is different for different images. The H-FIFO is used to guarantee the nature of immersion, it means that we need two types of priority. All the points of an altitude `n' are processed before any point of altitude `n + 1'. And inside an altitude water propagates with a constant velocity in all directions from the source. This operator needs two images as input. An original image or it's gradient and the marker image. A classic way to construct the marker image is to build an image of minimal regions. Each minimal region has it's unique label. This label is the color of the water and will be used to see whether two different water touch each other. The algorithm at first fill the hierarchy FIFO with neighbors of all the regions who are not colored. Next it fetches the first pixel from the first non-empty FIFO and treats this pixel. This pixel will take the color of its neighbor, and all the neighbors who are not already in the H-FIFO are put in their correspondent FIFO. The process is over when the H-FIFO is empty. The result is a segmented and labeled image.
Categorizing Variations of Student-Implemented Sorting Algorithms
ERIC Educational Resources Information Center
Taherkhani, Ahmad; Korhonen, Ari; Malmi, Lauri
2012-01-01
In this study, we examined freshmen students' sorting algorithm implementations in data structures and algorithms' course in two phases: at the beginning of the course before the students received any instruction on sorting algorithms, and after taking a lecture on sorting algorithms. The analysis revealed that many students have insufficient…
Implementation of a Wavefront-Sensing Algorithm
NASA Technical Reports Server (NTRS)
Smith, Jeffrey S.; Dean, Bruce; Aronstein, David
2013-01-01
A computer program has been written as a unique implementation of an image-based wavefront-sensing algorithm reported in "Iterative-Transform Phase Retrieval Using Adaptive Diversity" (GSC-14879-1), NASA Tech Briefs, Vol. 31, No. 4 (April 2007), page 32. This software was originally intended for application to the James Webb Space Telescope, but is also applicable to other segmented-mirror telescopes. The software is capable of determining optical-wavefront information using, as input, a variable number of irradiance measurements collected in defocus planes about the best focal position. The software also uses input of the geometrical definition of the telescope exit pupil (otherwise denoted the pupil mask) to identify the locations of the segments of the primary telescope mirror. From the irradiance data and mask information, the software calculates an estimate of the optical wavefront (a measure of performance) of the telescope generally and across each primary mirror segment specifically. The software is capable of generating irradiance data, wavefront estimates, and basis functions for the full telescope and for each primary-mirror segment. Optionally, each of these pieces of information can be measured or computed outside of the software and incorporated during execution of the software.
Parallel optimization algorithms and their implementation in VLSI design
NASA Technical Reports Server (NTRS)
Lee, G.; Feeley, J. J.
1991-01-01
Two new parallel optimization algorithms based on the simplex method are described. They may be executed by a SIMD parallel processor architecture and be implemented in VLSI design. Several VLSI design implementations are introduced. An application example is reported to demonstrate that the algorithms are effective.
Implementation and analysis of a fast backprojection algorithm
NASA Astrophysics Data System (ADS)
Gorham, LeRoy A.; Majumder, Uttam K.; Buxa, Peter; Backues, Mark J.; Lindgren, Andrew C.
2006-05-01
The convolution backprojection algorithm is an accurate synthetic aperture radar imaging technique, but it has seen limited use in the radar community due to its high computational costs. Therefore, significant research has been conducted for a fast backprojection algorithm, which surrenders some image quality for increased computational efficiency. This paper describes an implementation of both a standard convolution backprojection algorithm and a fast backprojection algorithm optimized for use on a Linux cluster and a field-programmable gate array (FPGA) based processing system. The performance of the different implementations is compared using synthetic ideal point targets and the SPIE XPatch Backhoe dataset.
An adaptive, lossless data compression algorithm and VLSI implementations
NASA Technical Reports Server (NTRS)
Venbrux, Jack; Zweigle, Greg; Gambles, Jody; Wiseman, Don; Miller, Warner H.; Yeh, Pen-Shu
1993-01-01
This paper first provides an overview of an adaptive, lossless, data compression algorithm originally devised by Rice in the early '70s. It then reports the development of a VLSI encoder/decoder chip set developed which implements this algorithm. A recent effort in making a space qualified version of the encoder is described along with several enhancements to the algorithm. The performance of the enhanced algorithm is compared with those from other currently available lossless compression techniques on multiple sets of test data. The results favor our implemented technique in many applications.
New algorithms for phase unwrapping: implementation and testing
NASA Astrophysics Data System (ADS)
Kotlicki, Krzysztof
1998-11-01
In this paper it is shown how the regularization theory was used for the new noise immune algorithm for phase unwrapping. The algorithm were developed by M. Servin, J.L. Marroquin and F.J. Cuevas in Centro de Investigaciones en Optica A.C. and Centro de Investigacion en Matematicas A.C. in Mexico. The theory was presented. The objective of the work was to implement the algorithm into the software able to perform the off-line unwrapping on the fringe pattern. The algorithms are present as well as the result and also the software developed for the implementation.
Implementation of the IDEA algorithm for image encryption
NASA Astrophysics Data System (ADS)
Dang, Philip P.; Chau, Paul M.
2000-11-01
In this paper, we present an implementation of the IDEA algorithm for image encryption. The image encryption is incorporated into the compression algorithm for transmission over a data network. In the proposed method, Embedded Wavelet Zero-tree Coding is used for image compression. Experimental results show that our proposed scheme enhances data security and reduces the network bandwidth required for video transmissions. A software implementation and system architecture for hardware implementation of the IDEA image encryption algorithm based on Field Programmable Gate Array (FPGA) technology are presented in this paper.
An Efficient Implementation of the Gliding Box Lacunarity Algorithm
Charles R. Tolle,; Timothy R. McJunkin; David J. Gorsich
2008-03-01
Lacunarity is a measure of how data fills space. It complements fractal dimension, which measures how much space is filled. Currently, many researchers use the gliding box algorithm for calculating lacunarity. This paper introduces a fast algorithm for making this calculation. The algorithm presented is akin to fast box counting algorithms used by some researchers in estimating fractal dimension. A simplified gliding box measure equation along with key pseudo code implementations for the algorithm are presented. Applications for the gliding box lacunarity measure have included subjects that range from biological community modeling to target detection.
An Agent Inspired Reconfigurable Computing Implementation of a Genetic Algorithm
NASA Technical Reports Server (NTRS)
Weir, John M.; Wells, B. Earl
2003-01-01
Many software systems have been successfully implemented using an agent paradigm which employs a number of independent entities that communicate with one another to achieve a common goal. The distributed nature of such a paradigm makes it an excellent candidate for use in high speed reconfigurable computing hardware environments such as those present in modem FPGA's. In this paper, a distributed genetic algorithm that can be applied to the agent based reconfigurable hardware model is introduced. The effectiveness of this new algorithm is evaluated by comparing the quality of the solutions found by the new algorithm with those found by traditional genetic algorithms. The performance of a reconfigurable hardware implementation of the new algorithm on an FPGA is compared to traditional single processor implementations.
Multiple Lookup Table-Based AES Encryption Algorithm Implementation
NASA Astrophysics Data System (ADS)
Gong, Jin; Liu, Wenyi; Zhang, Huixin
Anew AES (Advanced Encryption Standard) encryption algorithm implementation was proposed in this paper. It is based on five lookup tables, which are generated from S-box(the substitution table in AES). The obvious advantages are reducing the code-size, improving the implementation efficiency, and helping new learners to understand the AES encryption algorithm and GF(28) multiplication which are necessary to correctly implement AES[1]. This method can be applied on processors with word length 32 or above, FPGA and others. And correspondingly we can implement it by VHDL, Verilog, VB and other languages.
Nuclear magnetic resonance implementation of a quantum clock synchronization algorithm
Zhang Jingfu; Long, G.C; Liu Wenzhang; Deng Zhiwei; Lu Zhiheng
2004-12-01
The quantum clock synchronization (QCS) algorithm proposed by Chuang [Phys. Rev. Lett. 85, 2006 (2000)] has been implemented in a three qubit nuclear magnetic resonance quantum system. The time difference between two separated clocks can be determined by measuring the output states. The experimental realization of the QCS algorithm also demonstrates an application of the quantum phase estimation.
A fast portable implementation of the Secure Hash Algorithm, III.
McCurley, Kevin S.
1992-10-01
In 1992, NIST announced a proposed standard for a collision-free hash function. The algorithm for producing the hash value is known as the Secure Hash Algorithm (SHA), and the standard using the algorithm in known as the Secure Hash Standard (SHS). Later, an announcement was made that a scientist at NSA had discovered a weakness in the original algorithm. A revision to this standard was then announced as FIPS 180-1, and includes a slight change to the algorithm that eliminates the weakness. This new algorithm is called SHA-1. In this report we describe a portable and efficient implementation of SHA-1 in the C language. Performance information is given, as well as tips for porting the code to other architectures. We conclude with some observations on the efficiency of the algorithm, and a discussion of how the efficiency of SHA might be improved.
Algorithm implementation on the Navier-Stokes computer
NASA Technical Reports Server (NTRS)
Krist, Steven E.; Zang, Thomas A.
1987-01-01
The Navier-Stokes Computer is a multi-purpose parallel-processing supercomputer which is currently under development at Princeton University. It consists of multiple local memory parallel processors, called Nodes, which are interconnected in a hypercube network. Details of the procedures involved in implementing an algorithm on the Navier-Stokes computer are presented. The particular finite difference algorithm considered in this analysis was developed for simulation of laminar-turbulent transition in wall bounded shear flows. Projected timing results for implementing this algorithm indicate that operation rates in excess of 42 GFLOPS are feasible on a 128 Node machine.
Implementing a self-structuring data learning algorithm
NASA Astrophysics Data System (ADS)
Graham, James; Carson, Daniel; Ternovskiy, Igor
2016-05-01
In this paper, we elaborate on what we did to implement our self-structuring data learning algorithm. To recap, we are working to develop a data learning algorithm that will eventually be capable of goal driven pattern learning and extrapolation of more complex patterns from less complex ones. At this point we have developed a conceptual framework for the algorithm, but have yet to discuss our actual implementation and the consideration and shortcuts we needed to take to create said implementation. We will elaborate on our initial setup of the algorithm and the scenarios we used to test our early stage algorithm. While we want this to be a general algorithm, it is necessary to start with a simple scenario or two to provide a viable development and testing environment. To that end, our discussion will be geared toward what we include in our initial implementation and why, as well as what concerns we may have. In the future, we expect to be able to apply our algorithm to a more general approach, but to do so within a reasonable time, we needed to pick a place to start.
Rapid algorithm prototyping and implementation for power quality measurement
NASA Astrophysics Data System (ADS)
Kołek, Krzysztof; Piątek, Krzysztof
2015-12-01
This article presents a Model-Based Design (MBD) approach to rapidly implement power quality (PQ) metering algorithms. Power supply quality is a very important aspect of modern power systems and will become even more important in future smart grids. In this case, maintaining the PQ parameters at the desired level will require efficient implementation methods of the metering algorithms. Currently, the development of new, advanced PQ metering algorithms requires new hardware with adequate computational capability and time intensive, cost-ineffective manual implementations. An alternative, considered here, is an MBD approach. The MBD approach focuses on the modelling and validation of the model by simulation, which is well-supported by a Computer-Aided Engineering (CAE) packages. This paper presents two algorithms utilized in modern PQ meters: a phase-locked loop based on an Enhanced Phase Locked Loop (EPLL), and the flicker measurement according to the IEC 61000-4-15 standard. The algorithms were chosen because of their complexity and non-trivial development. They were first modelled in the MATLAB/Simulink package, then tested and validated in a simulation environment. The models, in the form of Simulink diagrams, were next used to automatically generate C code. The code was compiled and executed in real-time on the Zynq Xilinx platform that combines a reconfigurable Field Programmable Gate Array (FPGA) with a dual-core processor. The MBD development of PQ algorithms, automatic code generation, and compilation form a rapid algorithm prototyping and implementation path for PQ measurements. The main advantage of this approach is the ability to focus on the design, validation, and testing stages while skipping over implementation issues. The code generation process renders production-ready code that can be easily used on the target hardware. This is especially important when standards for PQ measurement are in constant development, and the PQ issues in emerging smart
Testing of hardware implementation of infrared image enhancing algorithm
NASA Astrophysics Data System (ADS)
Dulski, R.; Sosnowski, T.; PiÄ tkowski, T.; Trzaskawka, P.; Kastek, M.; Kucharz, J.
2012-10-01
The interpretation of IR images depends on radiative properties of observed objects and surrounding scenery. Skills and experience of an observer itself are also of great importance. The solution to improve the effectiveness of observation is utilization of algorithm of image enhancing capable to improve the image quality and the same effectiveness of object detection. The paper presents results of testing the hardware implementation of IR image enhancing algorithm based on histogram processing. Main issue in hardware implementation of complex procedures for image enhancing algorithms is high computational cost. As a result implementation of complex algorithms using general purpose processors and software usually does not bring satisfactory results. Because of high efficiency requirements and the need of parallel operation, the ALTERA's EP2C35F672 FPGA device was used. It provides sufficient processing speed combined with relatively low power consumption. A digital image processing and control module was designed and constructed around two main integrated circuits: a FPGA device and a microcontroller. Programmable FPGA device performs image data processing operations which requires considerable computing power. It also generates the control signals for array readout, performs NUC correction and bad pixel mapping, generates the control signals for display module and finally executes complex image processing algorithms. Implemented adaptive algorithm is based on plateau histogram equalization. Tests were performed on real IR images of different types of objects registered in different spectral bands. The simulations and laboratory experiments proved the correct operation of the designed system in executing the sophisticated image enhancement.
Efficient Implementation of the Backpropagation Algorithm in FPGAs and Microcontrollers.
Ortega-Zamorano, Francisco; Jerez, Jose M; Urda Munoz, Daniel; Luque-Baena, Rafael M; Franco, Leonardo
2016-09-01
The well-known backpropagation learning algorithm is implemented in a field-programmable gate array (FPGA) board and a microcontroller, focusing in obtaining efficient implementations in terms of a resource usage and computational speed. The algorithm was implemented in both cases using a training/validation/testing scheme in order to avoid overfitting problems. For the case of the FPGA implementation, a new neuron representation that reduces drastically the resource usage was introduced by combining the input and first hidden layer units in a single module. Further, a time-division multiplexing scheme was implemented for carrying out product computations taking advantage of the built-in digital signal processor cores. In both implementations, the floating-point data type representation normally used in a personal computer (PC) has been changed to a more efficient one based on a fixed-point scheme, reducing system memory variable usage and leading to an increase in computation speed. The results show that the modifications proposed produced a clear increase in computation speed in comparison with the standard PC-based implementation, demonstrating the usefulness of the intrinsic parallelism of FPGAs in neurocomputational tasks and the suitability of both implementations of the algorithm for its application to the real world problems. PMID:26277004
Outline of a fast hardware implementation of Winograd's DFT algorithm
NASA Technical Reports Server (NTRS)
Zohar, S.
1980-01-01
The main characteristics of the discrete Fourier transform (DFT) algorithm considered by Winograd (1976) is a significant reduction in the number of multiplications. Its primary disadvantage is a higher structural complexity. It is, therefore, difficult to translate the reduced number of multiplications into faster execution of the DFT by means of a software implementation of the algorithm. For this reason, a hardware implementation is considered in the current study, taking into account a design based on the algorithm prescription discussed by Zohar (1979). The hardware implementation of a FORTRAN subroutine is proposed, giving attention to a pipelining scheme in which 5 consecutive data batches are being operated on simultaneously, each batch undergoing one of 5 processing phases.
Implementation of modified SPIHT algorithm for Compression of images
NASA Astrophysics Data System (ADS)
Kurume, A. V.; Yana, D. M.
2011-12-01
We present a throughput-efficient FPGA implementation of the Set Partitioning in Hierarchical Trees (SPIHT) algorithm for compression of images. The SPIHT uses inherent redundancy among wavelet coefficients and suited for both grey and color images. The SPIHT algorithm uses dynamic data structure which hinders hardware realization. we have modified basic SPIHT in two ways, one by using static (fixed) mappings which represent significant information and the other by interchanging the sorting and refinement passes.
On the design, analysis, and implementation of efficient parallel algorithms
Sohn, S.M.
1989-01-01
There is considerable interest in developing algorithms for a variety of parallel computer architectures. This is not a trivial problem, although for certain models great progress has been made. Recently, general-purpose parallel machines have become available commercially. These machines possess widely varying interconnection topologies and data/instruction access schemes. It is important, therefore, to develop methodologies and design paradigms for not only synthesizing parallel algorithms from initial problem specifications, but also for mapping algorithms between different architectures. This work has considered both of these problems. A systolic array consists of a large collection of simple processors that are interconnected in a uniform pattern. The author has studied in detain the problem of mapping systolic algorithms onto more general-purpose parallel architectures such as the hypercube. The hypercube architecture is notable due to its symmetry and high connectivity, characteristics which are conducive to the efficient embedding of parallel algorithms. Although the parallel-to-parallel mapping techniques have yielded efficient target algorithms, it is not surprising that an algorithm designed directly for a particular parallel model would achieve superior performance. In this context, the author has developed hypercube algorithms for some important problems in speech and signal processing, text processing, language processing and artificial intelligence. These algorithms were implemented on a 64-node NCUBE/7 hypercube machine in order to evaluate their performance.
Implementation and testing of algorithms for data fitting
NASA Astrophysics Data System (ADS)
Monahan, Alison; Engelhardt, Larry
2012-03-01
This poster will describe an undergraduate senior research project involving the creation and testing of a java class to implement the Nelder-Mead algorithm, which can be used for data fitting. The performance between the Nelder-Mead algorithm and the Levenberg-Marquardt algorithm will be compared using a variety of different data. The new class will be made available at http://www.compadre.org/osp/items/detail.cfm?ID=11593. At the time of the presentation, this project will be nearing completion; and I will discuss my progress, successes, and challenges.
Efficient implementation of the adaptive scale pixel decomposition algorithm
NASA Astrophysics Data System (ADS)
Zhang, L.; Bhatnagar, S.; Rau, U.; Zhang, M.
2016-08-01
Context. Most popular algorithms in use to remove the effects of a telescope's point spread function (PSF) in radio astronomy are variants of the CLEAN algorithm. Most of these algorithms model the sky brightness using the delta-function basis, which results in undesired artefacts when used to image extended emission. The adaptive scale pixel decomposition (Asp-Clean) algorithm models the sky brightness on a scale-sensitive basis and thus gives a significantly better imaging performance when imaging fields that contain both resolved and unresolved emission. Aims: However, the runtime cost of Asp-Clean is higher than that of scale-insensitive algorithms. In this paper, we identify the most expensive step in the original Asp-Clean algorithm and present an efficient implementation of it, which significantly reduces the computational cost while keeping the imaging performance comparable to the original algorithm. The PSF sidelobe levels of modern wide-band telescopes are significantly reduced, allowing us to make approximations to reduce the computational cost, which in turn allows for the deconvolution of larger images on reasonable timescales. Methods: As in the original algorithm, scales in the image are estimated through function fitting. Here we introduce an analytical method to model extended emission, and a modified method for estimating the initial values used for the fitting procedure, which ultimately leads to a lower computational cost. Results: The new implementation was tested with simulated EVLA data and the imaging performance compared well with the original Asp-Clean algorithm. Tests show that the current algorithm can recover features at different scales with lower computational cost.
NMR implementation of adiabatic SAT algorithm using strongly modulated pulses.
Mitra, Avik; Mahesh, T S; Kumar, Anil
2008-03-28
NMR implementation of adiabatic algorithms face severe problems in homonuclear spin systems since the qubit selective pulses are long and during this period, evolution under the Hamiltonian and decoherence cause errors. The decoherence destroys the answer as it causes the final state to evolve to mixed state and in homonuclear systems, evolution under the internal Hamiltonian causes phase errors preventing the initial state to converge to the solution state. The resolution of these issues is necessary before one can proceed to implement an adiabatic algorithm in a large system where homonuclear coupled spins will become a necessity. In the present work, we demonstrate that by using "strongly modulated pulses" (SMPs) for the creation of interpolating Hamiltonian, one can circumvent both the problems and successfully implement the adiabatic SAT algorithm in a homonuclear three qubit system. This work also demonstrates that the SMPs tremendously reduce the time taken for the implementation of the algorithm, can overcome problems associated with decoherence, and will be the modality in future implementation of quantum information processing by NMR. PMID:18376911
NMR implementation of adiabatic SAT algorithm using strongly modulated pulses
NASA Astrophysics Data System (ADS)
Mitra, Avik; Mahesh, T. S.; Kumar, Anil
2008-03-01
NMR implementation of adiabatic algorithms face severe problems in homonuclear spin systems since the qubit selective pulses are long and during this period, evolution under the Hamiltonian and decoherence cause errors. The decoherence destroys the answer as it causes the final state to evolve to mixed state and in homonuclear systems, evolution under the internal Hamiltonian causes phase errors preventing the initial state to converge to the solution state. The resolution of these issues is necessary before one can proceed to implement an adiabatic algorithm in a large system where homonuclear coupled spins will become a necessity. In the present work, we demonstrate that by using "strongly modulated pulses" (SMPs) for the creation of interpolating Hamiltonian, one can circumvent both the problems and successfully implement the adiabatic SAT algorithm in a homonuclear three qubit system. This work also demonstrates that the SMPs tremendously reduce the time taken for the implementation of the algorithm, can overcome problems associated with decoherence, and will be the modality in future implementation of quantum information processing by NMR.
A novel pipeline based FPGA implementation of a genetic algorithm
NASA Astrophysics Data System (ADS)
Thirer, Nonel
2014-05-01
To solve problems when an analytical solution is not available, more and more bio-inspired computation techniques have been applied in the last years. Thus, an efficient algorithm is the Genetic Algorithm (GA), which imitates the biological evolution process, finding the solution by the mechanism of "natural selection", where the strong has higher chances to survive. A genetic algorithm is an iterative procedure which operates on a population of individuals called "chromosomes" or "possible solutions" (usually represented by a binary code). GA performs several processes with the population individuals to produce a new population, like in the biological evolution. To provide a high speed solution, pipelined based FPGA hardware implementations are used, with a nstages pipeline for a n-phases genetic algorithm. The FPGA pipeline implementations are constraints by the different execution time of each stage and by the FPGA chip resources. To minimize these difficulties, we propose a bio-inspired technique to modify the crossover step by using non identical twins. Thus two of the chosen chromosomes (parents) will build up two new chromosomes (children) not only one as in classical GA. We analyze the contribution of this method to reduce the execution time in the asynchronous and synchronous pipelines and also the possibility to a cheaper FPGA implementation, by using smaller populations. The full hardware architecture for a FPGA implementation to our target ALTERA development card is presented and analyzed.
Implementing Linear Algebra Related Algorithms on the TI-92+ Calculator.
ERIC Educational Resources Information Center
Alexopoulos, John; Abraham, Paul
2001-01-01
Demonstrates a less utilized feature of the TI-92+: its natural and powerful programming language. Shows how to implement several linear algebra related algorithms including the Gram-Schmidt process, Least Squares Approximations, Wronskians, Cholesky Decompositions, and Generalized Linear Least Square Approximations with QR Decompositions.…
Efficient implementations of hyperspectral chemical-detection algorithms
NASA Astrophysics Data System (ADS)
Brett, Cory J. C.; DiPietro, Robert S.; Manolakis, Dimitris G.; Ingle, Vinay K.
2013-10-01
Many military and civilian applications depend on the ability to remotely sense chemical clouds using hyperspectral imagers, from detecting small but lethal concentrations of chemical warfare agents to mapping plumes in the aftermath of natural disasters. Real-time operation is critical in these applications but becomes diffcult to achieve as the number of chemicals we search for increases. In this paper, we present efficient CPU and GPU implementations of matched-filter based algorithms so that real-time operation can be maintained with higher chemical-signature counts. The optimized C++ implementations show between 3x and 9x speedup over vectorized MATLAB implementations.
A distributed Canny edge detector: algorithm and FPGA implementation.
Xu, Qian; Varadarajan, Srenivas; Chakrabarti, Chaitali; Karam, Lina J
2014-07-01
The Canny edge detector is one of the most widely used edge detection algorithms due to its superior performance. Unfortunately, not only is it computationally more intensive as compared with other edge detection algorithms, but it also has a higher latency because it is based on frame-level statistics. In this paper, we propose a mechanism to implement the Canny algorithm at the block level without any loss in edge detection performance compared with the original frame-level Canny algorithm. Directly applying the original Canny algorithm at the block-level leads to excessive edges in smooth regions and to loss of significant edges in high-detailed regions since the original Canny computes the high and low thresholds based on the frame-level statistics. To solve this problem, we present a distributed Canny edge detection algorithm that adaptively computes the edge detection thresholds based on the block type and the local distribution of the gradients in the image block. In addition, the new algorithm uses a nonuniform gradient magnitude histogram to compute block-based hysteresis thresholds. The resulting block-based algorithm has a significantly reduced latency and can be easily integrated with other block-based image codecs. It is capable of supporting fast edge detection of images and videos with high resolutions, including full-HD since the latency is now a function of the block size instead of the frame size. In addition, quantitative conformance evaluations and subjective tests show that the edge detection performance of the proposed algorithm is better than the original frame-based algorithm, especially when noise is present in the images. Finally, this algorithm is implemented using a 32 computing engine architecture and is synthesized on the Xilinx Virtex-5 FPGA. The synthesized architecture takes only 0.721 ms (including the SRAM READ/WRITE time and the computation time) to detect edges of 512 × 512 images in the USC SIPI database when clocked at 100
Implementation of realistic image rendition algorithm based on DSP
NASA Astrophysics Data System (ADS)
Lv, Lily; Gao, Kun; Ni, Guoqiang; Zhou, Liwei; Shao, Xiaoguang
2010-11-01
Realistic image rendition is to reproduce the human perception of natural scenes. Retinex is a classical algorithm that simultaneously provides high dynamic range compression contrast and color constancy of an image. In this paper, we discuss a design of a digital signal processor (DSP) implementation of the single scale monochromatic Retinex algorithm. The target processor is Texas Instruments TMS320DM642, a 32-bit fix point DSP which is clocked at 600 MHz. This DSP hardware platform designed is of powerful consumption and video image processing capability. We give an overview of the DSP hardware and software, and discuss some feasible optimizations to achieve a real-time version of the Retinex algorithm. In the end, the performance of the algorithm executing on DSP platform is shown.
Experimental implementation of Grover's search algorithm with neutral atom qubits
NASA Astrophysics Data System (ADS)
Sun, Yuan; Lichtman, Martin; Baker, Kevin; Saffman, Mark
2016-05-01
Grover's algorithm for searching an unsorted data base provides a provable speedup over the best possible classical search and is therefore a test bed for demonstrating the power of quantum computation. The algorithm has been demonstrated with NMR, trapped ion, photonic, and superconducting hardware, but only with two qubits encoding a four element database. We report on progress towards experimental demonstration of Grover's algorithm using two and three neutral atom qubits encoding a database with up to eight elements. Our approach uses a Rydberg blockade Ck NOT gate for efficient implementation of the Grover iterations. Quantum Monte Carlo simulations of the algorithm performance that account for gate errors and decoherence rates are compared with experimental results. Work supported by the IARPA MQCO program.
The Sys-Rem Detrending Algorithm: Implementation and Testing
NASA Astrophysics Data System (ADS)
Mazeh, T.; Tamuz, O.; Zucker, S.
2007-07-01
Sys-Rem (Tamuz, Mazeh & Zucker 2005) is a detrending algorithm designed to remove systematic effects in a large set of light curves obtained by a photometric survey. The algorithm works without any prior knowledge of the effects, as long as they appear in many stars of the sample. This paper presents the basic principles of Sys-Rem and discusses a parameterization used to determine the number of effects removed. We assess the performance of Sys-Rem on simulated transits injected into WHAT survey data. This test is proposed as a general scheme to assess the effectiveness of detrending algorithms. Application of Sys-Rem to the OGLE dataset demonstrates the power of the algorithm. We offer a coded implementation of Sys-Rem to the community.
Implementing a Gaussian Process Learning Algorithm in Mixed Parallel Environment
Chandola, Varun; Vatsavai, Raju
2011-01-01
In this paper, we present a scalability analysis of a parallel Gaussian process training algorithm to simultaneously analyze a massive number of time series. We study three different parallel implementations: using threads, MPI, and a hybrid implementation using threads and MPI. We compare the scalability for the multi-threaded implementation on three different hardware platforms: a Mac desktop with two quad-core Intel Xeon processors (16 virtual cores), a Linux cluster node with four quad-core 2.3 GHz AMD Opteron processors, and SGI Altix ICE 8200 cluster node with two quad-core Intel Xeon processors (16 virtual cores). We also study the scalability of the MPI based and the hybrid MPI and thread based implementations on the SGI cluster with 128 nodes (2048 cores). Experimental results show that the hybrid implementation scales better than the multi-threaded and MPI based implementations. The hybrid implementation, using 1536 cores, can analyze a remote sensing data set with over 4 million time series in nearly 5 seconds while the serial algorithm takes nearly 12 hours to process the same data set.
3D video sequence reconstruction algorithms implemented on a DSP
NASA Astrophysics Data System (ADS)
Ponomaryov, V. I.; Ramos-Diaz, E.
2011-03-01
A novel approach for 3D image and video reconstruction is proposed and implemented. This is based on the wavelet atomic functions (WAF) that have demonstrated better approximation properties in different processing problems in comparison with classical wavelets. Disparity maps using WAF are formed, and then they are employed in order to present 3D visualization using color anaglyphs. Additionally, the compression via Pth law is performed to improve the disparity map quality. Other approaches such as optical flow and stereo matching algorithm are also implemented as the comparative approaches. Numerous simulation results have justified the efficiency of the novel framework. The implementation of the proposed algorithm on the Texas Instruments DSP TMS320DM642 permits to demonstrate possible real time processing mode during 3D video reconstruction for images and video sequences.
Bomble, Laeetitia; Pellegrini, Philippe; Ghesquiere, Pierre; Desouter-Lecomte, Michele
2010-12-15
We numerically investigate the possibilities of driving quantum algorithms with laser pulses in a register of ultracold NaCs polar molecules in a static electric field. We focus on the possibilities of performing scalable logical operations by considering circuits that involve intermolecular gates (implemented on adjacent interacting molecules) to enable the transfer of information from one molecule to another during conditional laser-driven population inversions. We study the implementation of an arithmetic operation (the addition of 0 or 1 on a binary digit and a carry in) which requires population inversions only and the Deutsch-Jozsa algorithm which requires a control of the phases. Under typical experimental conditions, our simulations show that high-fidelity logical operations involving several qubits can be performed in a time scale of a few hundreds of microseconds, opening promising perspectives for the manipulation of a large number of qubits in these systems.
Efficient hardware implementation of the lightweight block encryption algorithm LEA.
Lee, Donggeon; Kim, Dong-Chan; Kwon, Daesung; Kim, Howon
2014-01-01
Recently, due to the advent of resource-constrained trends, such as smartphones and smart devices, the computing environment is changing. Because our daily life is deeply intertwined with ubiquitous networks, the importance of security is growing. A lightweight encryption algorithm is essential for secure communication between these kinds of resource-constrained devices, and many researchers have been investigating this field. Recently, a lightweight block cipher called LEA was proposed. LEA was originally targeted for efficient implementation on microprocessors, as it is fast when implemented in software and furthermore, it has a small memory footprint. To reflect on recent technology, all required calculations utilize 32-bit wide operations. In addition, the algorithm is comprised of not complex S-Box-like structures but simple Addition, Rotation, and XOR operations. To the best of our knowledge, this paper is the first report on a comprehensive hardware implementation of LEA. We present various hardware structures and their implementation results according to key sizes. Even though LEA was originally targeted at software efficiency, it also shows high efficiency when implemented as hardware. PMID:24406859
Efficient Hardware Implementation of the Lightweight Block Encryption Algorithm LEA
Lee, Donggeon; Kim, Dong-Chan; Kwon, Daesung; Kim, Howon
2014-01-01
Recently, due to the advent of resource-constrained trends, such as smartphones and smart devices, the computing environment is changing. Because our daily life is deeply intertwined with ubiquitous networks, the importance of security is growing. A lightweight encryption algorithm is essential for secure communication between these kinds of resource-constrained devices, and many researchers have been investigating this field. Recently, a lightweight block cipher called LEA was proposed. LEA was originally targeted for efficient implementation on microprocessors, as it is fast when implemented in software and furthermore, it has a small memory footprint. To reflect on recent technology, all required calculations utilize 32-bit wide operations. In addition, the algorithm is comprised of not complex S-Box-like structures but simple Addition, Rotation, and XOR operations. To the best of our knowledge, this paper is the first report on a comprehensive hardware implementation of LEA. We present various hardware structures and their implementation results according to key sizes. Even though LEA was originally targeted at software efficiency, it also shows high efficiency when implemented as hardware. PMID:24406859
FPGA implementation of digital down converter using CORDIC algorithm
NASA Astrophysics Data System (ADS)
Agarwal, Ashok; Lakshmi, Boppana
2013-01-01
In radio receivers, Digital Down Converters (DDC) are used to translate the signal from Intermediate Frequency level to baseband. It also decimates the oversampled signal to a lower sample rate, eliminating the need of a high end digital signal processors. In this paper we have implemented architecture for DDC employing CORDIC algorithm, which down converts an IF signal of 70MHz (3G) to 200 KHz baseband GSM signal, with an SFDR greater than 100dB. The implemented architecture reduces the hardware resource requirements by 15 percent when compared with other architecture available in the literature due to elimination of explicit multipliers and a quadrature phase shifter for mixing.
VIRTEX-5 Fpga Implementation of Advanced Encryption Standard Algorithm
NASA Astrophysics Data System (ADS)
Rais, Muhammad H.; Qasim, Syed M.
2010-06-01
In this paper, we present an implementation of Advanced Encryption Standard (AES) cryptographic algorithm using state-of-the-art Virtex-5 Field Programmable Gate Array (FPGA). The design is coded in Very High Speed Integrated Circuit Hardware Description Language (VHDL). Timing simulation is performed to verify the functionality of the designed circuit. Performance evaluation is also done in terms of throughput and area. The design implemented on Virtex-5 (XC5VLX50FFG676-3) FPGA achieves a maximum throughput of 4.34 Gbps utilizing a total of 399 slices.
Implementing wide baseline matching algorithms on a graphics processing unit.
Rothganger, Fredrick H.; Larson, Kurt W.; Gonzales, Antonio Ignacio; Myers, Daniel S.
2007-10-01
Wide baseline matching is the state of the art for object recognition and image registration problems in computer vision. Though effective, the computational expense of these algorithms limits their application to many real-world problems. The performance of wide baseline matching algorithms may be improved by using a graphical processing unit as a fast multithreaded co-processor. In this paper, we present an implementation of the difference of Gaussian feature extractor, based on the CUDA system of GPU programming developed by NVIDIA, and implemented on their hardware. For a 2000x2000 pixel image, the GPU-based method executes nearly thirteen times faster than a comparable CPU-based method, with no significant loss of accuracy.
Implementation and Optimization of Image Processing Algorithms on Embedded GPU
NASA Astrophysics Data System (ADS)
Singhal, Nitin; Yoo, Jin Woo; Choi, Ho Yeol; Park, In Kyu
In this paper, we analyze the key factors underlying the implementation, evaluation, and optimization of image processing and computer vision algorithms on embedded GPU using OpenGL ES 2.0 shader model. First, we present the characteristics of the embedded GPU and its inherent advantage when compared to embedded CPU. Additionally, we propose techniques to achieve increased performance with optimized shader design. To show the effectiveness of the proposed techniques, we employ cartoon-style non-photorealistic rendering (NPR), speeded-up robust feature (SURF) detection, and stereo matching as our example algorithms. Performance is evaluated in terms of the execution time and speed-up achieved in comparison with the implementation on embedded CPU.
A high performance hardware implementation image encryption with AES algorithm
NASA Astrophysics Data System (ADS)
Farmani, Ali; Jafari, Mohamad; Miremadi, Seyed Sohrab
2011-06-01
This paper describes implementation of a high-speed encryption algorithm with high throughput for encrypting the image. Therefore, we select a highly secured symmetric key encryption algorithm AES(Advanced Encryption Standard), in order to increase the speed and throughput using pipeline technique in four stages, control unit based on logic gates, optimal design of multiplier blocks in mixcolumn phase and simultaneous production keys and rounds. Such procedure makes AES suitable for fast image encryption. Implementation of a 128-bit AES on FPGA of Altra company has been done and the results are as follow: throughput, 6 Gbps in 471MHz. The time of encrypting in tested image with 32*32 size is 1.15ms.
Control algorithm implementation for a redundant degree of freedom manipulator
NASA Technical Reports Server (NTRS)
Cohan, Steve
1991-01-01
This project's purpose is to develop and implement control algorithms for a kinematically redundant robotic manipulator. The manipulator is being developed concurrently by Odetics Inc., under internal research and development funding. This SBIR contract supports algorithm conception, development, and simulation, as well as software implementation and integration with the manipulator hardware. The Odetics Dexterous Manipulator is a lightweight, high strength, modular manipulator being developed for space and commercial applications. It has seven fully active degrees of freedom, is electrically powered, and is fully operational in 1 G. The manipulator consists of five self-contained modules. These modules join via simple quick-disconnect couplings and self-mating connectors which allow rapid assembly/disassembly for reconfiguration, transport, or servicing. Each joint incorporates a unique drive train design which provides zero backlash operation, is insensitive to wear, and is single fault tolerant to motor or servo amplifier failure. The sensing system is also designed to be single fault tolerant. Although the initial prototype is not space qualified, the design is well-suited to meeting space qualification requirements. The control algorithm design approach is to develop a hierarchical system with well defined access and interfaces at each level. The high level endpoint/configuration control algorithm transforms manipulator endpoint position/orientation commands to joint angle commands, providing task space motion. At the same time, the kinematic redundancy is resolved by controlling the configuration (pose) of the manipulator, using several different optimizing criteria. The center level of the hierarchy servos the joints to their commanded trajectories using both linear feedback and model-based nonlinear control techniques. The lowest control level uses sensed joint torque to close torque servo loops, with the goal of improving the manipulator dynamic behavior
Decoding the brain's algorithm for categorization from its neural implementation.
Mack, Michael L; Preston, Alison R; Love, Bradley C
2013-10-21
Acts of cognition can be described at different levels of analysis: what behavior should characterize the act, what algorithms and representations underlie the behavior, and how the algorithms are physically realized in neural activity [1]. Theories that bridge levels of analysis offer more complete explanations by leveraging the constraints present at each level [2-4]. Despite the great potential for theoretical advances, few studies of cognition bridge levels of analysis. For example, formal cognitive models of category decisions accurately predict human decision making [5, 6], but whether model algorithms and representations supporting category decisions are consistent with underlying neural implementation remains unknown. This uncertainty is largely due to the hurdle of forging links between theory and brain [7-9]. Here, we tackle this critical problem by using brain response to characterize the nature of mental computations that support category decisions to evaluate two dominant, and opposing, models of categorization. We found that brain states during category decisions were significantly more consistent with latent model representations from exemplar [5] rather than prototype theory [10, 11]. Representations of individual experiences, not the abstraction of experiences, are critical for category decision making. Holding models accountable for behavior and neural implementation provides a means for advancing more complete descriptions of the algorithms of cognition. PMID:24094852
A bioinspired collision detection algorithm for VLSI implementation
NASA Astrophysics Data System (ADS)
Cuadri, J.; Linan, G.; Stafford, R.; Keil, M. S.; Roca, E.
2005-06-01
In this paper a bioinspired algorithm for collision detection is proposed, based on previous models of the locust (Locusta migratoria) visual system reported by F.C. Rind and her group, in the University of Newcastle-upon-Tyne. The algorithm is suitable for VLSI implementation in standard CMOS technologies as a system-on-chip for automotive applications. The working principle of the algorithm is to process a video stream that represents the current scenario, and to fire an alarm whenever an object approaches on a collision course. Moreover, it establishes a scale of warning states, from no danger to collision alarm, depending on the activity detected in the current scenario. In the worst case, the minimum time before collision at which the model fires the collision alarm is 40 msec (1 frame before, at 25 frames per second). Since the average time to successfully fire an airbag system is 2 msec, even in the worst case, this algorithm would be very helpful to more efficiently arm the airbag system, or even take some kind of collision avoidance countermeasures. Furthermore, two additional modules have been included: a "Topological Feature Estimator" and an "Attention Focusing Algorithm". The former takes into account the shape of the approaching object to decide whether it is a person, a road line or a car. This helps to take more adequate countermeasures and to filter false alarms. The latter centres the processing power into the most active zones of the input frame, thus saving memory and processing time resources.
Multiangle Implementation of Atmospheric Correction (MAIAC): 2. Aerosol Algorithm
NASA Technical Reports Server (NTRS)
Lyapustin, A.; Wang, Y.; Laszlo, I.; Kahn, R.; Korkin, S.; Remer, L.; Levy, R.; Reid, J. S.
2011-01-01
An aerosol component of a new multiangle implementation of atmospheric correction (MAIAC) algorithm is presented. MAIAC is a generic algorithm developed for the Moderate Resolution Imaging Spectroradiometer (MODIS), which performs aerosol retrievals and atmospheric correction over both dark vegetated surfaces and bright deserts based on a time series analysis and image-based processing. The MAIAC look-up tables explicitly include surface bidirectional reflectance. The aerosol algorithm derives the spectral regression coefficient (SRC) relating surface bidirectional reflectance in the blue (0.47 micron) and shortwave infrared (2.1 micron) bands; this quantity is prescribed in the MODIS operational Dark Target algorithm based on a parameterized formula. The MAIAC aerosol products include aerosol optical thickness and a fine-mode fraction at resolution of 1 km. This high resolution, required in many applications such as air quality, brings new information about aerosol sources and, potentially, their strength. AERONET validation shows that the MAIAC and MOD04 algorithms have similar accuracy over dark and vegetated surfaces and that MAIAC generally improves accuracy over brighter surfaces due to the SRC retrieval and explicit bidirectional reflectance factor characterization, as demonstrated for several U.S. West Coast AERONET sites. Due to its generic nature and developed angular correction, MAIAC performs aerosol retrievals over bright deserts, as demonstrated for the Solar Village Aerosol Robotic Network (AERONET) site in Saudi Arabia.
Implementation of several mathematical algorithms to breast tissue density classification
NASA Astrophysics Data System (ADS)
Quintana, C.; Redondo, M.; Tirao, G.
2014-02-01
The accuracy of mammographic abnormality detection methods is strongly dependent on breast tissue characteristics, where a dense breast tissue can hide lesions causing cancer to be detected at later stages. In addition, breast tissue density is widely accepted to be an important risk indicator for the development of breast cancer. This paper presents the implementation and the performance of different mathematical algorithms designed to standardize the categorization of mammographic images, according to the American College of Radiology classifications. These mathematical techniques are based on intrinsic properties calculations and on comparison with an ideal homogeneous image (joint entropy, mutual information, normalized cross correlation and index Q) as categorization parameters. The algorithms evaluation was performed on 100 cases of the mammographic data sets provided by the Ministerio de Salud de la Provincia de Córdoba, Argentina—Programa de Prevención del Cáncer de Mama (Department of Public Health, Córdoba, Argentina, Breast Cancer Prevention Program). The obtained breast classifications were compared with the expert medical diagnostics, showing a good performance. The implemented algorithms revealed a high potentiality to classify breasts into tissue density categories.
Experience with a Genetic Algorithm Implemented on a Multiprocessor Computer
NASA Technical Reports Server (NTRS)
Plassman, Gerald E.; Sobieszczanski-Sobieski, Jaroslaw
2000-01-01
Numerical experiments were conducted to find out the extent to which a Genetic Algorithm (GA) may benefit from a multiprocessor implementation, considering, on one hand, that analyses of individual designs in a population are independent of each other so that they may be executed concurrently on separate processors, and, on the other hand, that there are some operations in a GA that cannot be so distributed. The algorithm experimented with was based on a gaussian distribution rather than bit exchange in the GA reproductive mechanism, and the test case was a hub frame structure of up to 1080 design variables. The experimentation engaging up to 128 processors confirmed expectations of radical elapsed time reductions comparing to a conventional single processor implementation. It also demonstrated that the time spent in the non-distributable parts of the algorithm and the attendant cross-processor communication may have a very detrimental effect on the efficient utilization of the multiprocessor machine and on the number of processors that can be used effectively in a concurrent manner. Three techniques were devised and tested to mitigate that effect, resulting in efficiency increasing to exceed 99 percent.
Developing and Implementing the Data Mining Algorithms in RAVEN
Sen, Ramazan Sonat; Maljovec, Daniel Patrick; Alfonsi, Andrea; Rabiti, Cristian
2015-09-01
The RAVEN code is becoming a comprehensive tool to perform probabilistic risk assessment, uncertainty quantification, and verification and validation. The RAVEN code is being developed to support many programs and to provide a set of methodologies and algorithms for advanced analysis. Scientific computer codes can generate enormous amounts of data. To post-process and analyze such data might, in some cases, take longer than the initial software runtime. Data mining algorithms/methods help in recognizing and understanding patterns in the data, and thus discover knowledge in databases. The methodologies used in the dynamic probabilistic risk assessment or in uncertainty and error quantification analysis couple system/physics codes with simulation controller codes, such as RAVEN. RAVEN introduces both deterministic and stochastic elements into the simulation while the system/physics code model the dynamics deterministically. A typical analysis is performed by sampling values of a set of parameter values. A major challenge in using dynamic probabilistic risk assessment or uncertainty and error quantification analysis for a complex system is to analyze the large number of scenarios generated. Data mining techniques are typically used to better organize and understand data, i.e. recognizing patterns in the data. This report focuses on development and implementation of Application Programming Interfaces (APIs) for different data mining algorithms, and the application of these algorithms to different databases.
Algorithms and implementations of APT resonant control system
Wang, Yi-Ming; Regan, A.
1997-08-01
A digital signal processor is implemented to control resonant frequency of the RFQ prototype in APT/LEDA. Two schemes are implemented to calculate the resonant frequency of RFQ. One uses the measurement of the forward and reflected fields. The other uses the measurement of the forward and transmitted fields. The former is sensitive and accurate when the operation frequency is relatively far from the resonant frequency. The latter gives accurate results when the operation frequency is close to the resonant frequency. Linearized algorithms are derived to calculate the resonant frequency of the RFQ efficiently using a fixed-point DSP. The control frequency range is about 100kHz for 350MHz operation frequency. A frequency agile scheme is employed using a dual direct digital synthesizer to drive klystron at the cavity`s resonant frequency (not necessarily the required beam resonant frequency) in power-up mode to quickly the cavity to the desired resonant frequency. This paper will address the algorithm implementation, error analysis, as well as related hardware design issues.
Infrared Jitter Imaging Data Reduction: Algorithms and Implementation
NASA Astrophysics Data System (ADS)
Devillard, Nicolas
Jitter imaging (also known as microscanning) is probably one of the most efficient ways to perform astronomical observations in the infrared. It requires very efficient filtering and recentering methods to produce the best possible output from raw data. This paper discusses issues attached to Poisson offset generation, efficient infrared sky filtering, offset recovery between planes through cross-correlation and/or point pattern recognition techniques, plane shifting with subpixel resolution through various kernel-based interpolation schemes, and 3D filtering for plane accumulation. Several algorithms are described for each step, having in mind an automatic data processing in pipeline mode (i.e., without user interaction) as intended for the Very Large Telescope. Implementation of these algorithms in optimized ANSI C (the eclipse library) is also described here.
Kodiak: An Implementation Framework for Branch and Bound Algorithms
NASA Technical Reports Server (NTRS)
Smith, Andrew P.; Munoz, Cesar A.; Narkawicz, Anthony J.; Markevicius, Mantas
2015-01-01
Recursive branch and bound algorithms are often used to refine and isolate solutions to several classes of global optimization problems. A rigorous computation framework for the solution of systems of equations and inequalities involving nonlinear real arithmetic over hyper-rectangular variable and parameter domains is presented. It is derived from a generic branch and bound algorithm that has been formally verified, and utilizes self-validating enclosure methods, namely interval arithmetic and, for polynomials and rational functions, Bernstein expansion. Since bounds computed by these enclosure methods are sound, this approach may be used reliably in software verification tools. Advantage is taken of the partial derivatives of the constraint functions involved in the system, firstly to reduce the branching factor by the use of bisection heuristics and secondly to permit the computation of bifurcation sets for systems of ordinary differential equations. The associated software development, Kodiak, is presented, along with examples of three different branch and bound problem types it implements.
Neural network implementations of data association algorithms for sensor fusion
NASA Technical Reports Server (NTRS)
Brown, Donald E.; Pittard, Clarence L.; Martin, Worthy N.
1989-01-01
The paper is concerned with locating a time varying set of entities in a fixed field when the entities are sensed at discrete time instances. At a given time instant a collection of bivariate Gaussian sensor reports is produced, and these reports estimate the location of a subset of the entities present in the field. A database of reports is maintained, which ideally should contain one report for each entity sensed. Whenever a collection of sensor reports is received, the database must be updated to reflect the new information. This updating requires association processing between the database reports and the new sensor reports to determine which pairs of sensor and database reports correspond to the same entity. Algorithms for performing this association processing are presented. Neural network implementation of the algorithms, along with simulation results comparing the approaches are provided.
Neural network fusion capabilities for efficient implementation of tracking algorithms
NASA Astrophysics Data System (ADS)
Sundareshan, Malur K.; Amoozegar, Farid
1996-05-01
The ability to efficiently fuse information of different forms for facilitating intelligent decision-making is one of the major capabilities of trained multilayer neural networks that is being recognized int eh recent times. While development of innovative adaptive control algorithms for nonlinear dynamical plants which attempt to exploit these capabilities seems to be more popular, a corresponding development of nonlinear estimation algorithms using these approaches, particularly for application in target surveillance and guidance operations, has not received similar attention. In this paper we describe the capabilities and functionality of neural network algorithms for data fusion and implementation of nonlinear tracking filters. For a discussion of details and for serving as a vehicle for quantitative performance evaluations, the illustrative case of estimating the position and velocity of surveillance targets is considered. Efficient target tracking algorithms that can utilize data from a host of sensing modalities and are capable of reliably tracking even uncooperative targets executing fast and complex maneuvers are of interest in a number of applications. The primary motivation for employing neural networks in these applications comes form the efficiency with which more features extracted from different sensor measurements can be utilized as inputs for estimating target maneuvers. Such an approach results in an overall nonlinear tracking filter which has several advantages over the popular efforts at designing nonlinear estimation algorithms for tracking applications, the principle one being the reduction of mathematical and computational complexities. A system architecture that efficiently integrates the processing capabilities of a trained multilayer neural net with the tracking performance of a Kalman filter is described in this paper.
A pipelined FPGA implementation of an encryption algorithm based on genetic algorithm
NASA Astrophysics Data System (ADS)
Thirer, Nonel
2013-05-01
With the evolution of digital data storage and exchange, it is essential to protect the confidential information from every unauthorized access. High performance encryption algorithms were developed and implemented by software and hardware. Also many methods to attack the cipher text were developed. In the last years, the genetic algorithm has gained much interest in cryptanalysis of cipher texts and also in encryption ciphers. This paper analyses the possibility to use the genetic algorithm as a multiple key sequence generator for an AES (Advanced Encryption Standard) cryptographic system, and also to use a three stages pipeline (with four main blocks: Input data, AES Core, Key generator, Output data) to provide a fast encryption and storage/transmission of a large amount of data.
DSP Implementation of the Multiscale Retinex Image Enhancement Algorithm
NASA Technical Reports Server (NTRS)
Hines, Glenn D.; Rahman, Zia-ur; Jobson, Daniel J.; Woodell, Glenn A.
2004-01-01
The Retinex is a general-purpose image enhancement algorithm that is used to produce good visual representations of scenes. It performs a non-linear spatial/ spectral transform that synthesizes strong local contrast enhancement and color constancy. A real-time, video frame rate implementation of the Retinex is required to meet the needs of various potential users. Retinex processing contains a relatively large number of complex computations, thus to achieve real-time performance using current technologies requires specialized hardware and software. In this paper we discuss the design and development of a digital signal processor (DSP) implementation of the Retinex. The target processor is a Texas Instruments TMS320C6711 floating point DSP. NTSC video is captured using a dedicated frame grabber card, Retinex processed, and displayed on a standard monitor. We discuss the optimizations used to achieve real-time performance of the Retinex and also describe our future plans on using alternative architectures.
DSP Implementation of the Retinex Image Enhancement Algorithm
NASA Technical Reports Server (NTRS)
Hines, Glenn; Rahman, Zia-Ur; Jobson, Daniel; Woodell, Glenn
2004-01-01
The Retinex is a general-purpose image enhancement algorithm that is used to produce good visual representations of scenes. It performs a non-linear spatial/spectral transform that synthesizes strong local contrast enhancement and color constancy. A real-time, video frame rate implementation of the Retinex is required to meet the needs of various potential users. Retinex processing contains a relatively large number of complex computations, thus to achieve real-time performance using current technologies requires specialized hardware and software. In this paper we discuss the design and development of a digital signal processor (DSP) implementation of the Retinex. The target processor is a Texas Instruments TMS320C6711 floating point DSP. NTSC video is captured using a dedicated frame-grabber card, Retinex processed, and displayed on a standard monitor. We discuss the optimizations used to achieve real-time performance of the Retinex and also describe our future plans on using alternative architectures.
Purgatorio - A new implementation of the Inferno algorithm
Wilson, B; Sonnad, V; Sterne, P; Isaacs, W
2005-03-29
For astrophysical applications, as well as modeling laser-produced plasmas, there is a continual need for equation-of-state data over a wide domain of physical conditions. This paper presents algorithmic aspects for computing the Helmholtz free energy of plasma electrons for temperatures spanning from a few Kelvin to several KeV, and densities ranging from essentially isolated ion conditions to such large compressions that most bound orbitals become delocalized. The objective is high precision results in order to compute pressure and other thermodynamic quantities by numerical differentiation. This approach has the advantage that internal thermodynamic self-consistency is ensured, regardless of the specific physical model, but at the cost of very stringent numerical tolerances for each operation. The computational aspects we address in this paper are faced by any model that relies on input from the quantum mechanical spectrum of a spherically symmetric Hamiltonian operator. The particular physical model we employ is that of INFERNO; of a spherically averaged ion embedded in jellium. An overview of PURGATORIO, a new implementation of the INFERNO equation of state model, is presented. The new algorithm emphasizes a novel decimation scheme for automatically resolving the structure of the continuum density of states, circumventing limitations of the pseudo-R matrix algorithm previously utilized.
Neuromorphic implementations of neurobiological learning algorithms for spiking neural networks.
Walter, Florian; Röhrbein, Florian; Knoll, Alois
2015-12-01
The application of biologically inspired methods in design and control has a long tradition in robotics. Unlike previous approaches in this direction, the emerging field of neurorobotics not only mimics biological mechanisms at a relatively high level of abstraction but employs highly realistic simulations of actual biological nervous systems. Even today, carrying out these simulations efficiently at appropriate timescales is challenging. Neuromorphic chip designs specially tailored to this task therefore offer an interesting perspective for neurorobotics. Unlike Von Neumann CPUs, these chips cannot be simply programmed with a standard programming language. Like real brains, their functionality is determined by the structure of neural connectivity and synaptic efficacies. Enabling higher cognitive functions for neurorobotics consequently requires the application of neurobiological learning algorithms to adjust synaptic weights in a biologically plausible way. In this paper, we therefore investigate how to program neuromorphic chips by means of learning. First, we provide an overview over selected neuromorphic chip designs and analyze them in terms of neural computation, communication systems and software infrastructure. On the theoretical side, we review neurobiological learning techniques. Based on this overview, we then examine on-die implementations of these learning algorithms on the considered neuromorphic chips. A final discussion puts the findings of this work into context and highlights how neuromorphic hardware can potentially advance the field of autonomous robot systems. The paper thus gives an in-depth overview of neuromorphic implementations of basic mechanisms of synaptic plasticity which are required to realize advanced cognitive capabilities with spiking neural networks. PMID:26422422
Cascade Error Projection: A Learning Algorithm for Hardware Implementation
NASA Technical Reports Server (NTRS)
Duong, Tuan A.; Daud, Taher
1996-01-01
In this paper, we workout a detailed mathematical analysis for a new learning algorithm termed Cascade Error Projection (CEP) and a general learning frame work. This frame work can be used to obtain the cascade correlation learning algorithm by choosing a particular set of parameters. Furthermore, CEP learning algorithm is operated only on one layer, whereas the other set of weights can be calculated deterministically. In association with the dynamical stepsize change concept to convert the weight update from infinite space into a finite space, the relation between the current stepsize and the previous energy level is also given and the estimation procedure for optimal stepsize is used for validation of our proposed technique. The weight values of zero are used for starting the learning for every layer, and a single hidden unit is applied instead of using a pool of candidate hidden units similar to cascade correlation scheme. Therefore, simplicity in hardware implementation is also obtained. Furthermore, this analysis allows us to select from other methods (such as the conjugate gradient descent or the Newton's second order) one of which will be a good candidate for the learning technique. The choice of learning technique depends on the constraints of the problem (e.g., speed, performance, and hardware implementation); one technique may be more suitable than others. Moreover, for a discrete weight space, the theoretical analysis presents the capability of learning with limited weight quantization. Finally, 5- to 8-bit parity and chaotic time series prediction problems are investigated; the simulation results demonstrate that 4-bit or more weight quantization is sufficient for learning neural network using CEP. In addition, it is demonstrated that this technique is able to compensate for less bit weight resolution by incorporating additional hidden units. However, generation result may suffer somewhat with lower bit weight quantization.
An overview of SuperLU: Algorithms, implementation, and userinterface
Li, Xiaoye S.
2003-09-30
We give an overview of the algorithms, design philosophy,and implementation techniques in the software SuperLU, for solving sparseunsymmetric linear systems. In particular, we highlight the differencesbetween the sequential SuperLU (including its multithreaded extension)and parallel SuperLU_DIST. These include the numerical pivoting strategy,the ordering strategy for preserving sparsity, the ordering in which theupdating tasks are performed, the numerical kernel, and theparallelization strategy. Because of the scalability concern, theparallel code is drastically different from the sequential one. Wedescribe the user interfaces ofthe libraries, and illustrate how to usethe libraries most efficiently depending on some matrix characteristics.Finally, we give some examples of how the solver has been used inlarge-scale scientific applications, and the performance.
All-Optical Implementation of the Ant Colony Optimization Algorithm
NASA Astrophysics Data System (ADS)
Hu, Wenchao; Wu, Kan; Shum, Perry Ping; Zheludev, Nikolay I.; Soci, Cesare
2016-05-01
We report all-optical implementation of the optimization algorithm for the famous “ant colony” problem. Ant colonies progressively optimize pathway to food discovered by one of the ants through identifying the discovered route with volatile chemicals (pheromones) secreted on the way back from the food deposit. Mathematically this is an important example of graph optimization problem with dynamically changing parameters. Using an optical network with nonlinear waveguides to represent the graph and a feedback loop, we experimentally show that photons traveling through the network behave like ants that dynamically modify the environment to find the shortest pathway to any chosen point in the graph. This proof-of-principle demonstration illustrates how transient nonlinearity in the optical system can be exploited to tackle complex optimization problems directly, on the hardware level, which may be used for self-routing of optical signals in transparent communication networks and energy flow in photonic systems.
All-Optical Implementation of the Ant Colony Optimization Algorithm.
Hu, Wenchao; Wu, Kan; Shum, Perry Ping; Zheludev, Nikolay I; Soci, Cesare
2016-01-01
We report all-optical implementation of the optimization algorithm for the famous "ant colony" problem. Ant colonies progressively optimize pathway to food discovered by one of the ants through identifying the discovered route with volatile chemicals (pheromones) secreted on the way back from the food deposit. Mathematically this is an important example of graph optimization problem with dynamically changing parameters. Using an optical network with nonlinear waveguides to represent the graph and a feedback loop, we experimentally show that photons traveling through the network behave like ants that dynamically modify the environment to find the shortest pathway to any chosen point in the graph. This proof-of-principle demonstration illustrates how transient nonlinearity in the optical system can be exploited to tackle complex optimization problems directly, on the hardware level, which may be used for self-routing of optical signals in transparent communication networks and energy flow in photonic systems. PMID:27222098
All-Optical Implementation of the Ant Colony Optimization Algorithm
Hu, Wenchao; Wu, Kan; Shum, Perry Ping; Zheludev, Nikolay I.; Soci, Cesare
2016-01-01
We report all-optical implementation of the optimization algorithm for the famous “ant colony” problem. Ant colonies progressively optimize pathway to food discovered by one of the ants through identifying the discovered route with volatile chemicals (pheromones) secreted on the way back from the food deposit. Mathematically this is an important example of graph optimization problem with dynamically changing parameters. Using an optical network with nonlinear waveguides to represent the graph and a feedback loop, we experimentally show that photons traveling through the network behave like ants that dynamically modify the environment to find the shortest pathway to any chosen point in the graph. This proof-of-principle demonstration illustrates how transient nonlinearity in the optical system can be exploited to tackle complex optimization problems directly, on the hardware level, which may be used for self-routing of optical signals in transparent communication networks and energy flow in photonic systems. PMID:27222098
FPGA Implementation of Generalized Hebbian Algorithm for Texture Classification
Lin, Shiow-Jyu; Hwang, Wen-Jyi; Lee, Wei-Hao
2012-01-01
This paper presents a novel hardware architecture for principal component analysis. The architecture is based on the Generalized Hebbian Algorithm (GHA) because of its simplicity and effectiveness. The architecture is separated into three portions: the weight vector updating unit, the principal computation unit and the memory unit. In the weight vector updating unit, the computation of different synaptic weight vectors shares the same circuit for reducing the area costs. To show the effectiveness of the circuit, a texture classification system based on the proposed architecture is physically implemented by Field Programmable Gate Array (FPGA). It is embedded in a System-On-Programmable-Chip (SOPC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient design for attaining both high speed performance and low area costs. PMID:22778640
An implementation of continuous genetic algorithm in parameter estimation of predator-prey model
NASA Astrophysics Data System (ADS)
Windarto
2016-03-01
Genetic algorithm is an optimization method based on the principles of genetics and natural selection in life organisms. The main components of this algorithm are chromosomes population (individuals population), parent selection, crossover to produce new offspring, and random mutation. In this paper, continuous genetic algorithm was implemented to estimate parameters in a predator-prey model of Lotka-Volterra type. For simplicity, all genetic algorithm parameters (selection rate and mutation rate) are set to be constant along implementation of the algorithm. It was found that by selecting suitable mutation rate, the algorithms can estimate these parameters well.
An Object-Oriented Collection of Minimum Degree Algorithms: Design, Implementation, and Experiences
NASA Technical Reports Server (NTRS)
Kumfert, Gary; Pothen, Alex
1999-01-01
The multiple minimum degree (MMD) algorithm and its variants have enjoyed 20+ years of research and progress in generating fill-reducing orderings for sparse, symmetric positive definite matrices. Although conceptually simple, efficient implementations of these algorithms are deceptively complex and highly specialized. In this case study, we present an object-oriented library that implements several recent minimum degree-like algorithms. We discuss how object-oriented design forces us to decompose these algorithms in a different manner than earlier codes and demonstrate how this impacts the flexibility and efficiency of our C++ implementation. We compare the performance of our code against other implementations in C or Fortran.
NASA Astrophysics Data System (ADS)
Klimesh, M.; Stanton, V.; Watola, D.
2000-10-01
We describe a hardware implementation of a state-of-the-art lossless image compression algorithm. The algorithm is based on the LOCO-I (low complexity lossless compression for images) algorithm developed by Weinberger, Seroussi, and Sapiro, with modifications to lower the implementation complexity. In this setup, the compression itself is performed entirely in hardware using a field programmable gate array and a small amount of random access memory. The compression speed achieved is 1.33 Mpixels/second. Our algorithm yields about 15 percent better compression than the Rice algorithm.
Improvement and implementation for Canny edge detection algorithm
NASA Astrophysics Data System (ADS)
Yang, Tao; Qiu, Yue-hong
2015-07-01
Edge detection is necessary for image segmentation and pattern recognition. In this paper, an improved Canny edge detection approach is proposed due to the defect of traditional algorithm. A modified bilateral filter with a compensation function based on pixel intensity similarity judgment was used to smooth image instead of Gaussian filter, which could preserve edge feature and remove noise effectively. In order to solve the problems of sensitivity to the noise in gradient calculating, the algorithm used 4 directions gradient templates. Finally, Otsu algorithm adaptively obtain the dual-threshold. All of the algorithm simulated with OpenCV 2.4.0 library in the environments of vs2010, and through the experimental analysis, the improved algorithm has been proved to detect edge details more effectively and with more adaptability.
A modified density-based clustering algorithm and its implementation
NASA Astrophysics Data System (ADS)
Ban, Zhihua; Liu, Jianguo; Yuan, Lulu; Yang, Hua
2015-12-01
This paper presents an improved density-based clustering algorithm based on the paper of clustering by fast search and find of density peaks. A distance threshold is introduced for the purpose of economizing memory. In order to reduce the probability that two points share the same density value, similarity is utilized to define proximity measure. We have tested the modified algorithm on a large data set, several small data sets and shape data sets. It turns out that the proposed algorithm can obtain acceptable results and can be applied more wildly.
Algorithm Implementation for a Prototype Time-Encoded Signature Detector
Mercier, Theresa M.; Runkle, Robert C.; Stephens, Daniel L.; Hyronimus, Brian J.; Morris, Scott J.; Seifert, Allen; Wyatt, Cory R.
2007-12-31
The authors constructed a prototype Time-Encoded Signature (TES) system, complete with automated detection algorithms, useful for the detection of point-like, gamma-ray sources in search applications where detectors observe large variability in background count rates beyond statistical (Poisson) noise. The person-carried TES instrument consists of two Cesium Iodide scintillators placed on opposite sides of a Tungsten shield. This geometry mitigates systematic background variation, and induces a unique signature upon encountering point-like sources. This manuscript focuses on the development of detection algorithms that both identify point-source signatures and are computationally simple. The latter constraint derives from the instrument’s mobile (and thus low power) operation. The authors evaluated algorithms on both simulated and field data. The results of this analysis demonstrate the ability to detect sources at a wide range of source-detector distances using computationally simple algorithms.
An Improved QRS Wave Group Detection Algorithm and Matlab Implementation
NASA Astrophysics Data System (ADS)
Zhang, Hongjun
This paper presents an algorithm using Matlab software to detect QRS wave group of MIT-BIH ECG database. First of all the noise in ECG be Butterworth filtered, and then analysis the ECG signal based on wavelet transform to detect the parameters of the principle of singularity, more accurate detection of the QRS wave group was achieved.
Implementations of back propagation algorithm in ecosystems applications
NASA Astrophysics Data System (ADS)
Ali, Khalda F.; Sulaiman, Riza; Elamir, Amir Mohamed
2015-05-01
Artificial Neural Networks (ANNs) have been applied to an increasing number of real world problems of considerable complexity. Their most important advantage is in solving problems which are too complex for conventional technologies, that do not have an algorithmic solutions or their algorithmic Solutions is too complex to be found. In general, because of their abstraction from the biological brain, ANNs are developed from concept that evolved in the late twentieth century neuro-physiological experiments on the cells of the human brain to overcome the perceived inadequacies with conventional ecological data analysis methods. ANNs have gained increasing attention in ecosystems applications, because of ANN's capacity to detect patterns in data through non-linear relationships, this characteristic confers them a superior predictive ability. In this research, ANNs is applied in an ecological system analysis. The neural networks use the well known Back Propagation (BP) Algorithm with the Delta Rule for adaptation of the system. The Back Propagation (BP) training Algorithm is an effective analytical method for adaptation of the ecosystems applications, the main reason because of their capacity to detect patterns in data through non-linear relationships. This characteristic confers them a superior predicting ability. The BP algorithm uses supervised learning, which means that we provide the algorithm with examples of the inputs and outputs we want the network to compute, and then the error is calculated. The idea of the back propagation algorithm is to reduce this error, until the ANNs learns the training data. The training begins with random weights, and the goal is to adjust them so that the error will be minimal. This research evaluated the use of artificial neural networks (ANNs) techniques in an ecological system analysis and modeling. The experimental results from this research demonstrate that an artificial neural network system can be trained to act as an expert
An implementation of the look-ahead Lanczos algorithm for non-Hermitian matrices, part 2
NASA Technical Reports Server (NTRS)
Freund, Roland W.; Nachtigal, Noel M.
1990-01-01
It is shown how the look-ahead Lanczos process (combined with a quasi-minimal residual QMR) approach) can be used to develop a robust black box solver for large sparse non-Hermitian linear systems. Details of an implementation of the resulting QMR algorithm are presented. It is demonstrated that the QMR method is closely related to the biconjugate gradient (BCG) algorithm; however, unlike BCG, the QMR algorithm has smooth convergence curves and good numerical properties. We report numerical experiments with our implementation of the look-ahead Lanczos algorithm, both for eigenvalue problem and linear systems. Also, program listings of FORTRAN implementations of the look-ahead algorithm and the QMR method are included.
A block-wise approximate parallel implementation for ART algorithm on CUDA-enabled GPU.
Fan, Zhongyin; Xie, Yaoqin
2015-01-01
Computed tomography (CT) has been widely used to acquire volumetric anatomical information in the diagnosis and treatment of illnesses in many clinics. However, the ART algorithm for reconstruction from under-sampled and noisy projection is still time-consuming. It is the goal of our work to improve a block-wise approximate parallel implementation for the ART algorithm on CUDA-enabled GPU to make the ART algorithm applicable to the clinical environment. The resulting method has several compelling features: (1) the rays are allotted into blocks, making the rays in the same block parallel; (2) GPU implementation caters to the actual industrial and medical application demand. We test the algorithm on a digital shepp-logan phantom, and the results indicate that our method is more efficient than the existing CPU implementation. The high computation efficiency achieved in our algorithm makes it possible for clinicians to obtain real-time 3D images. PMID:26405857
A Fast Implementation of the ISODATA Clustering Algorithm
NASA Technical Reports Server (NTRS)
Memarsadeghi, Nargess; Mount, David M.; Netanyahu, Nathan S.; LeMoigne, Jacqueline
2005-01-01
Clustering is central to many image processing and remote sensing applications. ISODATA is one of the most popular and widely used clustering methods in geoscience applications, but it can run slowly, particularly with large data sets. We present a more efficient approach to ISODATA clustering, which achieves better running times by storing the points in a kd-tree and through a modification of the way in which the algorithm estimates the dispersion of each cluster. We also present an approximate version of the algorithm which allows the user to further improve the running time, at the expense of lower fidelity in computing the nearest cluster center to each point. We provide both theoretical and empirical justification that our modified approach produces clusterings that are very similar to those produced by the standard ISODATA approach. We also provide empirical studies on both synthetic data and remotely sensed Landsat and MODIS images that show that our approach has significantly lower running times.
Comparative study of fusion algorithms and implementation of new efficient solution
NASA Astrophysics Data System (ADS)
Besrour, Amine; Snoussi, Hichem; Siala, Mohamed; Abdelkefi, Fatma
2014-05-01
High Dynamic Range (HDR) imaging has been the subject of significant researches over the past years, the goal of acquiring the best cinema-quality HDR images of fast-moving scenes using an efficient merging algorithm has not yet been achieved. In fact, through the years, many efficient algorithms have been implemented and developed. However, they are not yet efficient since they don't treat all the situations and they have not enough speed to ensure fast HDR image reconstitution. In this paper, we will present a full comparative analyze and study of the available fusion algorithms. Also, we will implement our personal algorithm which can be more optimized and faster than the existed ones. We will also present our investigated algorithm that has the advantage to be more optimized than the existing ones. This merging algorithm is related to our hardware solution allowing us to obtain four pictures with different exposures.
A simple implementation of the Viterbi algorithm on the Motorola DSP56001
NASA Technical Reports Server (NTRS)
Messer, Dion D.; Park, Sangil
1990-01-01
As systems designers design communication systems with digital rather than analog components to reduce noise and increase channel capacity, they must have the ability to perform traditional communication algorithms digitally. The use of trellis coded modulation as well as the extensive use of convolutional encoding for error detection and correction requires an efficient digital implementation of the Viterbi Algorithm for real time demodulation and decoding. Digital signal processors are now fast enough to implement Viterbi decoding in conjunction with the normal receiver/transmitter functions for lower speed channels on a single chip as well as performing fast decoding for higher speed channels, if the algorithm is implemented efficiently. The purpose of this paper is to identify a good way to implement the Viterbi Algorithm (VA) on the Motorola DSP56001, balancing performance considerations with speed and memory efficiency.
A Novel Implementation of Efficient Algorithms for Quantum Circuit Synthesis
NASA Astrophysics Data System (ADS)
Zeller, Luke
In this project, we design and develop a computer program to effectively approximate arbitrary quantum gates using the discrete set of Clifford Gates together with the T gate (π/8 gate). Employing recent results from Mosca et. al. and Giles and Selinger, we implement a decomposition scheme that outputs a sequence of Clifford, T, and Tt gates that approximate the input to within a specified error range ɛ. Specifically, the given gate is first rounded to an element of Z[1/2, i] with a precision determined by ɛ, and then exact synthesis is employed to produce the resulting gate. It is known that this procedure is optimal in approximating an arbitrary single qubit gate. Our program, written in Matlab and Python, can complete both approximate and exact synthesis of qubits. It can be used to assist in the experimental implementation of an arbitrary fault-tolerant single qubit gate, for which direct implementation isn't feasible.
Investigations into Implementation of an Iterative Feedback Tuning Algorithm into Microcontroller
NASA Astrophysics Data System (ADS)
Himunzowa, Grayson
In this paper, implementation of an Iterative Feedback Tuning (IFT) and Myopic Unfalsified Control (MUC) algorithms into a microcontroller is investigated. First step taken was to search for a suitable hardware to accommodate these complex algorithms. The Motorola DSP56F807C and ARM7024 microcontrollers were selected for use in the research. The algorithms were coded in the C language of the respective microcontrollers and were tested by simulation of the DC Motor models obtained from step response of the motor.
Implementation of Real-Time Feedback Flow Control Algorithms on a Canonical Testbed
NASA Technical Reports Server (NTRS)
Tian, Ye; Song, Qi; Cattafesta, Louis
2005-01-01
This report summarizes the activities on "Implementation of Real-Time Feedback Flow Control Algorithms on a Canonical Testbed." The work summarized consists primarily of two parts. The first part summarizes our previous work and the extensions to adaptive ID and control algorithms. The second part concentrates on the validation of adaptive algorithms by applying them to a vibration beam test bed. Extensions to flow control problems are discussed.
NASA Technical Reports Server (NTRS)
Delaat, J. C.; Merrill, W. C.
1983-01-01
A sensor failure detection, isolation, and accommodation algorithm was developed which incorporates analytic sensor redundancy through software. This algorithm was implemented in a high level language on a microprocessor based controls computer. Parallel processing and state-of-the-art 16-bit microprocessors are used along with efficient programming practices to achieve real-time operation.
FPGA implementation of vision algorithms for small autonomous robots
NASA Astrophysics Data System (ADS)
Anderson, J. D.; Lee, D. J.; Archibald, J. K.
2005-10-01
The use of on-board vision with small autonomous robots has been made possible by the advances in the field of Field Programmable Gate Array (FPGA) technology. By connecting a CMOS camera to an FPGA board, on-board vision has been used to reduce the computation time inherent in vision algorithms. The FPGA board allows the user to create custom hardware in a faster, safer, and more easily verifiable manner that decreases the computation time and allows the vision to be done in real-time. Real-time vision tasks for small autonomous robots include object tracking, obstacle detection and avoidance, and path planning. Competitions were created to demonstrate that our algorithms work with our small autonomous vehicles in dealing with these problems. These competitions include Mouse-Trapped-in-a-Box, where the robot has to detect the edges of a box that it is trapped in and move towards them without touching them; Obstacle Avoidance, where an obstacle is placed at any arbitrary point in front of the robot and the robot has to navigate itself around the obstacle; Canyon Following, where the robot has to move to the center of a canyon and follow the canyon walls trying to stay in the center; the Grand Challenge, where the robot had to navigate a hallway and return to its original position in a given amount of time; and Stereo Vision, where a separate robot had to catch tennis balls launched from an air powered cannon. Teams competed on each of these competitions that were designed for a graduate-level robotic vision class, and each team had to develop their own algorithm and hardware components. This paper discusses one team's approach to each of these problems.
Evaluation of a segmentation algorithm designed for an FPGA implementation
NASA Astrophysics Data System (ADS)
Schwenk, Kurt; Schönermark, Maria; Huber, Felix
2013-10-01
The present work has to be seen in the context of real-time on-board image evaluation of optical satellite data. With on board image evaluation more useful data can be acquired, the time to get requested information can be decreased and new real-time applications are possible. Because of the relative high processing power in comparison to the low power consumption, Field Programmable Gate Array (FPGA) technology has been chosen as an adequate hardware platform for image processing tasks. One fundamental part for image evaluation is image segmentation. It is a basic tool to extract spatial image information which is very important for many applications such as object detection. Therefore a special segmentation algorithm using the advantages of FPGA technology has been developed. The aim of this work is the evaluation of this algorithm. Segmentation evaluation is a difficult task. The most common way for evaluating the performance of a segmentation method is still subjective evaluation, in which human experts determine the quality of a segmentation. This way is not in compliance with our needs. The evaluation process has to provide a reasonable quality assessment, should be objective, easy to interpret and simple to execute. To reach these requirements a so called Segmentation Accuracy Equality norm (SA EQ) was created, which compares the difference of two segmentation results. It can be shown that this norm is capable as a first quality measurement. Due to its objectivity and simplicity the algorithm has been tested on a specially chosen synthetic test model. In this work the most important results of the quality assessment will be presented.
Computer implementation of the Kerov-Kirillov-Reshetikhin algorithm
NASA Astrophysics Data System (ADS)
Jakubczyk, P.; Wal, A.; Jakubczyk, D.; Lulek, T.
2012-06-01
The Kerov-Kirillov-Reshetikhin (KKR) algorithm establishes a bijection between semistandard Weyl tableaux of the shape λ and weight μ and rigged string configurations of type (λ,μ). This algorithm can be applied to Heisenberg magnetic chains and their generalisations by use of Schur-Weyl duality, and gives a way to classify all eigenstates of the chain in a methodological manner. Program summaryProgram title: KKR algorithm Catalogue identifier: AELQ_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AELQ_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 974 No. of bytes in distributed program, including test data, etc.: 11 118 Distribution format: tar.gz Programming language: Maple Computer: PC Operating system: Windows, Linux RAM: 1 GB Classification: 2.7, 4.2 Nature of problem: The method of classification of all eigenstates and eigenvalues is given for the generalised model of the one-dimensional Heisenberg magnet consisting of N nodes. Solution method: We use a prolongation of the RSK bijection [1] to the set of all eigenstates of an integrable model, in terms of rigged string configurations. More specifically, one selects from the irreducible basis b of the Schur-Weyl duality all highest weight states |λty> with respect to the unitary group U(n) and generates a rigged string configurations |ν→,{J}>=KKR(|λty>) for each y∈SYT(λ). Running time: The running time is about 1 ms.
Motion Cueing Algorithm Development: New Motion Cueing Program Implementation and Tuning
NASA Technical Reports Server (NTRS)
Houck, Jacob A. (Technical Monitor); Telban, Robert J.; Cardullo, Frank M.; Kelly, Lon C.
2005-01-01
A computer program has been developed for the purpose of driving the NASA Langley Research Center Visual Motion Simulator (VMS). This program includes two new motion cueing algorithms, the optimal algorithm and the nonlinear algorithm. A general description of the program is given along with a description and flowcharts for each cueing algorithm, and also descriptions and flowcharts for subroutines used with the algorithms. Common block variable listings and a program listing are also provided. The new cueing algorithms have a nonlinear gain algorithm implemented that scales each aircraft degree-of-freedom input with a third-order polynomial. A description of the nonlinear gain algorithm is given along with past tuning experience and procedures for tuning the gain coefficient sets for each degree-of-freedom to produce the desired piloted performance. This algorithm tuning will be needed when the nonlinear motion cueing algorithm is implemented on a new motion system in the Cockpit Motion Facility (CMF) at the NASA Langley Research Center.
A Linac Simulation Code for Macro-Particles Tracking and Steering Algorithm Implementation
sun, yipeng
2012-05-03
In this paper, a linac simulation code written in Fortran90 is presented and several simulation examples are given. This code is optimized to implement linac alignment and steering algorithms, and evaluate the accelerator errors such as RF phase and acceleration gradient, quadrupole and BPM misalignment. It can track a single particle or a bunch of particles through normal linear accelerator elements such as quadrupole, RF cavity, dipole corrector and drift space. One-to-one steering algorithm and a global alignment (steering) algorithm are implemented in this code.
Implementation and performance of a domain decomposition algorithm in Sisal
DeBoni, T.; Feo, J.; Rodrigue, G.; Muller, J.
1993-09-23
Sisal is a general-purpose functional language that hides the complexity of parallel processing, expedites parallel program development, and guarantees determinacy. Parallelism and management of concurrent tasks are realized automatically by the compiler and runtime system. Spatial domain decomposition is a widely-used method that focuses computational resources on the most active, or important, areas of a domain. Many complex programming issues are introduced in paralleling this method including: dynamic spatial refinement, dynamic grid partitioning and fusion, task distribution, data distribution, and load balancing. In this paper, we describe a spatial domain decomposition algorithm programmed in Sisal. We explain the compilation process, and present the execution performance of the resultant code on two different multiprocessor systems: a multiprocessor vector supercomputer, and cache-coherent scalar multiprocessor.
Implementation of jump-diffusion algorithms for understanding FLIR scenes
NASA Astrophysics Data System (ADS)
Lanterman, Aaron D.; Miller, Michael I.; Snyder, Donald L.
1995-07-01
Our pattern theoretic approach to the automated understanding of forward-looking infrared (FLIR) images brings the traditionally separate endeavors of detection, tracking, and recognition together into a unified jump-diffusion process. New objects are detected and object types are recognized through discrete jump moves. Between jumps, the location and orientation of objects are estimated via continuous diffusions. An hypothesized scene, simulated from the emissive characteristics of the hypothesized scene elements, is compared with the collected data by a likelihood function based on sensor statistics. This likelihood is combined with a prior distribution defined over the set of possible scenes to form a posterior distribution. The jump-diffusion process empirically generates the posterior distribution. Both the diffusion and jump operations involve the simulation of a scene produced by a hypothesized configuration. Scene simulation is most effectively accomplished by pipelined rendering engines such as silicon graphics. We demonstrate the execution of our algorithm on a silicon graphics onyx/reality engine.
Implementation and evaluation of ILLIAC 4 algorithms for multispectral image processing
NASA Technical Reports Server (NTRS)
Swain, P. H.
1974-01-01
Data concerning a multidisciplinary and multi-organizational effort to implement multispectral data analysis algorithms on a revolutionary computer, the Illiac 4, are reported. The effectiveness and efficiency of implementing the digital multispectral data analysis techniques for producing useful land use classifications from satellite collected data were demonstrated.
On Distribution Reduction and Algorithm Implementation in Inconsistent Ordered Information Systems
Zhang, Yanqin
2014-01-01
As one part of our work in ordered information systems, distribution reduction is studied in inconsistent ordered information systems (OISs). Some important properties on distribution reduction are studied and discussed. The dominance matrix is restated for reduction acquisition in dominance relations based information systems. Matrix algorithm for distribution reduction acquisition is stepped. And program is implemented by the algorithm. The approach provides an effective tool for the theoretical research and the applications for ordered information systems in practices. For more detailed and valid illustrations, cases are employed to explain and verify the algorithm and the program which shows the effectiveness of the algorithm in complicated information systems. PMID:25258721
NASA Technical Reports Server (NTRS)
Delaat, J. C.
1984-01-01
An advanced, sensor failure detection, isolation, and accomodation algorithm has been developed by NASA for the F100 turbofan engine. The algorithm takes advantage of the analytical redundancy of the sensors to improve the reliability of the sensor set. The method requires the controls computer, to determine when a sensor failure has occurred without the help of redundant hardware sensors in the control system. The controls computer provides an estimate of the correct value of the output of the failed sensor. The algorithm has been programmed in FORTRAN using a real-time microprocessor-based controls computer. A detailed description of the algorithm and its implementation on a microprocessor is given.
Implementation of software-based sensor linearization algorithms on low-cost microcontrollers.
Erdem, Hamit
2010-10-01
Nonlinear sensors and microcontrollers are used in many embedded system designs. As the input-output characteristic of most sensors is nonlinear in nature, obtaining data from a nonlinear sensor by using an integer microcontroller has always been a design challenge. This paper discusses the implementation of six software-based sensor linearization algorithms for low-cost microcontrollers. The comparative study of the linearization algorithms is performed by using a nonlinear optical distance-measuring sensor. The performance of the algorithms is examined with respect to memory space usage, linearization accuracy and algorithm execution time. The implementation and comparison results can be used for selection of a linearization algorithm based on the sensor transfer function, expected linearization accuracy and microcontroller capacity. PMID:20466366
de Paula, Lauro C. M.; Soares, Anderson S.; de Lima, Telma W.; Delbem, Alexandre C. B.; Coelho, Clarimar J.; Filho, Arlindo R. G.
2014-01-01
Several variable selection algorithms in multivariate calibration can be accelerated using Graphics Processing Units (GPU). Among these algorithms, the Firefly Algorithm (FA) is a recent proposed metaheuristic that may be used for variable selection. This paper presents a GPU-based FA (FA-MLR) with multiobjective formulation for variable selection in multivariate calibration problems and compares it with some traditional sequential algorithms in the literature. The advantage of the proposed implementation is demonstrated in an example involving a relatively large number of variables. The results showed that the FA-MLR, in comparison with the traditional algorithms is a more suitable choice and a relevant contribution for the variable selection problem. Additionally, the results also demonstrated that the FA-MLR performed in a GPU can be five times faster than its sequential implementation. PMID:25493625
Implementation on a nonlinear concrete cracking algorithm in NASTRAN
NASA Technical Reports Server (NTRS)
Herting, D. N.; Herendeen, D. L.; Hoesly, R. L.; Chang, H.
1976-01-01
A computer code for the analysis of reinforced concrete structures was developed using NASTRAN as a basis. Nonlinear iteration procedures were developed for obtaining solutions with a wide variety of loading sequences. A direct access file system was used to save results at each load step to restart within the solution module for further analysis. A multi-nested looping capability was implemented to control the iterations and change the loads. The basis for the analysis is a set of mutli-layer plate elements which allow local definition of materials and cracking properties.
Farber, R.M.; Lapedes, A.S.; Rico-Martinez, R.; Kevrekidis, I.G.
1993-06-01
Time-delay mappings constructed using neural networks have proven successful performing nonlinear system identification; however, because of their discrete nature, their use in bifurcation analysis of continuous-tune systems is limited. This shortcoming can be avoided by embedding the neural networks in a training algorithm that mimics a numerical integrator. Both explicit and implicit integrators can be used. The former case is based on repeated evaluations of the network in a feedforward implementation; the latter relies on a recurrent network implementation. Here the algorithms and their implementation on parallel machines (SIMD and MIMD architectures) are discussed.
Farber, R.M.; Lapedes, A.S. ); Rico-Martinez, R.; Kevrekidis, I.G. . Dept. of Chemical Engineering)
1993-01-01
Time-delay mappings constructed using neural networks have proven successful performing nonlinear system identification; however, because of their discrete nature, their use in bifurcation analysis of continuous-tune systems is limited. This shortcoming can be avoided by embedding the neural networks in a training algorithm that mimics a numerical integrator. Both explicit and implicit integrators can be used. The former case is based on repeated evaluations of the network in a feedforward implementation; the latter relies on a recurrent network implementation. Here the algorithms and their implementation on parallel machines (SIMD and MIMD architectures) are discussed.
NASA Astrophysics Data System (ADS)
Jin, Minglei; Jin, Weiqi; Li, Yiyang; Li, Shuo
2015-08-01
In this paper, we propose a novel scene-based non-uniformity correction algorithm for infrared image processing-temporal high-pass non-uniformity correction algorithm based on grayscale mapping (THP and GM). The main sources of non-uniformity are: (1) detector fabrication inaccuracies; (2) non-linearity and variations in the read-out electronics and (3) optical path effects. The non-uniformity will be reduced by non-uniformity correction (NUC) algorithms. The NUC algorithms are often divided into calibration-based non-uniformity correction (CBNUC) algorithms and scene-based non-uniformity correction (SBNUC) algorithms. As non-uniformity drifts temporally, CBNUC algorithms must be repeated by inserting a uniform radiation source which SBNUC algorithms do not need into the view, so the SBNUC algorithm becomes an essential part of infrared imaging system. The SBNUC algorithms' poor robustness often leads two defects: artifacts and over-correction, meanwhile due to complicated calculation process and large storage consumption, hardware implementation of the SBNUC algorithms is difficult, especially in Field Programmable Gate Array (FPGA) platform. The THP and GM algorithm proposed in this paper can eliminate the non-uniformity without causing defects. The hardware implementation of the algorithm only based on FPGA has two advantages: (1) low resources consumption, and (2) small hardware delay: less than 20 lines, it can be transplanted to a variety of infrared detectors equipped with FPGA image processing module, it can reduce the stripe non-uniformity and the ripple non-uniformity.
Implementation of a partitioned algorithm for simulation of large CSI problems
NASA Technical Reports Server (NTRS)
Alvin, Kenneth F.; Park, K. C.
1991-01-01
The implementation of a partitioned numerical algorithm for determining the dynamic response of coupled structure/controller/estimator finite-dimensional systems is reviewed. The partitioned approach leads to a set of coupled first and second-order linear differential equations which are numerically integrated with extrapolation and implicit step methods. The present software implementation, ACSIS, utilizes parallel processing techniques at various levels to optimize performance on a shared-memory concurrent/vector processing system. A general procedure for the design of controller and filter gains is also implemented, which utilizes the vibration characteristics of the structure to be solved. Also presented are: example problems; a user's guide to the software; the procedures and algorithm scripts; a stability analysis for the algorithm; and the source code for the parallel implementation.
NASA Astrophysics Data System (ADS)
Xiong, Yuting; Gu, Zhaojun; Jin, Wei
In this paper, a novel practical algorithmic solution for automatic discovering the physical topology of switched Ethernet was proposed. Our algorithm collects standard SNMP MIB information that is widely supported in modern IP networks and then builds the physical topology of the active network. We described the relative definitions, system model and proved the correctness of the algorithm. Practically, the algorithm was implemented in our visualization network monitoring system. We also presented the main steps of the algorithm, core codes and running results on the lab network. The experimental results clearly validate our approach, demonstrating that our algorithm is simple and effective which can discover the accurate up-to-date physical network topology.
NASA Astrophysics Data System (ADS)
Xu, Dexiang
This dissertation presents a novel method of designing finite word length Finite Impulse Response (FIR) digital filters using a Real Parameter Parallel Genetic Algorithm (RPPGA). This algorithm is derived from basic Genetic Algorithms which are inspired by natural genetics principles. Both experimental results and theoretical studies in this work reveal that the RPPGA is a suitable method for determining the optimal or near optimal discrete coefficients of finite word length FIR digital filters. Performance of RPPGA is evaluated by comparing specifications of filters designed by other methods with filters designed by RPPGA. The parallel and spatial structures of the algorithm result in faster and more robust optimization than basic genetic algorithms. A filter designed by RPPGA is implemented in hardware to attenuate high frequency noise in a data acquisition system for collecting seismic signals. These studies may lead to more applications of the Real Parameter Parallel Genetic Algorithms in Electrical Engineering.
Efficient implementation of Jacobi algorithms and Jacobi sets on distributed memory architectures
NASA Astrophysics Data System (ADS)
Eberlein, P. J.; Park, Haesun
1990-04-01
One-sided methods for implementing Jacobi diagonalization algorithms have been recently proposed for both distributed memory and vector machines. These methods are naturally well suited to distributed memory and vector architectures because of their inherent parallelism and their abundance of vector operations. Also, one-sided methods require substantially less message passing than the two-sided methods, and thus can achieve higher efficiency. We describe in detail the use of the one-sided Jacobi rotation as opposed to the rotation used in the ``Hestenes'' algorithm; we perceive this difference to have been widely misunderstood. Furthermore the one-sided algorithm generalizes to other problems such as the nonsymmetric eigenvalue problem while the Hestenes algorithm does not. We discuss two new implementations for Jacobi sets for a ring connected array of processors and show their isomorphism to the round-robin ordering. Moreover, we show that two implementations produce Jacobi sets in identical orders up to a relabeling. These orderings are optimal in the sense that they complete each sweep in a minimum number of stages with minimal communication. We present implementation results of one-sided Jacobi algorithms using these orderings on the NCUBE/seven hypercube as well as the Intel iPSC/2 hypercube. Finally, we mention how other orderings, and can be, implemented. The number of nonisomorphic Jacobi sets has recently been shown to become infinite with increasing n. The work of this author was supported by National Science Foundation Grant CCR-8813493.
Clinical implementation of a neonatal seizure detection algorithm
Temko, Andriy; Marnane, William; Boylan, Geraldine; Lightbody, Gordon
2015-01-01
Technologies for automated detection of neonatal seizures are gradually moving towards cot-side implementation. The aim of this paper is to present different ways to visualize the output of a neonatal seizure detection system and analyse their influence on performance in a clinical environment. Three different ways to visualize the detector output are considered: a binary output, a probabilistic trace, and a spatio-temporal colormap of seizure observability. As an alternative to visual aids, audified neonatal EEG is also considered. Additionally, a survey on the usefulness and accuracy of the presented methods has been performed among clinical personnel. The main advantages and disadvantages of the presented methods are discussed. The connection between information visualization and different methods to compute conventional metrics is established. The results of the visualization methods along with the system validation results indicate that the developed neonatal seizure detector with its current level of performance would unambiguously be of benefit to clinicians as a decision support system. The results of the survey suggest that a suitable way to visualize the output of neonatal seizure detection systems in a clinical environment is a combination of a binary output and a probabilistic trace. The main healthcare benefits of the tool are outlined. The decision support system with the chosen visualization interface is currently undergoing pre-market European multi-centre clinical investigation to support its regulatory approval and clinical adoption. PMID:25892834
Implementing embedded artificial intelligence rules within algorithmic programming languages
NASA Technical Reports Server (NTRS)
Feyock, Stefan
1988-01-01
Most integrations of artificial intelligence (AI) capabilities with non-AI (usually FORTRAN-based) application programs require the latter to execute separately to run as a subprogram or, at best, as a coroutine, of the AI system. In many cases, this organization is unacceptable; instead, the requirement is for an AI facility that runs in embedded mode; i.e., is called as subprogram by the application program. The design and implementation of a Prolog-based AI capability that can be invoked in embedded mode are described. The significance of this system is twofold: Provision of Prolog-based symbol-manipulation and deduction facilities makes a powerful symbolic reasoning mechanism available to applications programs written in non-AI languages. The power of the deductive and non-procedural descriptive capabilities of Prolog, which allow the user to describe the problem to be solved, rather than the solution, is to a large extent vitiated by the absence of the standard control structures provided by other languages. Embedding invocations of Prolog rule bases in programs written in non-AI languages makes it possible to put Prolog calls inside DO loops and similar control constructs. The resulting merger of non-AI and AI languages thus results in a symbiotic system in which the advantages of both programming systems are retained, and their deficiencies largely remedied.
Quantum computation: algorithms and implementation in quantum dot devices
NASA Astrophysics Data System (ADS)
Gamble, John King
In this thesis, we explore several aspects of both the software and hardware of quantum computation. First, we examine the computational power of multi-particle quantum random walks in terms of distinguishing mathematical graphs. We study both interacting and non-interacting multi-particle walks on strongly regular graphs, proving some limitations on distinguishing powers and presenting extensive numerical evidence indicative of interactions providing more distinguishing power. We then study the recently proposed adiabatic quantum algorithm for Google PageRank, and show that it exhibits power-law scaling for realistic WWW-like graphs. Turning to hardware, we next analyze the thermal physics of two nearby 2D electron gas (2DEG), and show that an analogue of the Coulomb drag effect exists for heat transfer. In some distance and temperature, this heat transfer is more significant than phonon dissipation channels. After that, we study the dephasing of two-electron states in a single silicon quantum dot. Specifically, we consider dephasing due to the electron-phonon coupling and charge noise, separately treating orbital and valley excitations. In an ideal system, dephasing due to charge noise is strongly suppressed due to a vanishing dipole moment. However, introduction of disorder or anharmonicity leads to large effective dipole moments, and hence possibly strong dephasing. Building on this work, we next consider more realistic systems, including structural disorder systems. We present experiment and theory, which demonstrate energy levels that vary with quantum dot translation, implying a structurally disordered system. Finally, we turn to the issues of valley mixing and valley-orbit hybridization, which occurs due to atomic-scale disorder at quantum well interfaces. We develop a new theoretical approach to study these effects, which we name the disorder-expansion technique. We demonstrate that this method successfully reproduces atomistic tight-binding techniques
NASA Astrophysics Data System (ADS)
Trigueros-Espinosa, Blas; Vélez-Reyes, Miguel; Santiago-Santiago, Nayda G.; Rosario-Torres, Samuel
2011-06-01
Hyperspectral sensors can collect hundreds of images taken at different narrow and contiguously spaced spectral bands. This high-resolution spectral information can be used to identify materials and objects within the field of view of the sensor by their spectral signature, but this process may be computationally intensive due to the large data sizes generated by the hyperspectral sensors, typically hundreds of megabytes. This can be an important limitation for some applications where the detection process must be performed in real time (surveillance, explosive detection, etc.). In this work, we developed a parallel implementation of three state-ofthe- art target detection algorithms (RX algorithm, matched filter and adaptive matched subspace detector) using a graphics processing unit (GPU) based on the NVIDIA® CUDA™ architecture. In addition, a multi-core CPUbased implementation of each algorithm was developed to be used as a baseline for the speedups estimation. We evaluated the performance of the GPU-based implementations using an NVIDIA ® Tesla® C1060 GPU card, and the detection accuracy of the implemented algorithms was evaluated using a set of phantom images simulating traces of different materials on clothing. We achieved a maximum speedup in the GPU implementations of around 20x over a multicore CPU-based implementation, which suggests that applications for real-time detection of targets in HSI can greatly benefit from the performance of GPUs as processing hardware.
Efficient implementation of Jacobi algorithms and Jacobi sets on distributed memory architectures
Eberlein, P.J. ); Park, H. )
1990-04-01
One-sided methods for implementing Jacobi diagonalization algorithms have been recently proposed for both distributed memory and vector machines. These methods are naturally well suited to distributed memory and vector architectures because of their inherent parallelism and their abundance of vector operations. Also, one-sided methods require substantially less message passing than the two-sided methods, and thus can achieve higher efficiency. The authors describe in detail the use of the one-sided Jacobi rotation as opposed to the rotation used in the Hestenes algorithm; they perceive the difference to have been widely misunderstood. Furthermore, the one-sided algorithm generalizes to other problems such as the nonsymmetric eigenvalue problem while the Hestenes algorithm does not. The authors discuss two new implementations for Jacobi sets for a ring connected array of processors and show their isomorphism to the round-robin ordering.
NASA Astrophysics Data System (ADS)
Kershaw, James; Hamlyn, Garry
1994-06-01
This paper describes the implementation of algorithms for automatically extracting and matching features in stereo pairs of images. The implementation has been designed to be as modular as possible to allow different algorithms for each stage in the matching process to be combined in the most appropriate manner for each particular problem. The modules have been implemented in the AVS environment but are designed to be portable to any platform. This work has been undertaken as part of task DEF 93/1 63 'Intelligence Analysis of Imagery', and forms part of ITD's contribution to the Visual Processing research program in the Centre for Sensor System and Information Processing. A major aim of both the task and the research program is to produce software to assist intelligence analysts in extracting three dimensional shape from imagery: the algorithms and software described here will form the first part of a module for automatically extracting depth information from stereo image pairs.
An implementation of the look-ahead Lanczos algorithm for non-Hermitian matrices, part 1
NASA Technical Reports Server (NTRS)
Freund, Roland W.; Gutknecht, Martin H.; Nachtigal, Noel M.
1990-01-01
The nonsymmetric Lanczos method can be used to compute eigenvalues of large sparse non-Hermitian matrices or to solve large sparse non-Hermitian linear systems. However, the original Lanczos algorithm is susceptible to possible breakdowns and potential instabilities. We present an implementation of a look-ahead version of the Lanczos algorithm which overcomes these problems by skipping over those steps in which a breakdown or near-breakdown would occur in the standard process. The proposed algorithm can handle look-ahead steps of any length and is not restricted to steps of length 2, as earlier implementations are. Also, our implementation has the feature that it requires roughly the same number of inner products as the standard Lanczos process without look-ahead.
NASA Technical Reports Server (NTRS)
Delaat, J. C.; Merrill, W. C.
1984-01-01
A sensor failure detection, isolation, and accommodation algorithm was developed which incorporates analytic sensor redundancy through software. This algorithm was implemented in a high level language on a microprocessor based controls computer. Parallel processing and state-of-the-art 16-bit microprocessors are used along with efficient programming practices to achieve real-time operation. Previously announced in STAR as N84-13140
Star-field identification algorithm. [for implementation on CCD-based imaging camera
NASA Technical Reports Server (NTRS)
Scholl, M. S.
1993-01-01
A description of a new star-field identification algorithm that is suitable for implementation on CCD-based imaging cameras is presented. The minimum identifiable star pattern element consists of an oriented star triplet defined by three stars, their celestial coordinates, and their visual magnitudes. The algorithm incorporates tolerance to faulty input data, errors in the reference catalog, and instrument-induced systematic errors.
A data-parallel algorithm for three-dimensional Delaunay triangulation and its implementation
Teng, Y.A.; Sullivan, F.; Beichl, I.; Puppo, E.
1993-12-31
In this paper, the authors present a parallel algorithm for constructing the Delaunay triangulation of a set of vertices in three-dimensional space. The algorithm achieves a high degree of parallelism by starting the construction from every vertex and expanding over all open faces thereafter. In the expansion of open faces, the search is made faster by using a bucketing technique. The algorithm is designed under a data-parallel paradigm. It uses segmented list structures and virtual processing for load-balancing. As a result, the algorithm achieves a fast running time and good scalability over a wide range of problem sizes and machine sizes. They also incorporate a topological check to eliminate inconsistencies due to degeneracies and numerical error. The algorithm is implemented on Connection Machines CM-2 and CM-5, and experimental results are presented.
Lu Dawei; Peng Xinhua; Du Jiangfeng; Zhu Jing; Zou Ping; Yu Yihua; Zhang Shanmin; Chen Qun
2010-02-15
An important quantum search algorithm based on the quantum random walk performs an oracle search on a database of N items with O({radical}(phN)) calls, yielding a speedup similar to the Grover quantum search algorithm. The algorithm was implemented on a quantum information processor of three-qubit liquid-crystal nuclear magnetic resonance (NMR) in the case of finding 1 out of 4, and the diagonal elements' tomography of all the final density matrices was completed with comprehensible one-dimensional NMR spectra. The experimental results agree well with the theoretical predictions.
NASA Astrophysics Data System (ADS)
Lu, Dawei; Zhu, Jing; Zou, Ping; Peng, Xinhua; Yu, Yihua; Zhang, Shanmin; Chen, Qun; Du, Jiangfeng
2010-02-01
An important quantum search algorithm based on the quantum random walk performs an oracle search on a database of N items with O(phN) calls, yielding a speedup similar to the Grover quantum search algorithm. The algorithm was implemented on a quantum information processor of three-qubit liquid-crystal nuclear magnetic resonance (NMR) in the case of finding 1 out of 4, and the diagonal elements’ tomography of all the final density matrices was completed with comprehensible one-dimensional NMR spectra. The experimental results agree well with the theoretical predictions.
Demonstration of a small programmable quantum computer with atomic qubits.
Debnath, S; Linke, N M; Figgatt, C; Landsman, K A; Wright, K; Monroe, C
2016-08-01
Quantum computers can solve certain problems more efficiently than any possible conventional computer. Small quantum algorithms have been demonstrated on multiple quantum computing platforms, many specifically tailored in hardware to implement a particular algorithm or execute a limited number of computational paths. Here we demonstrate a five-qubit trapped-ion quantum computer that can be programmed in software to implement arbitrary quantum algorithms by executing any sequence of universal quantum logic gates. We compile algorithms into a fully connected set of gate operations that are native to the hardware and have a mean fidelity of 98 per cent. Reconfiguring these gate sequences provides the flexibility to implement a variety of algorithms without altering the hardware. As examples, we implement the Deutsch-Jozsa and Bernstein-Vazirani algorithms with average success rates of 95 and 90 per cent, respectively. We also perform a coherent quantum Fourier transform on five trapped-ion qubits for phase estimation and period finding with average fidelities of 62 and 84 per cent, respectively. This small quantum computer can be scaled to larger numbers of qubits within a single register, and can be further expanded by connecting several such modules through ion shuttling or photonic quantum channels. PMID:27488798
Subspace scheduling and parallel implementation of non-systolic regular iterative algorithms
Roychowdhury, V.P.; Kailath, T.
1989-01-01
The study of Regular Iterative Algorithms (RIAs) was introduced in a seminal paper by Karp, Miller, and Winograd in 1967. In more recent years, the study of systolic architectures has led to a renewed interest in this class of algorithms, and the class of algorithms implementable on systolic arrays (as commonly understood) has been identified as a precise subclass of RIAs include matrix pivoting algorithms and certain forms of numerically stable two-dimensional filtering algorithms. It has been shown that the so-called hyperplanar scheduling for systolic algorithms can no longer be used to schedule and implement non-systolic RIAs. Based on the analysis of a so-called computability tree we generalize the concept of hyperplanar scheduling and determine linear subspaces in the index space of a given RIA such that all variables lying on the same subspace can be scheduled at the same time. This subspace scheduling technique is shown to be asymptotically optimal, and formal procedures are developed for designing processor arrays that will be compatible with our scheduling schemes. Explicit formulas for the schedule of a given variable are determined whenever possible; subspace scheduling is also applied to obtain lower dimensional processor arrays for systolic algorithms.
Implementation of C* Boundary Conditions in the Hybrid Monte Carlo Algorithm
NASA Astrophysics Data System (ADS)
Carmona, José Manuel; D'elia, Massimo; Di Giacomo, Adriano; Lucini, Biagio
In the study of QCD dynamics, C* boundary conditions are physically relevant in certain cases. In this paper, we study the implementation of these boundary conditions in the lattice formulation of full QCD with staggered fermions. In particular, we show that the usual even-odd partition trick to avoid the redoubling of the fermion matrix is still valid in this case. We give an explicit implementation of these boundary conditions for the Hybrid Monte Carlo algorithm.
A fast implementation of the incremental backprojection algorithms for parallel beam geometries
Chen, C.M.; Wang, C.Y.; Cho, Z.H.
1996-12-01
Filtered-backprojection algorithms are the most widely used approaches for reconstruction of computed tomographic (CT) images, such as X-ray CT and positron emission tomographic (PET) images. The Incremental backprojection algorithm is a fast backprojection approach based on restructuring the Shepp and Logan algorithm. By exploiting interdependency (position and values) of adjacent pixels, the Incremental algorithm requires only O(N) and O(N{sup 2}) multiplications in contrast to O(N{sup 2}) and O(N{sup 3}) multiplications for the Shepp and Logan algorithm in two-dimensional (2-D) and three-dimensional (3-D) backprojections, respectively, for each view, where N is the size of the image in each dimension. In addition, it may reduce the number of additions for each pixel computation. The improvement achieved by the Incremental algorithm in practice was not, however, as significant as expected. One of the main reasons is due to inevitably visiting pixels outside the beam in the searching flow scheme originally developed for the Incremental algorithm. To optimize implementation of the Incremental algorithm, an efficient scheme, namely, coded searching flow scheme, is proposed in this paper to minimize the overhead caused by searching for all pixels in a beam. The key idea of this scheme is to encode the searching flow for all pixels inside each beam. While backprojecting, all pixels may be visited without any overhead due to using the coded searching flow as the a priori information. The proposed coded searching flow scheme has been implemented on a Sun Sparc 10 and a Sun Sparc 20 workstations. The implementation results show that the proposed scheme is 1.45--2.0 times faster than the original searching flow scheme for most cases tested.
Linear array implementation of the EM algorithm for PET image reconstruction
Rajan, K.; Patnaik, L.M.; Ramakrishna, J.
1995-08-01
The PET image reconstruction based on the EM algorithm has several attractive advantages over the conventional convolution back projection algorithms. However, the PET image reconstruction based on the EM algorithm is computationally burdensome for today`s single processor systems. In addition, a large memory is required for the storage of the image, projection data, and the probability matrix. Since the computations are easily divided into tasks executable in parallel, multiprocessor configurations are the ideal choice for fast execution of the EM algorithms. In tis study, the authors attempt to overcome these two problems by parallelizing the EM algorithm on a multiprocessor systems. The parallel EM algorithm on a linear array topology using the commercially available fast floating point digital signal processor (DSP) chips as the processing elements (PE`s) has been implemented. The performance of the EM algorithm on a 386/387 machine, IBM 6000 RISC workstation, and on the linear array system is discussed and compared. The results show that the computational speed performance of a linear array using 8 DSP chips as PE`s executing the EM image reconstruction algorithm is about 15.5 times better than that of the IBM 6000 RISC workstation. The novelty of the scheme is its simplicity. The linear array topology is expandable with a larger number of PE`s. The architecture is not dependant on the DSP chip chosen, and the substitution of the latest DSP chip is straightforward and could yield better speed performance.
NASA Astrophysics Data System (ADS)
Obara, Lukasz; Żarnecki, Aleksander Filip
2015-09-01
Pi of the Sky is a system of wide field-of-view robotic telescopes, which search for short timescale astrophysical phenomena, especially for prompt optical GRB emission. The system was designed for autonomous operation, monitoring a large fraction of the sky with 12m-13m range and time resolution of the order of 1 - 100 seconds. LUIZA is a dedicated framework developed for efficient off-line processing of the Pi of the Sky data, implemented in C++. The photometric algorithm based on ASAS photometry was implemented in LUIZA and compared with the algorithm based on the pixel cluster reconstruction and simple aperture photometry algorithm. Optimized photometry algorithms were then applied to the sample of test images, which were modified to include different patterns of variability of the stars (training sample). Different statistical estimators are considered for developing the general variable star identification algorithm. The algorithm will then be used to search for short-period variable stars in the real data.
Algorithmic improvements to the real-time implementation of a synthetic aperture sonar beam former
NASA Astrophysics Data System (ADS)
Freeman, Douglas K.
1997-07-01
Coastal Systems Station has translated its synthetic aperture sonar beamformer from linear processing to parallel processing. The initial implementation included many linear processes delegated to individual processors and neglected algorithmic refinements available to parallel processing. The steps taken to achieve increased computational speed for real-time beam forming are presented.
An implementation of a data-transmission pipelining algorithm on Imote2 platforms
NASA Astrophysics Data System (ADS)
Li, Xu; Dorvash, Siavash; Cheng, Liang; Pakzad, Shamim
2011-04-01
Over the past several years, wireless network systems and sensing technologies have been developed significantly. This has resulted in the broad application of wireless sensor networks (WSNs) in many engineering fields and in particular structural health monitoring (SHM). The movement of traditional SHM toward the new generation of SHM, which utilizes WSNs, relies on the advantages of this new approach such as relatively low costs, ease of implementation and the capability of onboard data processing and management. In the particular case of long span bridge monitoring, a WSN should be capable of transmitting commands and measurement data over long network geometry in a reliable manner. While using single-hop data transmission in such geometry requires a long radio range and consequently a high level of power supply, multi-hop communication may offer an effective and reliable way for data transmissions across the network. Using a multi-hop communication protocol, the network relays data from a remote node to the base station via intermediary nodes. We have proposed a data-transmission pipelining algorithm to enable an effective use of the available bandwidth and minimize the energy consumption and the delay performance by the multi-hop communication protocol. This paper focuses on the implementation aspect of the pipelining algorithm on Imote2 platforms for SHM applications, describes its interaction with underlying routing protocols, and presents the solutions to various implementation issues of the proposed pipelining algorithm. Finally, the performance of the algorithm is evaluated based on the results of an experimental implementation.
Implementation and evaluation of the new wind algorithm in NASA's 50 MHz doppler radar wind profiler
NASA Technical Reports Server (NTRS)
Taylor, Gregory E.; Manobianco, John T.; Schumann, Robin S.; Wheeler, Mark M.; Yersavich, Ann M.
1993-01-01
The purpose of this report is to document the Applied Meteorology Unit's implementation and evaluation of the wind algorithm developed by Marshall Space Flight Center (MSFC) on the data analysis processor (DAP) of NASA's 50 MHz doppler radar wind profiler (DRWP). The report also includes a summary of the 50 MHz DRWP characteristics and performance and a proposed concept of operations for the DRWP.
Zhang, Aizhen; Wen, Ning; Nurushev, Teamour; Burmeister, Jay; Chetty, Indrin J
2013-01-01
A commercial electron Monte Carlo (eMC) dose calculation algorithm has become available in Eclipse treatment planning system. The purpose of this work was to evaluate the eMC algorithm and investigate the clinical implementation of this system. The beam modeling of the eMC algorithm was performed for beam energies of 6, 9, 12, 16, and 20 MeV for a Varian Trilogy and all available applicator sizes in the Eclipse treatment planning system. The accuracy of the eMC algorithm was evaluated in a homogeneous water phantom, solid water phantoms containing lung and bone materials, and an anthropomorphic phantom. In addition, dose calculation accuracy was compared between pencil beam (PB) and eMC algorithms in the same treatment planning system for heterogeneous phantoms. The overall agreement between eMC calculations and measurements was within 3%/2 mm, while the PB algorithm had large errors (up to 25%) in predicting dose distributions in the presence of inhomogeneities such as bone and lung. The clinical implementation of the eMC algorithm was investigated by performing treatment planning for 15 patients with lesions in the head and neck, breast, chest wall, and sternum. The dose distributions were calculated using PB and eMC algorithms with no smoothing and all three levels of 3D Gaussian smoothing for comparison. Based on a routine electron beam therapy prescription method, the number of eMC calculated monitor units (MUs) was found to increase with increased 3D Gaussian smoothing levels. 3D Gaussian smoothing greatly improved the visual usability of dose distributions and produced better target coverage. Differences of calculated MUs and dose distributions between eMC and PB algorithms could be significant when oblique beam incidence, surface irregularities, and heterogeneous tissues were present in the treatment plans. In our patient cases, monitor unit differences of up to 7% were observed between PB and eMC algorithms. Monitor unit calculations were also preformed
Towards the Implementation of an Autonomous Camera Algorithm on the da Vinci Platform.
Eslamian, Shahab; Reisner, Luke A; King, Brady W; Pandya, Abhilash K
2016-01-01
Camera positioning is critical for all telerobotic surgical systems. Inadequate visualization of the remote site can lead to serious errors that can jeopardize the patient. An autonomous camera algorithm has been developed on a medical robot (da Vinci) simulator. It is found to be robust in key scenarios of operation. This system behaves with predictable and expected actions for the camera arm with respect to the tool positions. The implementation of this system is described herein. The simulation closely models the methodology needed to implement autonomous camera control in a real hardware system. The camera control algorithm follows three rules: (1) keep the view centered on the tools, (2) keep the zoom level optimized such that the tools never leave the field of view, and (3) avoid unnecessary movement of the camera that may distract/disorient the surgeon. Our future work will apply this algorithm to the real da Vinci hardware. PMID:27046563
Kamath, Jayesh; Wakai, Sara; Zhang, Wanli; Kesten, Karen; Shelton, Deborah; Trestman, Robert
2016-08-01
Use of medication algorithms in the correctional setting may facilitate clinical decision making, improve consistency of care, and reduce polypharmacy. The objective of the present study was to evaluate effectiveness of algorithm (Texas Implementation of Medication Algorithm [TIMA])-driven treatment of bipolar disorder (BD) compared with Treatment as Usual (TAU) in the correctional environment. A total of 61 women inmates with BD were randomized to TIMA (n = 30) or TAU (n = 31) and treated over a 12-week period. The outcome measures included measures of BD symptoms, comorbid symptomatology, quality of life, and psychotropic medication utilization. In comparison with TAU, TIMA-driven treatment reduced polypharmacy, decreased overall psychotropic medication utilization, and significantly decreased use of specific classes of psychotropic medication (antipsychotics and antidepressants). This pilot study confirmed the feasibility and benefits of algorithm-driven treatment of BD in the correctional setting, primarily by enhancing appropriate use of evidence-based treatment. PMID:25829456
Quantum Algorithm for Universal Implementation of the Projective Measurement of Energy
NASA Astrophysics Data System (ADS)
Nakayama, Shojun; Soeda, Akihito; Murao, Mio
2015-05-01
A projective measurement of energy (PME) on a quantum system is a quantum measurement determined by the Hamiltonian of the system. PME protocols exist when the Hamiltonian is given in advance. Unknown Hamiltonians can be identified by quantum tomography, but the time cost to achieve a given accuracy increases exponentially with the size of the quantum system. In this Letter, we improve the time cost by adapting quantum phase estimation, an algorithm designed for computational problems, to measurements on physical systems. We present a PME protocol without quantum tomography for Hamiltonians whose dimension and energy scale are given but which are otherwise unknown. Our protocol implements a PME to arbitrary accuracy without any dimension dependence on its time cost. We also show that another computational quantum algorithm may be used for efficient estimation of the energy scale. These algorithms show that computational quantum algorithms, with suitable modifications, have applications beyond their original context.
Design and Implementation of Hybrid CORDIC Algorithm Based on Phase Rotation Estimation for NCO
Zhang, Chaozhu; Han, Jinan; Li, Ke
2014-01-01
The numerical controlled oscillator has wide application in radar, digital receiver, and software radio system. Firstly, this paper introduces the traditional CORDIC algorithm. Then in order to improve computing speed and save resources, this paper proposes a kind of hybrid CORDIC algorithm based on phase rotation estimation applied in numerical controlled oscillator (NCO). Through estimating the direction of part phase rotation, the algorithm reduces part phase rotation and add-subtract unit, so that it decreases delay. Furthermore, the paper simulates and implements the numerical controlled oscillator by Quartus II software and Modelsim software. Finally, simulation results indicate that the improvement over traditional CORDIC algorithm is achieved in terms of ease of computation, resource utilization, and computing speed/delay while maintaining the precision. It is suitable for high speed and precision digital modulation and demodulation. PMID:25110750
Design and implementation of hybrid CORDIC algorithm based on phase rotation estimation for NCO.
Zhang, Chaozhu; Han, Jinan; Li, Ke
2014-01-01
The numerical controlled oscillator has wide application in radar, digital receiver, and software radio system. Firstly, this paper introduces the traditional CORDIC algorithm. Then in order to improve computing speed and save resources, this paper proposes a kind of hybrid CORDIC algorithm based on phase rotation estimation applied in numerical controlled oscillator (NCO). Through estimating the direction of part phase rotation, the algorithm reduces part phase rotation and add-subtract unit, so that it decreases delay. Furthermore, the paper simulates and implements the numerical controlled oscillator by Quartus II software and Modelsim software. Finally, simulation results indicate that the improvement over traditional CORDIC algorithm is achieved in terms of ease of computation, resource utilization, and computing speed/delay while maintaining the precision. It is suitable for high speed and precision digital modulation and demodulation. PMID:25110750
Design and Implementation of Broadcast Algorithms for Extreme-Scale Systems
Shamis, Pavel; Graham, Richard L; Gorentla Venkata, Manjunath; Ladd, Joshua
2011-01-01
The scalability and performance of collective communication operations limit the scalability and performance of many scientific applications. This paper presents two new blocking and nonblocking Broadcast algorithms for communicators with arbitrary communication topology, and studies their performance. These algorithms benefit from increased concurrency and a reduced memory footprint, making them suitable for use on large-scale systems. Measuring small, medium, and large data Broadcasts on a Cray-XT5, using 24,576 MPI processes, the Cheetah algorithms outperform the native MPI on that system by 51%, 69%, and 9%, respectively, at the same process count. These results demonstrate an algorithmic approach to the implementation of the important class of collective communications, which is high performing, scalable, and also uses resources in a scalable manner.
Implementation of the Iterative Proportion Fitting Algorithm for Geostatistical Facies Modeling
Li Yupeng Deutsch, Clayton V.
2012-06-15
In geostatistics, most stochastic algorithm for simulation of categorical variables such as facies or rock types require a conditional probability distribution. The multivariate probability distribution of all the grouped locations including the unsampled location permits calculation of the conditional probability directly based on its definition. In this article, the iterative proportion fitting (IPF) algorithm is implemented to infer this multivariate probability. Using the IPF algorithm, the multivariate probability is obtained by iterative modification to an initial estimated multivariate probability using lower order bivariate probabilities as constraints. The imposed bivariate marginal probabilities are inferred from profiles along drill holes or wells. In the IPF process, a sparse matrix is used to calculate the marginal probabilities from the multivariate probability, which makes the iterative fitting more tractable and practical. This algorithm can be extended to higher order marginal probability constraints as used in multiple point statistics. The theoretical framework is developed and illustrated with estimation and simulation example.
A complete implementation of the conjugate gradient algorithm on a reconfigurable supercomputer
Dubois, David H; Dubois, Andrew J; Connor, Carolyn M; Boorman, Thomas M; Poole, Stephen W
2008-01-01
The conjugate gradient is a prominent iterative method for solving systems of sparse linear equations. Large-scale scientific applications often utilize a conjugate gradient solver at their computational core. In this paper we present a field programmable gate array (FPGA) based implementation of a double precision, non-preconditioned, conjugate gradient solver for fmite-element or finite-difference methods. OUf work utilizes the SRC Computers, Inc. MAPStation hardware platform along with the 'Carte' software programming environment to ease the programming workload when working with the hybrid (CPUIFPGA) environment. The implementation is designed to handle large sparse matrices of up to order N x N where N <= 116,394, with up to 7 non-zero, 64-bit elements per sparse row. This implementation utilizes an optimized sparse matrix-vector multiply operation which is critical for obtaining high performance. Direct parallel implementations of loop unrolling and loop fusion are utilized to extract performance from the various vector/matrix operations. Rather than utilize the FPGA devices as function off-load accelerators, our implementation uses the FPGAs to implement the core conjugate gradient algorithm. Measured run-time performance data is presented comparing the FPGA implementation to a software-only version showing that the FPGA can outperform processors running up to 30x the clock rate. In conclusion we take a look at the new SRC-7 system and estimate the performance of this algorithm on that architecture.
On recursive least-squares filtering algorithms and implementations. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Hsieh, Shih-Fu
1990-01-01
In many real-time signal processing applications, fast and numerically stable algorithms for solving least-squares problems are necessary and important. In particular, under non-stationary conditions, these algorithms must be able to adapt themselves to reflect the changes in the system and take appropriate adjustments to achieve optimum performances. Among existing algorithms, the QR-decomposition (QRD)-based recursive least-squares (RLS) methods have been shown to be useful and effective for adaptive signal processing. In order to increase the speed of processing and achieve high throughput rate, many algorithms are being vectorized and/or pipelined to facilitate high degrees of parallelism. A time-recursive formulation of RLS filtering employing block QRD will be considered first. Several methods, including a new non-continuous windowing scheme based on selectively rejecting contaminated data, were investigated for adaptive processing. Based on systolic triarrays, many other forms of systolic arrays are shown to be capable of implementing different algorithms. Various updating and downdating systolic algorithms and architectures for RLS filtering are examined and compared in details, which include Householder reflector, Gram-Schmidt procedure, and Givens rotation. A unified approach encompassing existing square-root-free algorithms is also proposed. For the sinusoidal spectrum estimation problem, a judicious method of separating the noise from the signal is of great interest. Various truncated QR methods are proposed for this purpose and compared to the truncated SVD method. Computer simulations provided for detailed comparisons show the effectiveness of these methods. This thesis deals with fundamental issues of numerical stability, computational efficiency, adaptivity, and VLSI implementation for the RLS filtering problems. In all, various new and modified algorithms and architectures are proposed and analyzed; the significance of any of the new method depends
Further optimization of SeDDaRA blind image deconvolution algorithm and its DSP implementation
NASA Astrophysics Data System (ADS)
Wen, Bo; Zhang, Qiheng; Zhang, Jianlin
2011-11-01
Efficient algorithm for blind image deconvolution and its high-speed implementation is of great value in practice. Further optimization of SeDDaRA is developed, from algorithm structure to numerical calculation methods. The main optimization covers that, the structure's modularization for good implementation feasibility, reducing the data computation and dependency of 2D-FFT/IFFT, and acceleration of power operation by segmented look-up table. Then the Fast SeDDaRA is proposed and specialized for low complexity. As the final implementation, a hardware system of image restoration is conducted by using the multi-DSP parallel processing. Experimental results show that, the processing time and memory demand of Fast SeDDaRA decreases 50% at least; the data throughput of image restoration system is over 7.8Msps. The optimization is proved efficient and feasible, and the Fast SeDDaRA is able to support the real-time application.
Madduri, Kamesh; Ediger, David; Jiang, Karl; Bader, David A.; Chavarría-Miranda, Daniel
2009-05-29
We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in the HPCS SSCA#2 Graph Analysis benchmark, which has been extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the ThreadStorm processor, and a single-socket Sun multicore server with the UltraSparc T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.
A dual-processor multi-frequency implementation of the FINDS algorithm
NASA Technical Reports Server (NTRS)
Godiwala, Pankaj M.; Caglayan, Alper K.
1987-01-01
This report presents a parallel processing implementation of the FINDS (Fault Inferring Nonlinear Detection System) algorithm on a dual processor configured target flight computer. First, a filter initialization scheme is presented which allows the no-fail filter (NFF) states to be initialized using the first iteration of the flight data. A modified failure isolation strategy, compatible with the new failure detection strategy reported earlier, is discussed and the performance of the new FDI algorithm is analyzed using flight recorded data from the NASA ATOPS B-737 aircraft in a Microwave Landing System (MLS) environment. The results show that low level MLS, IMU, and IAS sensor failures are detected and isolated instantaneously, while accelerometer and rate gyro failures continue to take comparatively longer to detect and isolate. The parallel implementation is accomplished by partitioning the FINDS algorithm into two parts: one based on the translational dynamics and the other based on the rotational kinematics. Finally, a multi-rate implementation of the algorithm is presented yielding significantly low execution times with acceptable estimation and FDI performance.
The design and hardware implementation of a low-power real-time seizure detection algorithm
NASA Astrophysics Data System (ADS)
Raghunathan, Shriram; Gupta, Sumeet K.; Ward, Matthew P.; Worth, Robert M.; Roy, Kaushik; Irazoqui, Pedro P.
2009-10-01
Epilepsy affects more than 1% of the world's population. Responsive neurostimulation is emerging as an alternative therapy for the 30% of the epileptic patient population that does not benefit from pharmacological treatment. Efficient seizure detection algorithms will enable closed-loop epilepsy prostheses by stimulating the epileptogenic focus within an early onset window. Critically, this is expected to reduce neuronal desensitization over time and lead to longer-term device efficacy. This work presents a novel event-based seizure detection algorithm along with a low-power digital circuit implementation. Hippocampal depth-electrode recordings from six kainate-treated rats are used to validate the algorithm and hardware performance in this preliminary study. The design process illustrates crucial trade-offs in translating mathematical models into hardware implementations and validates statistical optimizations made with empirical data analyses on results obtained using a real-time functioning hardware prototype. Using quantitatively predicted thresholds from the depth-electrode recordings, the auto-updating algorithm performs with an average sensitivity and selectivity of 95.3 ± 0.02% and 88.9 ± 0.01% (mean ± SEα = 0.05), respectively, on untrained data with a detection delay of 8.5 s [5.97, 11.04] from electrographic onset. The hardware implementation is shown feasible using CMOS circuits consuming under 350 nW of power from a 250 mV supply voltage from simulations on the MIT 180 nm SOI process.
Rios, A. B.; Valda, A.; Somacal, H.
2007-10-26
Usually tomographic procedure requires a set of projections around the object under study and a mathematical processing of such projections through reconstruction algorithms. An accurate reconstruction requires a proper number of projections (angular sampling) and a proper number of elements in each projection (linear sampling). However in several practical cases it is not possible to fulfill these conditions leading to the so-called problem of few projections. In this case, iterative reconstruction algorithms are more suitable than analytic ones. In this work we present a program written in C++ that provides an environment for two iterative algorithm implementations, one algebraic and the other statistical. The software allows the user a full definition of the acquisition and reconstruction geometries used for the reconstruction algorithms but also to perform projection and backprojection operations. A set of analysis tools was implemented for the characterization of the convergence process. We analyze the performance of the algorithms on numerical phantoms and present the reconstruction of experimental data with few projections coming from transmission X-ray and micro PIXE (Particle-Induced X-Ray Emission) images.
Parallel Implementations Of The Nelder-Mead Simplex Algorithm For Unconstrained Optimization
NASA Astrophysics Data System (ADS)
Dennis, J. E.; Torczon, Virginia
1988-04-01
We are interested in implementing direct search methods on parallel computers to solve the unconstrained minimization problem: Given a function f : IRn --? IR find an x E En that minimizes 1 (x). Our preliminary work has focused on the Nelder-Mead simplex algorithm. The origin of the algorithm can be found in a 1962 paper by Spendley, Hext and Himsworth;1 Nelder and Meade proposed an adaptive version which proved to be much more robust in practice. Dennis and Woods3 give a clear presentation of the standard Nelder-Mead simplex algorithm; Woods4 includes a more complete discussion of implementation details as well as some preliminary convergence results. Since descriptions of the standard Nelder-Mead simplex algorithm appear in Nelder and Mead,2 Dennis and Woods,3 and Woods,4 we will limit our introductory discussion to the advantages and disadvantages of the algorithm, as well as some of the features which make it so popular. We then outline the approaches we have taken and discuss our preliminary results. We conclude with a discussion of future research and some observations about our findings.
Efficient parallel implementation of active appearance model fitting algorithm on GPU.
Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou
2014-01-01
The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures. PMID:24723812
Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU
Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou
2014-01-01
The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures. PMID:24723812
Implementation and analysis of a Navier-Stokes algorithm on parallel computers
NASA Technical Reports Server (NTRS)
Fatoohi, Raad A.; Grosch, Chester E.
1988-01-01
The results of the implementation of a Navier-Stokes algorithm on three parallel/vector computers are presented. The object of this research is to determine how well, or poorly, a single numerical algorithm would map onto three different architectures. The algorithm is a compact difference scheme for the solution of the incompressible, two-dimensional, time-dependent Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are the following: the MPP, an SIMD machine with 16K bit serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
Software algorithm and hardware design for real-time implementation of new spectral estimator
2014-01-01
Background Real-time spectral analyzers can be difficult to implement for PC computer-based systems because of the potential for high computational cost, and algorithm complexity. In this work a new spectral estimator (NSE) is developed for real-time analysis, and compared with the discrete Fourier transform (DFT). Method Clinical data in the form of 216 fractionated atrial electrogram sequences were used as inputs. The sample rate for acquisition was 977 Hz, or approximately 1 millisecond between digital samples. Real-time NSE power spectra were generated for 16,384 consecutive data points. The same data sequences were used for spectral calculation using a radix-2 implementation of the DFT. The NSE algorithm was also developed for implementation as a real-time spectral analyzer electronic circuit board. Results The average interval for a single real-time spectral calculation in software was 3.29 μs for NSE versus 504.5 μs for DFT. Thus for real-time spectral analysis, the NSE algorithm is approximately 150× faster than the DFT. Over a 1 millisecond sampling period, the NSE algorithm had the capability to spectrally analyze a maximum of 303 data channels, while the DFT algorithm could only analyze a single channel. Moreover, for the 8 second sequences, the NSE spectral resolution in the 3-12 Hz range was 0.037 Hz while the DFT spectral resolution was only 0.122 Hz. The NSE was also found to be implementable as a standalone spectral analyzer board using approximately 26 integrated circuits at a cost of approximately $500. The software files used for analysis are included as a supplement, please see the Additional files 1 and 2. Conclusions The NSE real-time algorithm has low computational cost and complexity, and is implementable in both software and hardware for 1 millisecond updates of multichannel spectra. The algorithm may be helpful to guide radiofrequency catheter ablation in real time. PMID:24886214
An implementation of the look-ahead Lanczos algorithm for non-Hermitian matrices
NASA Technical Reports Server (NTRS)
Freund, Roland W.; Gutknecht, Martin H.; Nachtigal, Noel M.
1991-01-01
The nonsymmetric Lanczos method can be used to compute eigenvalues of large sparse non-Hermitian matrices or to solve large sparse non-Hermitian linear systems. However, the original Lanczos algorithm is susceptible to possible breakdowns and potential instabilities. An implementation is presented of a look-ahead version of the Lanczos algorithm that, except for the very special situation of an incurable breakdown, overcomes these problems by skipping over those steps in which a breakdown or near-breakdown would occur in the standard process. The proposed algorithm can handle look-ahead steps of any length and requires the same number of matrix-vector products and inner products as the standard Lanczos process without look-ahead.
Implementation of an algorithm for cylindrical object identification using range data
NASA Technical Reports Server (NTRS)
Bozeman, Sylvia T.; Martin, Benjamin J.
1989-01-01
One of the problems in 3-D object identification and localization is addressed. In robotic and navigation applications the vision system must be able to distinguish cylindrical or spherical objects as well as those of other geometric shapes. An algorithm was developed to identify cylindrical objects in an image when range data is used. The algorithm incorporates the Hough transform for line detection using edge points which emerge from a Sobel mask. Slices of the data are examined to locate arcs of circles using the normal equations of an over-determined linear system. Current efforts are devoted to testing the computer implementation of the algorithm. Refinements are expected to continue in order to accommodate cylinders in various positions. A technique is sought which is robust in the presence of noise and partial occlusions.
NASA Astrophysics Data System (ADS)
Chung, Vera Y. Y.; Bergmann, Neil W.
1998-12-01
This paper presents how to implement the block-matching motion estimation algorithm efficiently by Field Programmable Gate Arrays (FPGAs) based Custom Computer Machine (CCM) for video compression. The SPACE2 Custom Computer board consists of up to eight Xilinx XC6216 fine- grain, sea-of-gate FPGA chips. The results show that two Xilinx XC6216 FPGA can perform at 960 MOPs, hence the real- time full-search motion estimation encoder can be easily implemented by our SPACE2 CCM system.
An Optional Threshold with Svm Cloud Detection Algorithm and Dsp Implementation
NASA Astrophysics Data System (ADS)
Zhou, Guoqing; Zhou, Xiang; Yue, Tao; Liu, Yilong
2016-06-01
This paper presents a method which combines the traditional threshold method and SVM method, to detect the cloud of Landsat-8 images. The proposed method is implemented using DSP for real-time cloud detection. The DSP platform connects with emulator and personal computer. The threshold method is firstly utilized to obtain a coarse cloud detection result, and then the SVM classifier is used to obtain high accuracy of cloud detection. More than 200 cloudy images from Lansat-8 were experimented to test the proposed method. Comparing the proposed method with SVM method, it is demonstrated that the cloud detection accuracy of each image using the proposed algorithm is higher than those of SVM algorithm. The results of the experiment demonstrate that the implementation of the proposed method on DSP can effectively realize the real-time cloud detection accurately.
NASA Technical Reports Server (NTRS)
Jaggi, S.; Quattrochi, Dale A.; Lam, Nina S.-N.
1993-01-01
Fractal geometry is increasingly becoming a useful tool for modeling natural phenomena. As an alternative to Euclidean concepts, fractals allow for a more accurate representation of the nature of complexity in natural boundaries and surfaces. The purpose of this paper is to introduce and implement three algorithms in C code for deriving fractal measurement from remotely sensed data. These three methods are: the line-divider method, the variogram method, and the triangular prism method. Remote-sensing data acquired by NASA's Calibrated Airborne Multispectral Scanner (CAMS) are used to compute the fractal dimension using each of the three methods. These data were obtained as a 30 m pixel spatial resolution over a portion of western Puerto Rico in January 1990. A description of the three methods, their implementation in PC-compatible environment, and some results of applying these algorithms to remotely sensed image data are presented.
NASA Astrophysics Data System (ADS)
Barney, D.; Bialas, W.; Kokkas, P.; Manthos, N.; Reynaud, S.; Sidiropoulos, G.; Vichoudis, P.
2007-03-01
The CMS Endcap Preshower (ES) sub-detector comprises 4288 silicon sensors, each containing 32 strips. The data are transferred from the detector to the counting room via 1208 optical fibres running at 800Mbps. Each fibre carries data from two, three or four sensors. For the readout of the Preshower, a VME-based system, the Endcap Preshower Data Concentrator Card (ES-DCC), is currently under development. The main objective of each readout board is to acquire on-detector data from up to 36 optical links, perform on-line data reduction via zero suppression and pass the concentrated data to the CMS event builder. This document presents the conceptual design of the Reduction Algorithms as well as their implementation in the ES-DCC FPGAs. These algorithms, as implemented in the ES-DCC, result in a data-reduction factor of 20.
NASA Technical Reports Server (NTRS)
Tilton, James C.; Plaza, Antonio J. (Editor); Chang, Chein-I. (Editor)
2008-01-01
The hierarchical image segmentation algorithm (referred to as HSEG) is a hybrid of hierarchical step-wise optimization (HSWO) and constrained spectral clustering that produces a hierarchical set of image segmentations. HSWO is an iterative approach to region grooving segmentation in which the optimal image segmentation is found at N(sub R) regions, given a segmentation at N(sub R+1) regions. HSEG's addition of constrained spectral clustering makes it a computationally intensive algorithm, for all but, the smallest of images. To counteract this, a computationally efficient recursive approximation of HSEG (called RHSEG) has been devised. Further improvements in processing speed are obtained through a parallel implementation of RHSEG. This chapter describes this parallel implementation and demonstrates its computational efficiency on a Landsat Thematic Mapper test scene.
Design and Implementation of IIR Algorithms for Control of Longitudinal Coupled-Bunch Instabilities
Teytelman, Dmitry
2000-05-16
The recent installation of third-harmonic RF cavities at the Advanced Light Source has raised instability growth rates, and also caused tune shifts (coherent and incoherent) of more than an octave over the required range of beam currents and energies. The larger growth rates and tune shifts have rendered control by the original bandpass FIR feedback algorithms unreliable. In this paper the authors describe an implementation of an IIR feedback algorithm with more exible response tailoring. A cascade of up to 6 second-order IIR sections (12 poles and 12 zeros) was implemented in the DSPs of the longitudinal feedback system. Filter design has been formulated as an optimization problem and solved using constrained optimization methods. These IIR filters provided 2.4 times the control bandwidth as compared to the original FIR designs. Here the authors demonstrate the performance of the designed filters using transient diagnostic measurements from ALS and DAPNE.
NASA Astrophysics Data System (ADS)
Karabiber, Fethullah; Vecchio, Pietro; Grassi, Giuseppe
2011-12-01
The Bio-inspired (Bi-i) Cellular Vision System is a computing platform consisting of sensing, array sensing-processing, and digital signal processing. The platform is based on the Cellular Neural/Nonlinear Network (CNN) paradigm. This article presents the implementation of a novel CNN-based segmentation algorithm onto the Bi-i system. Each part of the algorithm, along with the corresponding implementation on the hardware platform, is carefully described through the article. The experimental results, carried out for Foreman and Car-phone video sequences, highlight the feasibility of the approach, which provides a frame rate of about 26 frames/s. Comparisons with existing CNN-based methods show that the conceived approach is more accurate, thus representing a good trade-off between real-time requirements and accuracy.
A new morphological anomaly detection algorithm for hyperspectral images and its GPU implementation
NASA Astrophysics Data System (ADS)
Paz, Abel; Plaza, Antonio
2011-10-01
Anomaly detection is considered a very important task for hyperspectral data exploitation. It is now routinely applied in many application domains, including defence and intelligence, public safety, precision agriculture, geology, or forestry. Many of these applications require timely responses for swift decisions which depend upon high computing performance of algorithm analysis. However, with the recent explosion in the amount and dimensionality of hyperspectral imagery, this problem calls for the incorporation of parallel computing techniques. In the past, clusters of computers have offered an attractive solution for fast anomaly detection in hyperspectral data sets already transmitted to Earth. However, these systems are expensive and difficult to adapt to on-board data processing scenarios, in which low-weight and low-power integrated components are essential to reduce mission payload and obtain analysis results in (near) real-time, i.e., at the same time as the data is collected by the sensor. An exciting new development in the field of commodity computing is the emergence of commodity graphics processing units (GPUs), which can now bridge the gap towards on-board processing of remotely sensed hyperspectral data. In this paper, we develop a new morphological algorithm for anomaly detection in hyperspectral images along with an efficient GPU implementation of the algorithm. The algorithm is implemented on latest-generation GPU architectures, and evaluated with regards to other anomaly detection algorithms using hyperspectral data collected by NASA's Airborne Visible Infra-Red Imaging Spectrometer (AVIRIS) over the World Trade Center (WTC) in New York, five days after the terrorist attacks that collapsed the two main towers in the WTC complex. The proposed GPU implementation achieves real-time performance in the considered case study.
Sundareshan, Malur K; Bhattacharjee, Supratik; Inampudi, Radhika; Pang, Ho-Yuen
2002-12-10
Computational complexity is a major impediment to the real-time implementation of image restoration and superresolution algorithms in many applications. Although powerful restoration algorithms have been developed within the past few years utilizing sophisticated mathematical machinery (based on statistical optimization and convex set theory), these algorithms are typically iterative in nature and require a sufficient number of iterations to be executed to achieve the desired resolution improvement that may be needed to meaningfully perform postprocessing image exploitation tasks in practice. Additionally, recent technological breakthroughs have facilitated novel sensor designs (focal plane arrays, for instance) that make it possible to capture megapixel imagery data at video frame rates. A major challenge in the processing of these large-format images is to complete the execution of the image processing steps within the frame capture times and to keep up with the output rate of the sensor so that all data captured by the sensor can be efficiently utilized. Consequently, development of novel methods that facilitate real-time implementation of image restoration and superresolution algorithms is of significant practical interest and is the primary focus of this study. The key to designing computationally efficient processing schemes lies in strategically introducing appropriate preprocessing steps together with the superresolution iterations to tailor optimized overall processing sequences for imagery data of specific formats. For substantiating this assertion, three distinct methods for tailoring a preprocessing filter and integrating it with the superresolution processing steps are outlined. These methods consist of a region-of-interest extraction scheme, a background-detail separation procedure, and a scene-derived information extraction step for implementing a set-theoretic restoration of the image that is less demanding in computation compared with the
NASA Astrophysics Data System (ADS)
Sundareshan, Malur K.; Bhattacharjee, Supratik; Inampudi, Radhika; Pang, Ho-Yuen
2002-12-01
Computational complexity is a major impediment to the real-time implementation of image restoration and superresolution algorithms in many applications. Although powerful restoration algorithms have been developed within the past few years utilizing sophisticated mathematical machinery (based on statistical optimization and convex set theory), these algorithms are typically iterative in nature and require a sufficient number of iterations to be executed to achieve the desired resolution improvement that may be needed to meaningfully perform postprocessing image exploitation tasks in practice. Additionally, recent technological breakthroughs have facilitated novel sensor designs (focal plane arrays, for instance) that make it possible to capture megapixel imagery data at video frame rates. A major challenge in the processing of these large-format images is to complete the execution of the image processing steps within the frame capture times and to keep up with the output rate of the sensor so that all data captured by the sensor can be efficiently utilized. Consequently, development of novel methods that facilitate real-time implementation of image restoration and superresolution algorithms is of significant practical interest and is the primary focus of this study. The key to designing computationally efficient processing schemes lies in strategically introducing appropriate preprocessing steps together with the superresolution iterations to tailor optimized overall processing sequences for imagery data of specific formats. For substantiating this assertion, three distinct methods for tailoring a preprocessing filter and integrating it with the superresolution processing steps are outlined. These methods consist of a region-of-interest extraction scheme, a background-detail separation procedure, and a scene-derived information extraction step for implementing a set-theoretic restoration of the image that is less demanding in computation compared with the
Implementation of a block Lanczos algorithm for Eigenproblem solution of gyroscopic systems
NASA Technical Reports Server (NTRS)
Gupta, Kajal K.; Lawson, Charles L.
1987-01-01
The details of implementation of a general numerical procedure developed for the accurate and economical computation of natural frequencies and associated modes of any elastic structure rotating along an arbitrary axis are described. A block version of the Lanczos algorithm is derived for the solution that fully exploits associated matrix sparsity and employs only real numbers in all relevant computations. It is also capable of determining multiple roots and proves to be most efficient when compared to other, similar, exisiting techniques.
Implementation of a media synchronization algorithm for multistandard IP set-top box systems
NASA Astrophysics Data System (ADS)
Estévez, Esther; Samper, David; Pescador, Fernando; Juárez, Eduardo; Sanz, César
2009-05-01
Media synchronization at network context minimizes the effects of the network jitter and the skew between the emitter and receiver clocks. Theoretical algorithms cannot always be implemented on real systems for the architecture differences between a real and a theoretical system. In this paper an implementation for an intra-medium and an inter-media synchronization algorithm for a real multistandard IP set-top box is presented. For intra-medium synchronization, the proposed technique is based on controlling the receiver buffer. However for inter-media synchronization, the proposed technique is based on controlling the video playback according the Presentation Time Stamp (PTS) of the media units (audio and video). The proposed synchronizations algorithms has been integrated in an IP-STB and tested in a real environment using DVD movies and TV channels with excellent results. Those results show that the proposed algorithm can achieve media synchronization and meet the requirements of perceived quality of service (P-QoS).
NASA Astrophysics Data System (ADS)
Besrour, Amine; Abdelkefi, Fatma; Siala, Mohamed; Snoussi, Hichem
2015-09-01
Nowadays, the high dynamic range (HDR) imaging represents the subject of the most researches. The major problem lies in the implementation of the best algorithm to acquire the best video quality. In fact, the major constraint is to conceive an optimal fusion which must meet the rapid movement of video frames. The implemented merging algorithms were not quick enough to reconstitute the HDR video. In this paper, we detail each of the previous existing works before detailing our algorithm and presenting results from the acquired HDR images, tone mapped with various techniques. Our proposed algorithm guarantees a more enhanced and faster solution compared to the existing ones. In fact, it has the ability to calculate the saturation matrix related to the saturation rate of the neighboring pixels. The computed coefficients are affected respectively to each picture from the tested ones. This analysis provides faster and efficient results in terms of quality and brightness. The originality of our work remains on its processing method including the pixels saturation in the totality of the captured pictures and their combination in order to obtain the best pictures illustrating all the possible details. These parameters are computed for each zone depending on the contrast and the luminosity of the current pixel and its neighboring. The final HDR image's coefficients are calculated dynamically ensuring the best image quality equilibrating the brightness and contrast values and making the perfect final image.
Design And Implementation Of A Multi-Sensor Fusion Algorithm On A Hypercube Computer Architecture
NASA Astrophysics Data System (ADS)
Glover, Charles W.
1990-03-01
A multi-sensor integration (MSI) algorithm written for sequential single processor computer architecture has been transformed into a concurrent algorithm and implemented in parallel on a multi-processor hypercube computer architecture. This paper will present the philosophy and methodologies used in the decomposition of the sequential MSI algorithm, and its transformation into a parallel MSI algorithm. The parallel MSI algorithm was implemented on a NCUBETM hypercube computer. The performance of the parallel MSI algorithm has been measured and compared against its sequential counterpart by running test case scenarios through a simulation program. The simulation program allows the user to define the trajectories of all players in the scenario, and to pick the sensor suites of the players and their operating characteristics. For example, an air-to-air engagement scenario was used as one of the test cases. In this scenario, two friend aircrafts were being attacked by six foe aircraft in a pincer maneuver. Both the friend and foe aircrafts launch missiles at several different time points in the engagement. The sensor suites on each aircraft are dual mode RADAR, dual mode IRST, and ESM sensors. The modes of the sensors are switched as needed throughout the scenario. The RADAR sensor is used only intermittently, thus most of the MSI information is obtained from passive sensing. The maneuvers in this scenario caused aircraft and missile to constantly fly in and out of sensors field-of-view (F0V). This resulted in the MSI algorithm to constantly reacquire, initiate, and delete new tracks as it tracked all objects in the scenario. The objective was to determine performance of the parallel MSI algorithm in such a complex environment, and to determine how many multi-processors (nodes) of the hypercube could be effectively used by an aircraft in such an environment. For the scenario just discussed, a 4-node hypercube was found to be the optimal size and a factor two in speedup
Madduri, Kamesh; Ediger, David; Jiang, Karl; Bader, David A.; Chavarria-Miranda, Daniel
2009-02-15
We present a new lock-free parallel algorithm for computing betweenness centralityof massive small-world networks. With minor changes to the data structures, ouralgorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.
Implementation and evaluation of various demons deformable image registration algorithms on a GPU
NASA Astrophysics Data System (ADS)
Gu, Xuejun; Pan, Hubert; Liang, Yun; Castillo, Richard; Yang, Deshan; Choi, Dongju; Castillo, Edward; Majumdar, Amitava; Guerrero, Thomas; Jiang, Steve B.
2010-01-01
Online adaptive radiation therapy (ART) promises the ability to deliver an optimal treatment in response to daily patient anatomic variation. A major technical barrier for the clinical implementation of online ART is the requirement of rapid image segmentation. Deformable image registration (DIR) has been used as an automated segmentation method to transfer tumor/organ contours from the planning image to daily images. However, the current computational time of DIR is insufficient for online ART. In this work, this issue is addressed by using computer graphics processing units (GPUs). A gray-scale-based DIR algorithm called demons and five of its variants were implemented on GPUs using the compute unified device architecture (CUDA) programming environment. The spatial accuracy of these algorithms was evaluated over five sets of pulmonary 4D CT images with an average size of 256 × 256 × 100 and more than 1100 expert-determined landmark point pairs each. For all the testing scenarios presented in this paper, the GPU-based DIR computation required around 7 to 11 s to yield an average 3D error ranging from 1.5 to 1.8 mm. It is interesting to find out that the original passive force demons algorithms outperform subsequently proposed variants based on the combination of accuracy, efficiency and ease of implementation.
A digital architecture for support vector machines: theory, algorithm, and FPGA implementation.
Anguita, D; Boni, A; Ridella, S
2003-01-01
In this paper, we propose a digital architecture for support vector machine (SVM) learning and discuss its implementation on a field programmable gate array (FPGA). We analyze briefly the quantization effects on the performance of the SVM in classification problems to show its robustness, in the feedforward phase, respect to fixed-point math implementations; then, we address the problem of SVM learning. The architecture described here makes use of a new algorithm for SVM learning which is less sensitive to quantization errors respect to the solution appeared so far in the literature. The algorithm is composed of two parts: the first one exploits a recurrent network for finding the parameters of the SVM; the second one uses a bisection process for computing the threshold. The architecture implementing the algorithm is described in detail and mapped on a real current-generation FPGA (Xilinx Virtex II). Its effectiveness is then tested on a channel equalization problem, where real-time performances are of paramount importance. PMID:18244555
Implementation and evaluation of two helical CT reconstruction algorithms in CIVA
NASA Astrophysics Data System (ADS)
Banjak, H.; Costin, M.; Vienne, C.; Kaftandjian, V.
2016-02-01
The large majority of industrial CT systems reconstruct the 3D volume by using an acquisition on a circular trajec-tory. However, when inspecting long objects which are highly anisotropic, this scanning geometry creates severe artifacts in the reconstruction. For this reason, the use of an advanced CT scanning method like helical data acquisition is an efficient way to address this aspect known as the long-object problem. Recently, several analytically exact and quasi-exact inversion formulas for helical cone-beam reconstruction have been proposed. Among them, we identified two algorithms of interest for our case. These algorithms are exact and of filtered back-projection structure. In this work we implemented the filtered-backprojection (FBP) and backprojection-filtration (BPF) algorithms of Zou and Pan (2004). For performance evaluation, we present a numerical compari-son of the two selected algorithms with the helical FDK algorithm using both complete (noiseless and noisy) and truncated data generated by CIVA (the simulation platform for non-destructive testing techniques developed at CEA).
NASA Astrophysics Data System (ADS)
Karthik, Victor U.; Sivasuthan, Sivamayam; Hoole, Samuel Ratnajeevan H.
2014-02-01
The computational algorithms for device synthesis and nondestructive evaluation (NDE) are often the same. In both we have a goal - a particular field configuration yielding the design performance in synthesis or to match exterior measurements in NDE. The geometry of the design or the postulated interior defect is then computed. Several optimization methods are available for this. The most efficient like conjugate gradients are very complex to program for the required derivative information. The least efficient zeroth order algorithms like the genetic algorithm take much computational time but little programming effort. This paper reports launching a Genetic Algorithm kernel on thousands of compute unified device architecture (CUDA) threads exploiting the NVIDIA graphics processing unit (GPU) architecture. The efficiency of parallelization, although below that on shared memory supercomputer architectures, is quite effective in cutting down solution time into the realm of the practicable. We carry this further into multi-physics electro-heat problems where the parameters of description are in the electrical problem and the object function in the thermal problem. Indeed, this is where the derivative of the object function in the heat problem with respect to the parameters in the electrical problem is the most difficult to compute for gradient methods, and where the genetic algorithm is most easily implemented.
An implementation of the Gillespie algorithm for RNA kinetics with logarithmic time update
Dykeman, Eric C.
2015-01-01
In this paper I outline a fast method called KFOLD for implementing the Gillepie algorithm to stochastically sample the folding kinetics of an RNA molecule at single base-pair resolution. In the same fashion as the KINFOLD algorithm, which also uses the Gillespie algorithm to predict folding kinetics, KFOLD stochastically chooses a new RNA secondary structure state that is accessible from the current state by a single base-pair addition/deletion following the Gillespie procedure. However, unlike KINFOLD, the KFOLD algorithm utilizes the fact that many of the base-pair addition/deletion reactions and their corresponding rates do not change between each step in the algorithm. This allows KFOLD to achieve a substantial speed-up in the time required to compute a prediction of the folding pathway and, for a fixed number of base-pair moves, performs logarithmically with sequence size. This increase in speed opens up the possibility of studying the kinetics of much longer RNA sequences at single base-pair resolution while also allowing for the RNA folding statistics of smaller RNA sequences to be computed much more quickly. PMID:25990741
NASA Astrophysics Data System (ADS)
Liu, Qitao; Li, Yingchun; Sun, Huayan; Zhao, Yanzhong
2008-03-01
Laser active imaging system, which is of high resolution, anti-jamming and can be three-dimensional (3-D) imaging, has been used widely. But its imagery is usually affected by speckle noise which makes the grayscale of pixels change violently, hides the subtle details and makes the imaging resolution descend greatly. Removing speckle noise is one of the most difficult problems encountered in this system because of the poor statistical property of speckle. Based on the analysis of the statistical characteristic of speckle and morphological filtering algorithm, in this paper, an improved multistage morphological filtering algorithm is studied and implemented on TMS320C6416 DSP. The algorithm makes the morphological open-close and close-open transformation by using two different linear structure elements respectively, and then takes a weighted average over the above transformational results. The weighted coefficients are decided by the statistical characteristic of speckle. This algorithm is implemented on the TMS320C6416 DSPs after simulation on computer. The procedure of software design is fully presented. The methods are fully illustrated to achieve and optimize the algorithm in the research of the structural characteristic of TMS320C6416 DSP and feature of the algorithm. In order to fully benefit from such devices and increase the performance of the whole system, it is necessary to take a series of steps to optimize the DSP programs. This paper introduces some effective methods, including refining code structure, eliminating memory dependence, optimizing assembly code via linear assembly and so on, for TMS320C6x C language optimization and then offers the results of the application in a real-time implementation. The results of processing to the images blurred by speckle noise shows that the algorithm can not only effectively suppress speckle noise but also preserve the geometrical features of images. The results of the optimized code running on the DSP platform
Ariyawansa, K.A.
1991-04-01
A benchmark parallel implementation of the Van Slyke and Wets algorithm for stochastic linear programs, and the results of a carefully designed numerical experiment on the Sequent/Balance using the implementation are presented. An important use of this implementation is as a benchmark to assess the performance of approximation algorithms for stochastic linear programs. These approximation algorithms are best suited for implementation on parallel vector processes like the Alliant FX/8. Therefore, the performance of the benchmark implementation on the Alliant FX/8 is of interest. In this paper, we present results observed when a portion of the numerical experiment is performed on the Alliant FX/8. These results indicate that the implementation makes satisfactory use of the concurrency capabilities of the Alliant FX/8. They also indicate that the vectorization capabilities of the Alliant FX/8 are not satisfactorily utilized by the implementation. 9 refs., 9 tabs.
An implementation of differential search algorithm (DSA) for inversion of surface wave data
NASA Astrophysics Data System (ADS)
Song, Xianhai; Li, Lei; Zhang, Xueqiang; Shi, Xinchun; Huang, Jianquan; Cai, Jianchao; Jin, Si; Ding, Jianping
2014-12-01
Surface wave dispersion analysis is widely used in geophysics to infer near-surface shear (S)-wave velocity profiles for a wide variety of applications. However, inversion of surface wave data is challenging for most local-search methods due to its high nonlinearity and to its multimodality. In this work, we proposed and implemented a new Rayleigh wave dispersion curve inversion scheme based on differential search algorithm (DSA), one of recently developed swarm intelligence-based algorithms. DSA is inspired from seasonal migration behavior of species of the living beings throughout the year for solving highly nonlinear, multivariable, and multimodal optimization problems. The proposed inverse procedure is applied to nonlinear inversion of fundamental-mode Rayleigh wave dispersion curves for near-surface S-wave velocity profiles. To evaluate calculation efficiency and stability of DSA, four noise-free and four noisy synthetic data sets are firstly inverted. Then, the performance of DSA is compared with that of genetic algorithms (GA) by two noise-free synthetic data sets. Finally, a real-world example from a waste disposal site in NE Italy is inverted to examine the applicability and robustness of the proposed approach on surface wave data. Furthermore, the performance of DSA is compared against that of GA by real data to further evaluate scores of the inverse procedure described here. Simulation results from both synthetic and actual field data demonstrate that differential search algorithm (DSA) applied to nonlinear inversion of surface wave data should be considered good not only in terms of the accuracy but also in terms of the convergence speed. The great advantages of DSA are that the algorithm is simple, robust and easy to implement. Also there are fewer control parameters to tune.
Realization of a scalable coherent quantum Fourier transform
NASA Astrophysics Data System (ADS)
Debnath, Shantanu; Linke, Norbert; Figgatt, Caroline; Landsman, Kevin; Wright, Ken; Monroe, Chris
2016-05-01
The exponential speed-up in some quantum algorithms is a direct result of parallel function-evaluation paths that interfere through a quantum Fourier transform (QFT). We report the implementation of a fully coherent QFT on five trapped Yb+ atomic qubits using sequences of fundamental quantum logic gates. These modular gates can be used to program arbitrary sequences nearly independent of system size and distance between qubits. We use this capability to first perform a Deutsch-Jozsa algorithm where several instances of three-qubit balanced and constant functions are implemented and then examined using single qubit QFTs. Secondly, we apply a fully coherent five-qubit QFT as a part of a quantum phase estimation protocol. Here, the QFT operates on a five-qubit superposition state with a particular phase modulation of its coefficients and directly produces the corresponding phase to five-bit precision. Finally, we examine the performance of the QFT in the period finding problem in the context of Shor's factorization algorithm. This work is supported by the ARO with funding from the IARPA MQCO program and the AFOSR MURI on Quantum Measurement and Verification.
FPGA implementation of the hyperspectral Lossy Compression for Exomars (LCE) algorithm
NASA Astrophysics Data System (ADS)
García, Aday; Santos, L.; López, S.; Callicó, G. M.; López, J. F.; Sarmiento, R.
2014-10-01
The increase of data rates and data volumes in present remote sensing payload instruments, together with the restrictions imposed in the downlink connection requirements, represent at the same time a challenge and a must in the field of data and image compression. This is especially true for the case of hyperspectral images, in which both, reduction of spatial and spectral redundancy is mandatory. Recently the Consultative Committee for Space Data Systems (CCSDS) published the Lossless Multispectral and Hyperespectral Image Compression recommendation (CCSDS 123), a prediction-based technique resulted from the consensus of its members. Although this standard offers a good trade-off between coding performance and computational complexity, the appearance of future hyperspectral and ultraspectral sensors with vast amount of data imposes further efforts from the scientific community to ensure optimal transmission to ground stations based on greater compression rates. Furthermore, hardware implementations with specific features to deal with solar radiation problems play an important role in order to achieve real time applications. In this scenario, the Lossy Compression for Exomars (LCE) algorithm emerges as a good candidate to achieve these characteristics. Its good quality/compression ratio together with its low complexity facilitates the implementation in hardware platforms such as FPGAs or ASICs. In this work the authors present the implementation of the LCE algorithm into an antifuse-based FPGA and the optimizations carried out to obtain the RTL description code using CatapultC, a High Level Synthesis (HLS) Tool. Experimental results show an area occupancy of 75% in an RTAX2000 FPGA from Microsemi, with an operating frequency of 18 MHz. Additionally, the power budget obtained is presented giving an idea of the suitability of the proposed algorithm implementation for onboard compression applications.
Software for implementing trigger algorithms on the upgraded CMS Global Trigger System
NASA Astrophysics Data System (ADS)
Matsushita, Takashi; Arnold, Bernhard
2015-12-01
The Global Trigger is the final step of the CMS Level-1 Trigger and implements a trigger menu, a set of selection requirements applied to the final list of trigger objects. The conditions for trigger object selection, with possible topological requirements on multiobject triggers, are combined by simple combinatorial logic to form the algorithms. The LHC has resumed its operation in 2015, the collision-energy will be increased to 13 TeV with the luminosity expected to go up to 2x1034 cm-2s-1. The CMS Level-1 trigger system will be upgraded to improve its performance for selecting interesting physics events and to operate within the predefined data-acquisition rate in the challenging environment expected at LHC Run 2. The Global Trigger will be re-implemented on modern FPGAs on an Advanced Mezzanine Card in MicroTCA crate. The upgraded system will benefit from the ability to process complex algorithms with DSP slices and increased processing resources with optical links running at 10 Gbit/s, enabling more algorithms at a time than previously possible and allowing CMS to be more flexible in how it handles the trigger bandwidth. In order to handle the increased complexity of the trigger menu implemented on the upgraded Global Trigger, a set of new software has been developed. The software allows a physicist to define a menu with analysis-like triggers using intuitive user interface. The menu is then realised on FPGAs with further software processing, instantiating predefined firmware blocks. The design and implementation of the software for preparing a menu for the upgraded CMS Global Trigger system are presented.
An implementation of differential evolution algorithm for inversion of geoelectrical data
NASA Astrophysics Data System (ADS)
Balkaya, Çağlayan
2013-11-01
Differential evolution (DE), a population-based evolutionary algorithm (EA) has been implemented to invert self-potential (SP) and vertical electrical sounding (VES) data sets. The algorithm uses three operators including mutation, crossover and selection similar to genetic algorithm (GA). Mutation is the most important operator for the success of DE. Three commonly used mutation strategies including DE/best/1 (strategy 1), DE/rand/1 (strategy 2) and DE/rand-to-best/1 (strategy 3) were applied together with a binomial type crossover. Evolution cycle of DE was realized without boundary constraints. For the test studies performed with SP data, in addition to both noise-free and noisy synthetic data sets two field data sets observed over the sulfide ore body in the Malachite mine (Colorado) and over the ore bodies in the Neem-Ka Thana cooper belt (India) were considered. VES test studies were carried out using synthetically produced resistivity data representing a three-layered earth model and a field data set example from Gökçeada (Turkey), which displays a seawater infiltration problem. Mutation strategies mentioned above were also extensively tested on both synthetic and field data sets in consideration. Of these, strategy 1 was found to be the most effective strategy for the parameter estimation by providing less computational cost together with a good accuracy. The solutions obtained by DE for the synthetic cases of SP were quite consistent with particle swarm optimization (PSO) which is a more widely used population-based optimization algorithm than DE in geophysics. Estimated parameters of SP and VES data were also compared with those obtained from Metropolis-Hastings (M-H) sampling algorithm based on simulated annealing (SA) without cooling to clarify uncertainties in the solutions. Comparison to the M-H algorithm shows that DE performs a fast approximate posterior sampling for the case of low-dimensional inverse geophysical problems.
Implementation of Complex Signal Processing Algorithms for Position-Sensitive Microcalorimeters
NASA Technical Reports Server (NTRS)
Smith, Stephen J.
2008-01-01
We have recently reported on a theoretical digital signal-processing algorithm for improved energy and position resolution in position-sensitive, transition-edge sensor (POST) X-ray detectors [Smith et al., Nucl, lnstr and Meth. A 556 (2006) 2371. PoST's consists of one or more transition-edge sensors (TES's) on a large continuous or pixellated X-ray absorber and are under development as an alternative to arrays of single pixel TES's. PoST's provide a means to increase the field-of-view for the fewest number of read-out channels. In this contribution we extend the theoretical correlated energy position optimal filter (CEPOF) algorithm (originally developed for 2-TES continuous absorber PoST's) to investigate the practical implementation on multi-pixel single TES PoST's or Hydras. We use numerically simulated data for a nine absorber device, which includes realistic detector noise, to demonstrate an iterative scheme that enables convergence on the correct photon absorption position and energy without any a priori assumptions. The position sensitivity of the CEPOF implemented on simulated data agrees very well with the theoretically predicted resolution. We discuss practical issues such as the impact of random arrival phase of the measured data on the performance of the CEPOF. The CEPOF algorithm demonstrates that full-width-at- half-maximum energy resolution of < 8 eV coupled with position-sensitivity down to a few 100 eV should be achievable for a fully optimized device.
Supercomputer implementation of finite element algorithms for high speed compressible flows
NASA Technical Reports Server (NTRS)
Thornton, E. A.; Ramakrishnan, R.
1986-01-01
Prediction of compressible flow phenomena using the finite element method is of recent origin and considerable interest. Two shock capturing finite element formulations for high speed compressible flows are described. A Taylor-Galerkin formulation uses a Taylor series expansion in time coupled with a Galerkin weighted residual statement. The Taylor-Galerkin algorithms use explicit artificial dissipation, and the performance of three dissipation models are compared. A Petrov-Galerkin algorithm has as its basis the concepts of streamline upwinding. Vectorization strategies are developed to implement the finite element formulations on the NASA Langley VPS-32. The vectorization scheme results in finite element programs that use vectors of length of the order of the number of nodes or elements. The use of the vectorization procedure speeds up processing rates by over two orders of magnitude. The Taylor-Galerkin and Petrov-Galerkin algorithms are evaluated for 2D inviscid flows on criteria such as solution accuracy, shock resolution, computational speed and storage requirements. The convergence rates for both algorithms are enhanced by local time-stepping schemes. Extension of the vectorization procedure for predicting 2D viscous and 3D inviscid flows are demonstrated. Conclusions are drawn regarding the applicability of the finite element procedures for realistic problems that require hundreds of thousands of nodes.
Movie approximation technique for the implementation of fast bandwidth-smoothing algorithms
NASA Astrophysics Data System (ADS)
Feng, Wu-chi; Lam, Chi C.; Liu, Ming
1997-12-01
Bandwidth smoothing algorithms can effectively reduce the network resource requirements for the delivery of compressed video streams. For stored video, a large number of bandwidth smoothing algorithms have been introduced that are optimal under certain constraints but require access to all the frame size data in order to achieve their optimal properties. This constraint, however, can be both resource and computationally expensive, especially for moderately priced set-top-boxes. In this paper, we introduce a movie approximation technique for the representation of the frame sizes of a video, reducing the complexity of the bandwidth smoothing algorithms and the amount of frame data that must be transmitted prior to the start of playback. Our results show that the proposed approximation technique can accurately approximate the frame data with a small number of piece-wise linear segments without affecting the performance measures that the bandwidth soothing algorithms are attempting to achieve by more than 1%. In addition, we show that implementations of this technique can speed up execution times by 100 to 400 times, allowing the bandwidth plan calculation times to be reduced to tens of milliseconds. Evaluation using a compressed full-length motion-JPEG video is provided.
Implementation of parallel computational algorithms on a modified CORDIC arithmetic logic
Naseem, A.
1984-01-01
CORDIC (COordinate Rotation Digital Computer) is a powerful technique for evaluating trigonometric, hyperbolic, exponential and logarithmic functions and for performing a variety of plane coordinate transformations. Furthermore, the algorithm is also suitable for other computations such as multiplication, division, and the conversion between binary and mixed-radix number systems. The basis for the algorithm is coordinate rotation in a linear, circular, or hyperbolic coordinate system depending on which function is to be calculated. The algorithm involves iterative procedures that require only additions, shift operations, and recall of prestored constants. However, the iterative nature of the algorithm dictates hardware implementations that are highly sequential in nature, resulting in slow speed of processing. The growing need for processing at high speed has resulted in a constant push for the development of faster computing structures balanced against the constraint to minimize computational complexity. With the advent of VLSI, many processing elements can now be realized on a single chip, and a large collection of processors have therefore become economically feasible. So, with this possibility in mind, the CORDIC iteration equations were modified in order to eliminate their sequential nature and to incorporate more parallelism.
Fujita, Naotaka; Yoshihisa, Tomoki; Tsukamoto, Masahiko
2014-01-01
In recent years, sensors become popular and Home Energy Management System (HEMS) takes an important role in saving energy without decrease in QoL (Quality of Life). Currently, many rule-based HEMSs have been proposed and almost all of them assume “IF-THEN” rules. The Rete algorithm is a typical pattern matching algorithm for IF-THEN rules. Currently, we have proposed a rule-based Home Energy Management System (HEMS) using the Rete algorithm. In the proposed system, rules for managing energy are processed by smart taps in network, and the loads for processing rules and collecting data are distributed to smart taps. In addition, the number of processes and collecting data are reduced by processing rules based on the Rete algorithm. In this paper, we evaluated the proposed system by simulation. In the simulation environment, rules are processed by a smart tap that relates to the action part of each rule. In addition, we implemented the proposed system as HEMS using smart taps. PMID:25136672
Next Generation Aura-OMI SO2 Retrieval Algorithm: Introduction and Implementation Status
NASA Technical Reports Server (NTRS)
Li, Can; Joiner, Joanna; Krotkov, Nickolay A.; Bhartia, Pawan K.
2014-01-01
We introduce our next generation algorithm to retrieve SO2 using radiance measurements from the Aura Ozone Monitoring Instrument (OMI). We employ a principal component analysis technique to analyze OMI radiance spectral in 310.5-340 nm acquired over regions with no significant SO2. The resulting principal components (PCs) capture radiance variability caused by both physical processes (e.g., Rayleigh and Raman scattering, and ozone absorption) and measurement artifacts, enabling us to account for these various interferences in SO2 retrievals. By fitting these PCs along with SO2 Jacobians calculated with a radiative transfer model to OMI-measured radiance spectra, we directly estimate SO2 vertical column density in one step. As compared with the previous generation operational OMSO2 PBL (Planetary Boundary Layer) SO2 product, our new algorithm greatly reduces unphysical biases and decreases the noise by a factor of two, providing greater sensitivity to anthropogenic emissions. The new algorithm is fast, eliminates the need for instrument-specific radiance correction schemes, and can be easily adapted to other sensors. These attributes make it a promising technique for producing long-term, consistent SO2 records for air quality and climate research. We have operationally implemented this new algorithm on OMI SIPS for producing the new generation standard OMI SO2 products.
Kawakami, Tomoya; Fujita, Naotaka; Yoshihisa, Tomoki; Tsukamoto, Masahiko
2014-01-01
In recent years, sensors become popular and Home Energy Management System (HEMS) takes an important role in saving energy without decrease in QoL (Quality of Life). Currently, many rule-based HEMSs have been proposed and almost all of them assume "IF-THEN" rules. The Rete algorithm is a typical pattern matching algorithm for IF-THEN rules. Currently, we have proposed a rule-based Home Energy Management System (HEMS) using the Rete algorithm. In the proposed system, rules for managing energy are processed by smart taps in network, and the loads for processing rules and collecting data are distributed to smart taps. In addition, the number of processes and collecting data are reduced by processing rules based on the Rete algorithm. In this paper, we evaluated the proposed system by simulation. In the simulation environment, rules are processed by a smart tap that relates to the action part of each rule. In addition, we implemented the proposed system as HEMS using smart taps. PMID:25136672
SU-E-T-67: Clinical Implementation and Evaluation of the Acuros Dose Calculation Algorithm
Yan, C; Combine, T; Dickens, K; Wynn, R; Pavord, D; Huq, M
2014-06-01
Purpose: The main aim of the current study is to present a detailed description of the implementation of the Acuros XB Dose Calculation Algorithm, and subsequently evaluate its clinical impacts by comparing it with AAA algorithm. Methods: The source models for both Acuros XB and AAA were configured by importing the same measured beam data into Eclipse treatment planning system. Both algorithms were evaluated by comparing calculated dose with measured dose on a homogeneous water phantom for field sizes ranging from 6cm × 6cm to 40cm × 40cm. Central axis and off-axis points with different depths were chosen for the comparison. Similarly, wedge fields with wedge angles from 15 to 60 degree were used. In addition, variable field sizes for a heterogeneous phantom were used to evaluate the Acuros algorithm. Finally, both Acuros and AAA were tested on VMAT patient plans for various sites. Does distributions and calculation time were compared. Results: On average, computation time is reduced by at least 50% by Acuros XB compared with AAA on single fields and VMAT plans. When used for open 6MV photon beams on homogeneous water phantom, both Acuros XB and AAA calculated doses were within 1% of measurement. For 23 MV photon beams, the calculated doses were within 1.5% of measured doses for Acuros XB and 2% for AAA. When heterogeneous phantom was used, Acuros XB also improved on accuracy. Conclusion: Compared with AAA, Acuros XB can improve accuracy while significantly reduce computation time for VMAT plans.
The mGA1.0: A common LISP implementation of a messy genetic algorithm
NASA Technical Reports Server (NTRS)
Goldberg, David E.; Kerzic, Travis
1990-01-01
Genetic algorithms (GAs) are finding increased application in difficult search, optimization, and machine learning problems in science and engineering. Increasing demands are being placed on algorithm performance, and the remaining challenges of genetic algorithm theory and practice are becoming increasingly unavoidable. Perhaps the most difficult of these challenges is the so-called linkage problem. Messy GAs were created to overcome the linkage problem of simple genetic algorithms by combining variable-length strings, gene expression, messy operators, and a nonhomogeneous phasing of evolutionary processing. Results on a number of difficult deceptive test functions are encouraging with the mGA always finding global optima in a polynomial number of function evaluations. Theoretical and empirical studies are continuing, and a first version of a messy GA is ready for testing by others. A Common LISP implementation called mGA1.0 is documented and related to the basic principles and operators developed by Goldberg et. al. (1989, 1990). Although the code was prepared with care, it is not a general-purpose code, only a research version. Important data structures and global variations are described. Thereafter brief function descriptions are given, and sample input data are presented together with sample program output. A source listing with comments is also included.
A Survey on GPU-Based Implementation of Swarm Intelligence Algorithms.
Tan, Ying; Ding, Ke
2016-09-01
Inspired by the collective behavior of natural swarm, swarm intelligence algorithms (SIAs) have been developed and widely used for solving optimization problems. When applied to complex problems, a large number of fitness function evaluations are needed to obtain an acceptable solution. To tackle this vital issue, graphical processing units (GPUs) have been used to accelerate the optimization procedure of SIAs. Thanks to their inherent parallelism, SIAs are very suitable for parallel implementation under the GPU platform which have achieved a great success in recent years. This paper presents a comprehensive review of GPU-based parallel SIAs in accordance with a newly proposed taxonomy. Critical concerns for the efficient parallel implementation of SIAs are also described in detail. Moreover, novel criteria are also proposed to evaluate and compare the parallel implementation and algorithm performance universally. The rationality and practicability of the proposed optimization methodology and criteria are verified by careful case study. Finally, our opinions and perspectives on the trends and prospects on the relatively new research domain are also presented for future development. PMID:26571543
Implementation of a cone-beam backprojection algorithm on the cell broadband engine processor
NASA Astrophysics Data System (ADS)
Bockenbach, Olivier; Knaup, Michael; Kachelrieß, Marc
2007-03-01
Tomographic image reconstruction is computationally very demanding. In all cases the backprojection represents the performance bottleneck due to the high operational count and due to the high demand put on the memory subsystem. In the past, solving this problem has lead to the implementation of specific architectures, connecting Application Specific Integrated Circuits (ASICs) or Field Programmable Gate Arrays (FPGAs) to memory through dedicated high speed busses. More recently, there have also been attempt to use Graphic Processing Units (GPUs) to perform the backprojection step. Originally aimed at the gaming market, IBM, Toshiba and Sony have introduced the Cell Broadband Engine (CBE) processor, often considered as a multicomputer on a chip. Clocked at 3 GHz, the Cell allows for a theoretical performance of 192 GFlops and a peak data transfer rate over the internal bus of 200 GB/s. This performance indeed makes the Cell a very attractive architecture for implementing tomographic image reconstruction algorithms. In this study, we investigate the relative performance of a perspective backprojection algorithm when implemented on a standard PC and on the Cell processor. We compare these results to the performance achievable with FPGAs based boards and high end GPUs. The cone-beam backprojection performance was assessed by backprojecting a full circle scan of 512 projections of 1024x1024 pixels into a volume of size 512x512x512 voxels. It took 3.2 minutes on the PC (single CPU) and is as fast as 13.6 seconds on the Cell.
Zhou, Ning; Huang, Zhenyu; Tuffner, Francis K.; Jin, Shuangshuang; Lin, Jenglung; Hauer, Matthew L.
2010-02-28
Small signal stability problems are one of the major threats to grid stability and reliability. Prony analysis has been successfully applied on ringdown data to monitor electromechanical modes of a power system using phasor measurement unit (PMU) data. To facilitate an on-line application of mode estimation, this paper develops a recursive algorithm for implementing Prony analysis and proposed an oscillation detection method to detect ringdown data in real time. By automatically detecting ringdown data, the proposed method helps guarantee that Prony analysis is applied properly and timely on the ringdown data. Thus, the mode estimation results can be performed reliably and timely. The proposed method is tested using Monte Carlo simulations based on a 17-machine model and is shown to be able to properly identify the oscillation data for on-line application of Prony analysis. In addition, the proposed method is applied to field measurement data from WECC to show the performance of the proposed algorithm.
NASA Astrophysics Data System (ADS)
Bhole, Gaurav; Anjusha, V. S.; Mahesh, T. S.
2016-04-01
A robust control over quantum dynamics is of paramount importance for quantum technologies. Many of the existing control techniques are based on smooth Hamiltonian modulations involving repeated calculations of basic unitaries resulting in time complexities scaling rapidly with the length of the control sequence. Here we show that bang-bang controls need one-time calculation of basic unitaries and hence scale much more efficiently. By employing a global optimization routine such as the genetic algorithm, it is possible to synthesize not only highly intricate unitaries, but also certain nonunitary operations. We demonstrate the unitary control through the implementation of the optimal fixed-point quantum search algorithm in a three-qubit nuclear magnetic resonance (NMR) system. Moreover, by combining the bang-bang pulses with the crusher gradients, we also demonstrate nonunitary transformations of thermal equilibrium states into effective pure states in three- as well as five-qubit NMR systems.
Barazzetti, Luigi; Scaioni, Marco
2010-01-01
This paper presents the development and implementation of three image-based methods used to detect and measure the displacements of a vast number of points in the case of laboratory testing on construction materials. Starting from the needs of structural engineers, three ad hoc tools for crack measurement in fibre-reinforced specimens and 2D or 3D deformation analysis through digital images were implemented and tested. These tools make use of advanced image processing algorithms and can integrate or even substitute some traditional sensors employed today in most laboratories. In addition, the automation provided by the implemented software, the limited cost of the instruments and the possibility to operate with an indefinite number of points offer new and more extensive analysis in the field of material testing. Several comparisons with other traditional sensors widely adopted inside most laboratories were carried out in order to demonstrate the accuracy of the implemented software. Implementation details, simulations and real applications are reported and discussed in this paper. PMID:22163612
Morita, Kenji; Jitsev, Jenia; Morrison, Abigail
2016-09-15
Value-based action selection has been suggested to be realized in the corticostriatal local circuits through competition among neural populations. In this article, we review theoretical and experimental studies that have constructed and verified this notion, and provide new perspectives on how the local-circuit selection mechanisms implement reinforcement learning (RL) algorithms and computations beyond them. The striatal neurons are mostly inhibitory, and lateral inhibition among them has been classically proposed to realize "Winner-Take-All (WTA)" selection of the maximum-valued action (i.e., 'max' operation). Although this view has been challenged by the revealed weakness, sparseness, and asymmetry of lateral inhibition, which suggest more complex dynamics, WTA-like competition could still occur on short time scales. Unlike the striatal circuit, the cortical circuit contains recurrent excitation, which may enable retention or temporal integration of information and probabilistic "soft-max" selection. The striatal "max" circuit and the cortical "soft-max" circuit might co-implement an RL algorithm called Q-learning; the cortical circuit might also similarly serve for other algorithms such as SARSA. In these implementations, the cortical circuit presumably sustains activity representing the executed action, which negatively impacts dopamine neurons so that they can calculate reward-prediction-error. Regarding the suggested more complex dynamics of striatal, as well as cortical, circuits on long time scales, which could be viewed as a sequence of short WTA fragments, computational roles remain open: such a sequence might represent (1) sequential state-action-state transitions, constituting replay or simulation of the internal model, (2) a single state/action by the whole trajectory, or (3) probabilistic sampling of state/action. PMID:27173430
Parallel implementation of the time-evolving block decimation algorithm for the Bose-Hubbard model
NASA Astrophysics Data System (ADS)
Urbanek, Miroslav; Soldán, Pavel
2016-02-01
A system of ultracold atoms in an optical lattice represents a powerful experimental setup for testing the fundamentals of quantum mechanics. While its microscopic interaction mechanisms are well understood, the system behavior for a moderate number of particles is difficult to simulate due to a high dimension of its many-body space. This article presents TEBDOL, a parallel implementation of the time-evolving block decimation (TEBD) algorithm that can efficiently simulate time evolution of a one-dimensional chain of atoms in optical lattices. We investigate the parallelization strategy and the strong and weak scaling with the number of processes.
Implementation of a new iterative learning control algorithm on real data.
Zamanian, Hamed; Koohi, Ardavan
2016-02-01
In this paper, a newly presented approach is proposed for closed-loop automatic tuning of a proportional integral derivative (PID) controller based on iterative learning control (ILC) algorithm. A modified ILC scheme iteratively changes the control signal by adjusting it. Once a satisfactory performance is achieved, a linear compensator is identified in the ILC behavior using casual relationship between the closed loop signals. This compensator is approximated by a PD controller which is used to tune the original PID controller. Results of implementing this approach presented on the experimental data of Damavand tokamak and are consistent with simulation outcome. PMID:26931852
Implementation of a new iterative learning control algorithm on real data
NASA Astrophysics Data System (ADS)
Zamanian, Hamed; Koohi, Ardavan
2016-02-01
In this paper, a newly presented approach is proposed for closed-loop automatic tuning of a proportional integral derivative (PID) controller based on iterative learning control (ILC) algorithm. A modified ILC scheme iteratively changes the control signal by adjusting it. Once a satisfactory performance is achieved, a linear compensator is identified in the ILC behavior using casual relationship between the closed loop signals. This compensator is approximated by a PD controller which is used to tune the original PID controller. Results of implementing this approach presented on the experimental data of Damavand tokamak and are consistent with simulation outcome.
AVR microcontroller simulator for software implemented hardware fault tolerance algorithms research
NASA Astrophysics Data System (ADS)
Piotrowski, Adam; Tarnowski, Szymon; Napieralski, Andrzej
2008-01-01
Reliability of new, advanced electronic systems becomes a serious problem especially in places like accelerators and synchrotrons, where sophisticated digital devices operate closely to radiation sources. One of the possible solutions to harden the microprocessor-based system is a strict programming approach known as the Software Implemented Hardware Fault Tolerance. Unfortunately, in real environments it is not possible to perform precise and accurate tests of the new algorithms due to hardware limitation. This paper highlights the AVR-family microcontroller simulator project equipped with an appropriate monitoring and the SEU injection systems.
Implementation of a quantum adiabatic algorithm for factorization on two qudits
Zobov, V. E. Ermilov, A. S.
2012-06-15
Implementation of an adiabatic quantum algorithm for factorization on two qudits with the number of levels d{sub 1} and d{sub 2} is considered. A method is proposed for obtaining a time-dependent effective Hamiltonian by means of a sequence of rotation operators that are selective with respect to the transitions between neighboring levels of a qudit. A sequence of RF magnetic field pulses is obtained, and a factorization of the numbers 35, 21, and 15 is numerically simulated on two quadrupole nuclei with spins 3/2 (d{sub 1} = 4) and 1 (d{sub 2} = 3).
Implementation of QR-decomposition based on CORDIC for unitary MUSIC algorithm
NASA Astrophysics Data System (ADS)
Lounici, Merwan; Luan, Xiaoming; Saadi, Wahab
2013-07-01
The DOA (Direction Of Arrival) estimation with subspace methods such as MUSIC (MUltiple SIgnal Classification) and ESPRIT (Estimation of Signal Parameters via Rotational Invariance Technique) is based on an accurate estimation of the eigenvalues and eigenvectors of covariance matrix. QR decomposition is implemented with the Coordinate Rotation DIgital Computer (CORDIC) algorithm. QRD requires only additions and shifts [6], so it is faster and more regular than other methods. In this article the hardware architecture of an EVD (Eigen Value Decomposition) processor based on TSA (triangular systolic array) for QR decomposition is proposed. Using Xilinx System Generator (XSG), the design is implemented and the estimated logic device resource values are presented for different matrix sizes.
NASA Technical Reports Server (NTRS)
Doxley, Charles A.
2016-01-01
In the current world of applications that use reconfigurable technology implemented on field programmable gate arrays (FPGAs), there is a need for flexible architectures that can grow as the systems evolve. A project has limited resources and a fixed set of requirements that development efforts are tasked to meet. Designers must develop robust solutions that practically meet the current customer demands and also have the ability to grow for future performance. This paper describes the development of a high speed serial data streaming algorithm that allows for transmission of multiple data channels over a single serial link. The technique has the ability to change to meet new applications developed for future design considerations. This approach uses the Xilinx Serial RapidIO LOGICORE Solution to implement a flexible infrastructure to meet the current project requirements with the ability to adapt future system designs.
Implementation of a Point Algorithm for Real-Time Convex Optimization
NASA Technical Reports Server (NTRS)
Acikmese, Behcet; Motaghedi, Shui; Carson, John
2007-01-01
The primal-dual interior-point algorithm implemented in G-OPT is a relatively new and efficient way of solving convex optimization problems. Given a prescribed level of accuracy, the convergence to the optimal solution is guaranteed in a predetermined, finite number of iterations. G-OPT Version 1.0 is a flight software implementation written in C. Onboard application of the software enables autonomous, real-time guidance and control that explicitly incorporates mission constraints such as control authority (e.g. maximum thrust limits), hazard avoidance, and fuel limitations. This software can be used in planetary landing missions (Mars pinpoint landing and lunar landing), as well as in proximity operations around small celestial bodies (moons, asteroids, and comets). It also can be used in any spacecraft mission for thrust allocation in six-degrees-of-freedom control.
Multi-GPU implementation of a VMAT treatment plan optimization algorithm
Tian, Zhen E-mail: Xun.Jia@UTSouthwestern.edu Folkerts, Michael; Tan, Jun; Jia, Xun E-mail: Xun.Jia@UTSouthwestern.edu Jiang, Steve B. E-mail: Xun.Jia@UTSouthwestern.edu; Peng, Fei
2015-06-15
Purpose: Volumetric modulated arc therapy (VMAT) optimization is a computationally challenging problem due to its large data size, high degrees of freedom, and many hardware constraints. High-performance graphics processing units (GPUs) have been used to speed up the computations. However, GPU’s relatively small memory size cannot handle cases with a large dose-deposition coefficient (DDC) matrix in cases of, e.g., those with a large target size, multiple targets, multiple arcs, and/or small beamlet size. The main purpose of this paper is to report an implementation of a column-generation-based VMAT algorithm, previously developed in the authors’ group, on a multi-GPU platform to solve the memory limitation problem. While the column-generation-based VMAT algorithm has been previously developed, the GPU implementation details have not been reported. Hence, another purpose is to present detailed techniques employed for GPU implementation. The authors also would like to utilize this particular problem as an example problem to study the feasibility of using a multi-GPU platform to solve large-scale problems in medical physics. Methods: The column-generation approach generates VMAT apertures sequentially by solving a pricing problem (PP) and a master problem (MP) iteratively. In the authors’ method, the sparse DDC matrix is first stored on a CPU in coordinate list format (COO). On the GPU side, this matrix is split into four submatrices according to beam angles, which are stored on four GPUs in compressed sparse row format. Computation of beamlet price, the first step in PP, is accomplished using multi-GPUs. A fast inter-GPU data transfer scheme is accomplished using peer-to-peer access. The remaining steps of PP and MP problems are implemented on CPU or a single GPU due to their modest problem scale and computational loads. Barzilai and Borwein algorithm with a subspace step scheme is adopted here to solve the MP problem. A head and neck (H and N) cancer case is
The Sustainable Technology Division has recently completed an implementation of the U.S. EPA's Waste Reduction (WAR) Algorithm that can be directly accessed from a Cape-Open compliant process modeling environment. The WAR Algorithm add-in can be used in AmsterChem's COFE (Cape-Op...
Implementation of the FDK algorithm for cone-beam CT on the cell broadband engine architecture
NASA Astrophysics Data System (ADS)
Scherl, Holger; Koerner, Mario; Hofmann, Hannes; Eckert, Wieland; Kowarschik, Markus; Hornegger, Joachim
2007-03-01
In most of today's commercially available cone-beam CT scanners, the well known FDK method is used for solving the 3D reconstruction task. The computational complexity of this algorithm prohibits its use for many medical applications without hardware acceleration. The brand-new Cell Broadband Engine Architecture (CBEA) with its high level of parallelism is a cost-efficient processor for performing the FDK reconstruction according to the medical requirements. The programming scheme, however, is quite different to any standard personal computer hardware. In this paper, we present an innovative implementation of the most time-consuming parts of the FDK algorithm: filtering and back-projection. We also explain the required transformations to parallelize the algorithm for the CBEA. Our software framework allows to compute the filtering and back-projection in parallel, making it possible to do an on-the-fly-reconstruction. The achieved results demonstrate that a complete FDK reconstruction is computed with the CBEA in less than seven seconds for a standard clinical scenario. Given the fact that scan times are usually much higher, we conclude that reconstruction is finished right after the end of data acquisition. This enables us to present the reconstructed volume to the physician in real-time, immediately after the last projection image has been acquired by the scanning device.
Implementing Quantum Algorithms with Modular Gates in a Trapped Ion Chain
NASA Astrophysics Data System (ADS)
Figgatt, Caroline; Debnath, Shantanu; Linke, Norbert; Landsman, Kevin; Wright, Ken; Monroe, Chris
2016-05-01
We present experimental results on quantum algorithms performed using fully modular one- and two-qubit gates in a linear chain of 5 Yb + ions. This is accomplished through arbitrary qubit addressing and manipulation from stimulated Raman transitions driven by a beat note between counter-propagating beams from a pulsed laser. The Raman beam pairs consist of one global beam and a set of counter-propagating individual addressing beams, one for each ion. This provides arbitrary single-qubit rotations as well as arbitrary selection of ion pairs for a fully-connected system of two-qubit modular XX-entangling gates implemented using a pulse-segmentation scheme. We execute controlled-NOT gates with an average fidelity of 97.0% for all 10 possible pairs. Programming arbitrary sequences of gates allows us to construct any quantum algorithm, making this system a universal quantum computer. As an example, we present experimental results for the Bernstein-Vazirani algorithm using 4 control qubits and 1 ancilla, performed with concatenated gates that can be reconfigured to construct all 16 possible oracles, and obtain a process fidelity of 90.3%. This work is supported by the ARO with funding from the IARPA MQCO program and the AFOSR MURI on Quantum Measurement and Verification.
Parallel Implementation and Scaling of an Adaptive Mesh Discrete Ordinates Algorithm for Transport
Howell, L H
2004-11-29
Block-structured adaptive mesh refinement (AMR) uses a mesh structure built up out of locally-uniform rectangular grids. In the BoxLib parallel framework used by the Raptor code, each processor operates on one or more of these grids at each refinement level. The decomposition of the mesh into grids and the distribution of these grids among processors may change every few timesteps as a calculation proceeds. Finer grids use smaller timesteps than coarser grids, requiring additional work to keep the system synchronized and ensure conservation between different refinement levels. In a paper for NECDC 2002 I presented preliminary results on implementation of parallel transport sweeps on the AMR mesh, conjugate gradient acceleration, accuracy of the AMR solution, and scalar speedup of the AMR algorithm compared to a uniform fully-refined mesh. This paper continues with a more in-depth examination of the parallel scaling properties of the scheme, both in single-level and multi-level calculations. Both sweeping and setup costs are considered. The algorithm scales with acceptable performance to several hundred processors. Trends suggest, however, that this is the limit for efficient calculations with traditional transport sweeps, and that modifications to the sweep algorithm will be increasingly needed as job sizes in the thousands of processors become common.
NASA Astrophysics Data System (ADS)
Falivene, Oriol; Cabello, Patricia; Arbués, Pau; Muñoz, Josep Anton; Cabrera, Lluís
2009-08-01
Valid representations of geological heterogeneity are fundamental inputs for quantitative models used in managing subsurface activities. Consequently, the simulation of realistic facies distributions is a significant aim. Realistic facies distributions are typically obtained by pixel-based, object-based or process-based methods. This work presents a pixel-based geostatistical algorithm suitable for reproducing lateral gradual facies transitions (LGFT) between two adjacent sedimentary bodies. Lateral contact (i.e. interfingering) between distinct depositional facies is a widespread geometric relationship that occurs at different scales in any depositional system. The algorithm is based on the truncation of the sum of a linear expectation trend and a random Gaussian field, and can be conditioned to well data. The implementation introduced herein also includes subroutines to clean and geometrically characterize the obtained LGFT. The cleaned sedimentary body transition provides a more appropriate and realistic facies distribution for some depositional settings. The geometric measures of the LGFT yield an intuitive measure of the morphology of the sedimentary body boundary, which can be compared to analogue data. An example of a LGFT obtained by the algorithm presented herein is also flow simulated, quantitatively demonstrating the importance of realistically reproducing them in subsurface models, if further flow-related accurate predictions are to be made.
Santi, Peter Angelo; Cutler, Theresa Elizabeth; Favalli, Andrea; Koehler, Katrina Elizabeth; Henzl, Vladimir; Henzlova, Daniela; Parker, Robert Francis; Croft, Stephen
2015-12-01
In order to improve the accuracy and capabilities of neutron multiplicity counting, additional quantifiable information is needed in order to address the assumptions that are present in the point model. Extracting and utilizing higher order moments (Quads and Pents) from the neutron pulse train represents the most direct way of extracting additional information from the measurement data to allow for an improved determination of the physical properties of the item of interest. The extraction of higher order moments from a neutron pulse train required the development of advanced dead time correction algorithms which could correct for dead time effects in all of the measurement moments in a self-consistent manner. In addition, advanced analysis algorithms have been developed to address specific assumptions that are made within the current analysis model, namely that all neutrons are created at a single point within the item of interest, and that all neutrons that are produced within an item are created with the same energy distribution. This report will discuss the current status of implementation and initial testing of the advanced dead time correction and analysis algorithms that have been developed in an attempt to utilize higher order moments to improve the capabilities of correlated neutron measurement techniques.
A Dynamic Era-Based Time-Symmetric Block Time-Step Algorithm with Parallel Implementations
NASA Astrophysics Data System (ADS)
Kaplan, Murat; Saygin, Hasan
2012-06-01
The time-symmetric block time-step (TSBTS) algorithm is a newly developed efficient scheme for N-body integrations. It is constructed on an era-based iteration. In this work, we re-designed the TSBTS integration scheme with a dynamically changing era size. A number of numerical tests were performed to show the importance of choosing the size of the era, especially for long-time integrations. Our second aim was to show that the TSBTS scheme is as suitable as previously known schemes for developing parallel N-body codes. In this work, we relied on a parallel scheme using the copy algorithm for the time-symmetric scheme. We implemented a hybrid of data and task parallelization for force calculation to handle load balancing problems that can appear in practice. Using the Plummer model initial conditions for different numbers of particles, we obtained the expected efficiency and speedup for a small number of particles. Although parallelization of the direct N-body codes is negatively affected by the communication/calculation ratios, we obtained good load-balanced results. Moreover, we were able to conserve the advantages of the algorithm (e.g., energy conservation for long-term simulations).
McNunn, Gabriel S; Bryden, Kenneth M
2013-01-01
Tarjan's algorithm schedules the solution of systems of equations by noting the coupling and grouping between the equations. Simulating complex systems, e.g., advanced power plants, aerodynamic systems, or the multi-scale design of components, requires the linkage of large groups of coupled models. Currently, this is handled manually in systems modeling packages. That is, the analyst explicitly defines both the method and solution sequence necessary to couple the models. In small systems of models and equations this works well. However, as additional detail is needed across systems and across scales, the number of models grows rapidly. This precludes the manual assembly of large systems of federated models, particularly in systems composed of high fidelity models. This paper examines extending Tarjan's algorithm from sets of equations to sets of models. The proposed implementation of the algorithm is demonstrated using a small one-dimensional system of federated models representing the heat transfer and thermal stress in a gas turbine blade with thermal barrier coating. Enabling the rapid assembly and substitution of different models permits the rapid turnaround needed to support the “what-if” kinds of questions that arise in engineering design.
NASA Astrophysics Data System (ADS)
Sham Kumar, S.; C. V., Sreehari; Vivek, K.; T. V., Praveen; Moosad, K. P. B.; Rajesh, R.
2015-06-01
This paper discusses the detailed experimental investigations on the performance of interferometer based fiber optic hydrophones with different Phase Generated Carrier (PGC) demodulation algorithms for their interrogation. The study covers the effect on different parametric variations in the PGC implementations by comparison through Signal to Noise And Distortion (SINAD) and Total Harmonic Distortion (THD) analysis. This paper discusses experiments on most popular algorithms based on PGC like Arctangent, Differential Cross Multiplication (DCM) and Ameliorated PGC. A Distributed Feed-Back Fiber Lasers (DFB-FL) based fiber optic hydrophone, with Mach-Zehnder Interferometer having active phase modulator in reference arm and mechanism to cater polarization related intensity fading were used for the experiments. Experiments were carried out to study the effects of various parameters like the type and configuration of low pass filter, frequency of the modulation signal, frequency of acoustic signal etc. It is observed that all the three factors viz. the type of low pass filter, frequency of modulating and acoustic signal plays important role in retrieving the acoustic signal, based on the type of algorithms used and are discussed here.
Pre-Hardware Optimization and Implementation Of Fast Optics Closed Control Loop Algorithms
NASA Technical Reports Server (NTRS)
Kizhner, Semion; Lyon, Richard G.; Herman, Jay R.; Abuhassan, Nader
2004-01-01
One of the main heritage tools used in scientific and engineering data spectrum analysis is the Fourier Integral Transform and its high performance digital equivalent - the Fast Fourier Transform (FFT). The FFT is particularly useful in two-dimensional (2-D) image processing (FFT2) within optical systems control. However, timing constraints of a fast optics closed control loop would require a supercomputer to run the software implementation of the FFT2 and its inverse, as well as other image processing representative algorithm, such as numerical image folding and fringe feature extraction. A laboratory supercomputer is not always available even for ground operations and is not feasible for a night project. However, the computationally intensive algorithms still warrant alternative implementation using reconfigurable computing technologies (RC) such as Digital Signal Processors (DSP) and Field Programmable Gate Arrays (FPGA), which provide low cost compact super-computing capabilities. We present a new RC hardware implementation and utilization architecture that significantly reduces the computational complexity of a few basic image-processing algorithm, such as FFT2, image folding and phase diversity for the NASA Solar Viewing Interferometer Prototype (SVIP) using a cluster of DSPs and FPGAs. The DSP cluster utilization architecture also assures avoidance of a single point of failure, while using commercially available hardware. This, combined with the control algorithms pre-hardware optimization, or the first time allows construction of image-based 800 Hertz (Hz) optics closed control loops on-board a spacecraft, based on the SVIP ground instrument. That spacecraft is the proposed Earth Atmosphere Solar Occultation Imager (EASI) to study greenhouse gases CO2, C2H, H2O, O3, O2, N2O from Lagrange-2 point in space. This paper provides an advanced insight into a new type of science capabilities for future space exploration missions based on on-board image processing
NASA Technical Reports Server (NTRS)
Kizhner, Semion; Flatley, Thomas P.; Hestnes, Phyllis; Jentoft-Nilsen, Marit; Petrick, David J.; Day, John H. (Technical Monitor)
2001-01-01
Spacecraft telemetry rates have steadily increased over the last decade presenting a problem for real-time processing by ground facilities. This paper proposes a solution to a related problem for the Geostationary Operational Environmental Spacecraft (GOES-8) image processing application. Although large super-computer facilities are the obvious heritage solution, they are very costly, making it imperative to seek a feasible alternative engineering solution at a fraction of the cost. The solution is based on a Personal Computer (PC) platform and synergy of optimized software algorithms and re-configurable computing hardware technologies, such as Field Programmable Gate Arrays (FPGA) and Digital Signal Processing (DSP). It has been shown in [1] and [2] that this configuration can provide superior inexpensive performance for a chosen application on the ground station or on-board a spacecraft. However, since this technology is still maturing, intensive pre-hardware steps are necessary to achieve the benefits of hardware implementation. This paper describes these steps for the GOES-8 application, a software project developed using Interactive Data Language (IDL) (Trademark of Research Systems, Inc.) on a Workstation/UNIX platform. The solution involves converting the application to a PC/Windows/RC platform, selected mainly by the availability of low cost, adaptable high-speed RC hardware. In order for the hybrid system to run, the IDL software was modified to account for platform differences. It was interesting to examine the gains and losses in performance on the new platform, as well as unexpected observations before implementing hardware. After substantial pre-hardware optimization steps, the necessity of hardware implementation for bottleneck code in the PC environment became evident and solvable beginning with the methodology described in [1], [2], and implementing a novel methodology for this specific application [6]. The PC-RC interface bandwidth problem for the
NASA Astrophysics Data System (ADS)
Paz, Abel; Plaza, Antonio
2010-08-01
Automatic target and anomaly detection are considered very important tasks for hyperspectral data exploitation. These techniques are now routinely applied in many application domains, including defence and intelligence, public safety, precision agriculture, geology, or forestry. Many of these applications require timely responses for swift decisions which depend upon high computing performance of algorithm analysis. However, with the recent explosion in the amount and dimensionality of hyperspectral imagery, this problem calls for the incorporation of parallel computing techniques. In the past, clusters of computers have offered an attractive solution for fast anomaly and target detection in hyperspectral data sets already transmitted to Earth. However, these systems are expensive and difficult to adapt to on-board data processing scenarios, in which low-weight and low-power integrated components are essential to reduce mission payload and obtain analysis results in (near) real-time, i.e., at the same time as the data is collected by the sensor. An exciting new development in the field of commodity computing is the emergence of commodity graphics processing units (GPUs), which can now bridge the gap towards on-board processing of remotely sensed hyperspectral data. In this paper, we describe several new GPU-based implementations of target and anomaly detection algorithms for hyperspectral data exploitation. The parallel algorithms are implemented on latest-generation Tesla C1060 GPU architectures, and quantitatively evaluated using hyperspectral data collected by NASA's AVIRIS system over the World Trade Center (WTC) in New York, five days after the terrorist attacks that collapsed the two main towers in the WTC complex.
A time-efficient algorithm for implementing the Catmull-Clark subdivision method
NASA Astrophysics Data System (ADS)
Ioannou, G.; Savva, A.; Stylianou, V.
2015-10-01
Splines are the most popular methods in Figure Modeling and CAGD (Computer Aided Geometric Design) in generating smooth surfaces from a number of control points. The control points define the shape of a figure and splines calculate the required number of points which when displayed on a computer screen the result is a smooth surface. However, spline methods are based on a rectangular topological structure of points, i.e., a two-dimensional table of vertices, and thus cannot generate complex figures, such as the human and animal bodies that their complex structure does not allow them to be defined by a regular rectangular grid. On the other hand surface subdivision methods, which are derived by splines, generate surfaces which are defined by an arbitrary topology of control points. This is the reason that during the last fifteen years subdivision methods have taken the lead over regular spline methods in all areas of modeling in both industry and research. The cost of executing computer software developed to read control points and calculate the surface is run-time, due to the fact that the surface-structure required for handling arbitrary topological grids is very complicate. There are many software programs that have been developed related to the implementation of subdivision surfaces however, not many algorithms are documented in the literature, to support developers for writing efficient code. This paper aims to assist programmers by presenting a time-efficient algorithm for implementing subdivision splines. The Catmull-Clark which is the most popular of the subdivision methods has been employed to illustrate the algorithm.
NASA Astrophysics Data System (ADS)
Bernabe, Sergio; Igual, Francisco D.; Botella, Guillermo; Prieto-Matias, Manuel; Plaza, Antonio
2015-10-01
In the last decade, the issue of endmember variability has received considerable attention, particularly when each pixel is modeled as a linear combination of endmembers or pure materials. As a result, several models and algorithms have been developed for considering the effect of endmember variability in spectral unmixing and possibly include multiple endmembers in the spectral unmixing stage. One of the most popular approach for this purpose is the multiple endmember spectral mixture analysis (MESMA) algorithm. The procedure executed by MESMA can be summarized as follows: (i) First, a standard linear spectral unmixing (LSU) or fully constrained linear spectral unmixing (FCLSU) algorithm is run in an iterative fashion; (ii) Then, we use different endmember combinations, randomly selected from a spectral library, to decompose each mixed pixel; (iii) Finally, the model with the best fit, i.e., with the lowest root mean square error (RMSE) in the reconstruction of the original pixel, is adopted. However, this procedure can be computationally very expensive due to the fact that several endmember combinations need to be tested and several abundance estimation steps need to be conducted, a fact that compromises the use of MESMA in applications under real-time constraints. In this paper we develop (for the first time in the literature) an efficient implementation of MESMA on different platforms using OpenCL, an open standard for parallel programing on heterogeneous systems. Our experiments have been conducted using a simulated data set and the clMAGMA mathematical library. This kind of implementations with the same descriptive language on different architectures are very important in order to actually calibrate the possibility of using heterogeneous platforms for efficient hyperspectral imaging processing in real remote sensing missions.
2014-01-01
Background The huge quantity of data produced in Biomedical research needs sophisticated algorithmic methodologies for its storage, analysis, and processing. High Performance Computing (HPC) appears as a magic bullet in this challenge. However, several hard to solve parallelization and load balancing problems arise in this context. Here we discuss the HPC-oriented implementation of a general purpose learning algorithm, originally conceived for DNA analysis and recently extended to treat uncertainty on data (U-BRAIN). The U-BRAIN algorithm is a learning algorithm that finds a Boolean formula in disjunctive normal form (DNF), of approximately minimum complexity, that is consistent with a set of data (instances) which may have missing bits. The conjunctive terms of the formula are computed in an iterative way by identifying, from the given data, a family of sets of conditions that must be satisfied by all the positive instances and violated by all the negative ones; such conditions allow the computation of a set of coefficients (relevances) for each attribute (literal), that form a probability distribution, allowing the selection of the term literals. The great versatility that characterizes it, makes U-BRAIN applicable in many of the fields in which there are data to be analyzed. However the memory and the execution time required by the running are of O(n3) and of O(n5) order, respectively, and so, the algorithm is unaffordable for huge data sets. Results We find mathematical and programming solutions able to lead us towards the implementation of the algorithm U-BRAIN on parallel computers. First we give a Dynamic Programming model of the U-BRAIN algorithm, then we minimize the representation of the relevances. When the data are of great size we are forced to use the mass memory, and depending on where the data are actually stored, the access times can be quite different. According to the evaluation of algorithmic efficiency based on the Disk Model, in order to
Dongarra, J.J.; Hewitt, T.
1985-08-01
This note describes some experiments on simple, dense linear algebra algorithms. These experiments show that the CRAY X-MP is capable of small-grain multitasking arising from standard implementations of LU and Cholesky decomposition. The implementation described here provides the ''fastest'' execution rate for LU decomposition, 718 MFLOPS for a matrix of order 1000.
NASA Astrophysics Data System (ADS)
Tang, Yu-Hang; Karniadakis, George; Crunch Team
2014-03-01
We present a scalable dissipative particle dynamics simulation code, fully implemented on the Graphics Processing Units (GPUs) using a hybrid CUDA/MPI programming model, which achieves 10-30 times speedup on a single GPU over 16 CPU cores and almost linear weak scaling across a thousand nodes. A unified framework is developed within which the efficient generation of the neighbor list and maintaining particle data locality are addressed. Our algorithm generates strictly ordered neighbor lists in parallel, while the construction is deterministic and makes no use of atomic operations or sorting. Such neighbor list leads to optimal data loading efficiency when combined with a two-level particle reordering scheme. A faster in situ generation scheme for Gaussian random numbers is proposed using precomputed binary signatures. We designed custom transcendental functions that are fast and accurate for evaluating the pairwise interaction. Computer benchmarks demonstrate the speedup of our implementation over the CPU implementation as well as strong and weak scalability. A large-scale simulation of spontaneous vesicle formation consisting of 128 million particles was conducted to illustrate the practicality of our code in real-world applications. This work was supported by the new Department of Energy Collaboratory on Mathematics for Mesoscopic Modeling of Materials (CM4). Simulations were carried out at the Oak Ridge Leadership Computing Facility through the INCITE program under project BIP017.
FPGA-based implementation for steganalysis: a JPEG-compatibility algorithm
NASA Astrophysics Data System (ADS)
Gutierrez-Fernandez, E.; Portela-García, M.; Lopez-Ongil, C.; Garcia-Valderas, M.
2013-05-01
Steganalysis is a process to detect hidden data in cover documents, like digital images, videos, audio files, etc. This is the inverse process of steganography, which is the used method to hide secret messages. The widely use of computers and network technologies make digital files very easy-to-use means for storing secret data or transmitting secret messages through the Internet. Depending on the cover medium used to embed the data, there are different steganalysis methods. In case of images, many of the steganalysis and steganographic methods are focused on JPEG image formats, since JPEG is one of the most common formats. One of the main important handicaps of steganalysis methods is the processing speed, since it is usually necessary to process huge amount of data or it can be necessary to process the on-going internet traffic in real-time. In this paper, a JPEG steganalysis system is implemented in an FPGA in order to speed-up the detection process with respect to software-based implementations and to increase the throughput. In particular, the implemented method is the JPEG-compatibility detection algorithm that is based on the fact that when a JPEG image is modified, the resulting image is incompatible with the JPEG compression process.
Dragonfly: an implementation of the expand–maximize–compress algorithm for single-particle imaging1
Ayyer, Kartik; Lan, Ti-Yen; Elser, Veit; Loh, N. Duane
2016-01-01
Single-particle imaging (SPI) with X-ray free-electron lasers has the potential to change fundamentally how biomacromolecules are imaged. The structure would be derived from millions of diffraction patterns, each from a different copy of the macromolecule before it is torn apart by radiation damage. The challenges posed by the resultant data stream are staggering: millions of incomplete, noisy and un-oriented patterns have to be computationally assembled into a three-dimensional intensity map and then phase reconstructed. In this paper, the Dragonfly software package is described, based on a parallel implementation of the expand–maximize–compress reconstruction algorithm that is well suited for this task. Auxiliary modules to simulate SPI data streams are also included to assess the feasibility of proposed SPI experiments at the Linac Coherent Light Source, Stanford, California, USA. PMID:27504078
Recursive multiport schemes for implementing quantum algorithms with photonic integrated circuits
NASA Astrophysics Data System (ADS)
Tabia, Gelo Noel M.
2016-01-01
We present recursive multiport schemes for implementing quantum Fourier transforms and the inversion step in Grover's algorithm on an integrated linear optics device. In particular, each scheme shows how to execute a quantum operation on 2 d modes using a pair of circuits for the same operation on d modes. The circuits operate on path-encoded qudits and realize d -dimensional unitary transformations on these states using linear optical networks with O (d2) optical elements. To evaluate the schemes against realistic errors, we ran simulations of proof-of-principle experiments using a simple fabrication model of silicon-based photonic integrated devices that employ directional couplers and thermo-optic modulators for beam splitters and phase shifters, respectively. We find that high-fidelity performance is achievable with our multiport circuits for 2-qubit and 3-qubit quantum Fourier transforms, and for quantum search on four-item and eight-item databases.
FPGA implementation of 2-D discrete cosine transforms algorithm using systemC
NASA Astrophysics Data System (ADS)
Liu, Yifei; Ding, Mingyue
2007-12-01
Discrete Cosine Transform (DCT) is widely applied in image and video compression. This paper presented the software and hardware co-design method based on SystemC. As a case of study, a two dimension (2D) DCT Algorithm was implemented on Programmable Gate Arrays (FPGAs) chip. The short simulation time and verification process greatly increases the design efficiency of SystemC, making the product designed by SystemC more quickly into the market. The design effect using SystemC is compared between the expertise hardware designer and the software designer with little hardware knowledge. The result shows SystemC is an excellent and high efficiency hardware design method for an expertise hardware designer.
NASA Astrophysics Data System (ADS)
Fenner, Jonathan W.; Simon, Solomon H.; Eden, Dayton D.
1994-07-01
As new IR focal plane array (IRFPA) technologies become available, improved methods for coping with array errors must be developed. Traditional methods of nonuniformity correction using simple calibration mode are not adequate to compensate for the inherent nonuniformity and 1/f noise in some arrays. In an effort to compensate for nonuniformity in a HgCdTe IRFPA, and to reduce the effects of 1/f noise over a time interval, a new dynamic neural network (NN) based algorithm was implemented. The algorithm compensates for nonuniformities, and corrects for 1/f noise. A gradient descent algorithm is used with nearest neighbor feedback for training, creating a dynamic model of the IRFPA's gains and offsets, then updating and correcting them continuously. Improvements to the NN include implementation on a IBM 486 computer system, and a close examination of simulated scenes to test the algorithms boundaries. Preliminary designs for a real-time hardware prototype have been developed as well. Simulations were implemented to test the algorithm's ability to correct under a variety of conditions. A wide range of background noise, 1/f noise, object intensities, and background intensities were used. Results indicate that this algorithm can correct efficiently down to the background noise. Our conclusions are that NN based adaptive algorithms will supplement the effectiveness of IRFPA's.
Implementation of damage detection algorithms for the Alfred Zampa Memorial Suspension Bridge
NASA Astrophysics Data System (ADS)
Talebinejad, I.; Sedarat, H.; Emami-Naeini, A.; Krimotat, A.; Lynch, Jerome
2014-03-01
This study investigated a number of different damage detection algorithms for structural health monitoring of a typical suspension bridge. The Alfred Zampa Memorial Bridge, a part of the Interstate 80 in California, was selected for this study. The focus was to implement and validate simple damage detection algorithms for structural health monitoring of complex bridges. Accordingly, the numerical analysis involved development of a high fidelity finite element model of the bridge in order to simulate various structural damage scenarios. The finite element model of the bridge was validated based on the experimental modal properties. A number of damage scenarios were simulated by changing the stiffness of different bridge components including suspenders, main cable, bulkheads and deck. Several vibration-based damage detection methods namely the change in the stiffness, change in the flexibility, change in the uniform load surface and change in the uniform load surface curvature were employed to locate the simulated damages. The investigation here provides the relative merits and shortcomings of these methods when applied to long span suspension bridges. It also shows the applicability of these methods to locate the decay in the structure.
NASA Astrophysics Data System (ADS)
Yu, Biin; Peng, Xiang; Tian, Jindong; Niu, Hanben
2006-01-01
A cascaded iterative angular spectrum approach (CIASA) based on the methodology of virtual optics is presented for optical security applications. The technique encodes the target image into two different phase only masks (POM) using a concept of free-space angular spectrum propagation. The two phase-masks are designed and located in any two arbitrary planes interrelated through the free space propagation domain in order to implement the optical encryption or authenticity verification. And both phase masks can serve as enciphered texts. Compared with previous methods, the proposed algorithm employs an improved searching strategy: modifying the phase-distributions of both masks synchronously as well as enlarging the searching space. And with such a scheme, we make use of a high performance floating-point Digital Signal Processor (DSP) to accomplish a design of multiple-locks and multiple-keys optical image encryption system. An evaluation of the system performance is made and it is shown that the algorithm results in much faster convergence and better image quality for the recovered image. And two masks and system parameters can be used to design keys for image encryption, therefore the decrypted image can be obtained only when all these keys are under authorization. This key-assignment strategy may reduce the risk of being intruded and show a high security level. These characters may introduce a high level security that makes the encrypted image more difficult to be decrypted by an unauthorized person.
Auberry, Kathy; Cullen, Deborah
2016-03-01
Based on the results of the Surrogate Decision-Making Self Efficacy Scale (Lopez, 2009a), this study sought to determine whether nurses working in the field of intellectual disability (ID) experience increased confidence when they implemented the American Association of Neuroscience Nurses (AANN) Seizure Algorithm during telephone triage. The results of the study indicated using the AANN Seizure Algorithm increased self-confidence for many of the nurses in guiding care decisions during telephone triage. The treatment effect was statistically significant -3.169(p < 0.01) for a small sample of study participants. This increase in confidence is clinically essential for two reasons. Many individuals with ID and epilepsy reside within community-based settings. ID nurses provide seizure guidance to this population living in community-based settings via telephone triage. Evidenced-based training tools provide a valuable mechanism by guiding nurses via best practices. Nurses may need to be formally trained for seizure management due to high epilepsy rates in this population. PMID:26283660
FPGA implementation of dynamic channel assignment algorithm for cognitive wireless sensor networks
NASA Astrophysics Data System (ADS)
Martínez, Daniela M.; Andrade, Ángel G.
2015-07-01
The reliability of wireless sensor networks (WSNs) in industrial applications can be thwarted due to multipath fading, noise generated by industrial equipment or heavy machinery and particularly by the interference generated from other wireless devices operating in the same spectrum band. Recently, cognitive WSNs (CWSNs) were proposed to improve the performance and reliability of WSNs in highly interfered and noisy environments. In this class of WSN, the nodes are spectrum aware, that is, they monitor the radio spectrum to find channels available for data transmission and dynamically assign and reassign nodes to low-interference condition channels. In this work, we present the implementation of a channel assignment algorithm in a field-programmable gate array, which dynamically assigns channels to sensor nodes based on the interference and noise levels experimented in the network. From the results obtained from the performance evaluation of the CWSN when the channel assignment algorithm is considered, it is possible to identify how many channels should be available in the network in order to achieve a desired percentage of successful transmissions, subject to constraints on the signal-to-interference plus noise ratio on each active link.
Design and implementation of a hybrid MPI-CUDA model for the Smith-Waterman algorithm.
Khaled, Heba; Faheem, Hossam El Deen Mostafa; El Gohary, Rania
2015-01-01
This paper provides a novel hybrid model for solving the multiple pair-wise sequence alignment problem combining message passing interface and CUDA, the parallel computing platform and programming model invented by NVIDIA. The proposed model targets homogeneous cluster nodes equipped with similar Graphical Processing Unit (GPU) cards. The model consists of the Master Node Dispatcher (MND) and the Worker GPU Nodes (WGN). The MND distributes the workload among the cluster working nodes and then aggregates the results. The WGN performs the multiple pair-wise sequence alignments using the Smith-Waterman algorithm. We also propose a modified implementation to the Smith-Waterman algorithm based on computing the alignment matrices row-wise. The experimental results demonstrate a considerable reduction in the running time by increasing the number of the working GPU nodes. The proposed model achieved a performance of about 12 Giga cell updates per second when we tested against the SWISS-PROT protein knowledge base running on four nodes. PMID:26510289
NASA Astrophysics Data System (ADS)
Rajaram, Vignesh; Subramanian, Shankar C.
2016-07-01
An important aspect from the perspective of operational safety of heavy road vehicles is the detection and avoidance of collisions, particularly at high speeds. The development of a collision avoidance system is the overall focus of the research presented in this paper. The collision avoidance algorithm was developed using a sliding mode controller (SMC) and compared to one developed using linear full state feedback in terms of performance and controller effort. Important dynamic characteristics such as load transfer during braking, tyre-road interaction, dynamic brake force distribution and pneumatic brake system response were considered. The effect of aerodynamic drag on the controller performance was also studied. The developed control algorithms have been implemented on a Hardware-in-Loop experimental set-up equipped with the vehicle dynamic simulation software, IPG/TruckMaker®. The evaluation has been performed for realistic traffic scenarios with different loading and road conditions. The Hardware-in-Loop experimental results showed that the SMC and full state feedback controller were able to prevent the collision. However, when the discrepancies in the form of parametric variations were included, the SMC provided better results in terms of reduced stopping distance and lower controller effort compared to the full state feedback controller.
A real-time GPU implementation of the SIFT algorithm for large-scale video analysis tasks
NASA Astrophysics Data System (ADS)
Fassold, Hannes; Rosner, Jakub
2015-02-01
The SIFT algorithm is one of the most popular feature extraction methods and therefore widely used in all sort of video analysis tasks like instance search and duplicate/ near-duplicate detection. We present an efficient GPU implementation of the SIFT descriptor extraction algorithm using CUDA. The major steps of the algorithm are presented and for each step we describe how to efficiently parallelize it massively, how to take advantage of the unique capabilities of the GPU like shared memory / texture memory and how to avoid or minimize common GPU performance pitfalls. We compare the GPU implementation with the reference CPU implementation in terms of runtime and quality and achieve a speedup factor of approximately 3 - 5 for SD and 5 - 6 for Full HD video with respect to a multi-threaded CPU implementation, allowing us to run the SIFT descriptor extraction algorithm in real-time on SD video. Furthermore, quality tests show that the GPU implementation gives the same quality as the reference CPU implementation from the HessSIFT library. We further describe the benefits of GPU-accelerated SIFT descriptor calculation for video analysis applications such as near-duplicate video detection.
Real-time implementation of camera positioning algorithm based on FPGA & SOPC
NASA Astrophysics Data System (ADS)
Yang, Mingcao; Qiu, Yuehong
2014-09-01
In recent years, with the development of positioning algorithm and FPGA, to achieve the camera positioning based on real-time implementation, rapidity, accuracy of FPGA has become a possibility by way of in-depth study of embedded hardware and dual camera positioning system, this thesis set up an infrared optical positioning system based on FPGA and SOPC system, which enables real-time positioning to mark points in space. Thesis completion include: (1) uses a CMOS sensor to extract the pixel of three objects with total feet, implemented through FPGA hardware driver, visible-light LED, used here as the target point of the instrument. (2) prior to extraction of the feature point coordinates, the image needs to be filtered to avoid affecting the physical properties of the system to bring the platform, where the median filtering. (3) Coordinate signs point to FPGA hardware circuit extraction, a new iterative threshold selection method for segmentation of images. Binary image is then segmented image tags, which calculates the coordinates of the feature points of the needle through the center of gravity method. (4) direct linear transformation (DLT) and extreme constraints method is applied to three-dimensional reconstruction of the plane array CMOS system space coordinates. using SOPC system on a chip here, taking advantage of dual-core computing systems, which let match and coordinate operations separately, thus increase processing speed.
Implementing and Evaluating Multithreaded Triad Census Algorithms on the Cray XMT
Chin, George; Marquez, Andres; Choudhury, Sutanay; Maschhoff, Kristyn
2009-05-29
Commonly represented as directed graphs, social networks depict relationships and behaviors among social entities such as people, groups, and organizations. Social network analysis denotes a class of mathematical and statistical methods designed to study and measure social networks. Beyond sociolo-gy, social network analysis methods are being applied to other types of data in other domains such as bioinformatics, computer networks, national security, and economics. For particular problems, the size of a social network can grow to millions of nodes and tens of millions of edges or more. In such cases, researchers could benefit from the application of social network analysis algorithms on high-performance architectures and systems. The Cray XMT is a third generation multithreaded system based on the Cray XT-3/4 platform. Like most other multithreaded architectures, the Cray XMT is designed to tolerate memory access latencies by switching context between threads. The processors maintain multiple threads of execution and util-ize hardware-based context switching to overlap the memory latency incurred by any thread with the computations from other threads. Due to its memory latency tolerance, the Cray XMT has the poten-tial of significantly improving the execution speed of irregular data-intensive applications such as those found in social network analysis. In this paper, we describe our experiences in developing and optimizing three implementations of a social network analysis method known as triadic analysis to execute on the Cray XMT. The three im-plementations possess different execution complexities, qualities, and characteristics. We evaluate how the various attributes of the codes affect their performance on the Cray XMT. We also explore the effects of different compiler options and execution strategies on the different triadic analysis im-plementations and identify general XMT programming issues and lessons learned.
NASA Astrophysics Data System (ADS)
Liu, Hsin-I.; Richards, Brian; Zakhor, Avideh; Nikolic, Borivoje
2010-03-01
Future lithography systems must produce chips with smaller feature sizes, while maintaining throughput comparable to today's optical lithography systems. This places stringent data handling requirements on the design of any direct-write maskless system. To achieve the throughput of one wafer layer per minute with a direct-write maskless lithography system, using 22 nm pixels for 45 nm technology, a data rate of 12 Tb/s is required. In our past research, we have developed a datapath architecture for direct-write lithography systems, and have shown that lossless compression plays a key role in reducing throughput requirements of such systems. Our approach integrates a low complexity hardware-based decoder with the writers, in order to decode a compressed data layer in real time on the fly. In doing so, we have developed a spectrum of lossless compression algorithms for integrated circuit rasterized layout data to provide a tradeoff between compression efficiency and hardware complexity, the most promising of which is Block Golomb Context Copy Coding (Block GC3). In this paper, we present the synthesis results of the Block GC3 decoder for both FPGA and ASIC implementations. For one Block GC3 decoder, 3233 slice flip-flops and 3086 4-input LUTs are utilized in a Xilinx Virtex II Pro 70 FPGA, which corresponds to 4% of its resources, along with 1.7 KB of internal memory. The system runs at 100 MHz clock rate, with the overall output rate of 495 Mb/s for a single decoder. The corresponding ASIC implementation results in a 0.07 mm2 design with the maximum output rate of 2.47 Gb/s. In addition to the decoder implementation results, we discuss other hardware implementation issues for the writer system data path, including on-chip input/output buffering, error propagation control, and input data stream packaging. This hardware data path implementation is independent of the writer systems or data link types, and can be integrated with arbitrary directwrite lithography systems.
NASA Astrophysics Data System (ADS)
Hadia, Sarman K.; Thakker, R. A.; Bhatt, Kirit R.
2016-05-01
The study proposes an application of evolutionary algorithms, specifically an artificial bee colony (ABC), variant ABC and particle swarm optimisation (PSO), to extract the parameters of metal oxide semiconductor field effect transistor (MOSFET) model. These algorithms are applied for the MOSFET parameter extraction problem using a Pennsylvania surface potential model. MOSFET parameter extraction procedures involve reducing the error between measured and modelled data. This study shows that ABC algorithm optimises the parameter values based on intelligent activities of honey bee swarms. Some modifications have also been applied to the basic ABC algorithm. Particle swarm optimisation is a population-based stochastic optimisation method that is based on bird flocking activities. The performances of these algorithms are compared with respect to the quality of the solutions. The simulation results of this study show that the PSO algorithm performs better than the variant ABC and basic ABC algorithm for the parameter extraction of the MOSFET model; also the implementation of the ABC algorithm is shown to be simpler than that of the PSO algorithm.
NASA Astrophysics Data System (ADS)
Zhang, Zhenhai; Li, Kejie; Wu, Xiaobing; Zhang, Shujiang
2008-03-01
The unwrapped and correcting algorithm based on Coordinate Rotation Digital Computer (CORDIC) and bilinear interpolation algorithm was presented in this paper, with the purpose of processing dynamic panoramic annular image. An original annular panoramic image captured by panoramic annular lens (PAL) can be unwrapped and corrected to conventional rectangular image without distortion, which is much more coincident with people's vision. The algorithm for panoramic image processing is modeled by VHDL and implemented in FPGA. The experimental results show that the proposed panoramic image algorithm for unwrapped and distortion correction has the lower computation complexity and the architecture for dynamic panoramic image processing has lower hardware cost and power consumption. And the proposed algorithm is valid.
NASA Astrophysics Data System (ADS)
Vela, Adan Ernesto
2011-12-01
From 2010 to 2030, the number of instrument flight rules aircraft operations handled by Federal Aviation Administration en route traffic centers is predicted to increase from approximately 39 million flights to 64 million flights. The projected growth in air transportation demand is likely to result in traffic levels that exceed the abilities of the unaided air traffic controller in managing, separating, and providing services to aircraft. Consequently, the Federal Aviation Administration, and other air navigation service providers around the world, are making several efforts to improve the capacity and throughput of existing airspaces. Ultimately, the stated goal of the Federal Aviation Administration is to triple the available capacity of the National Airspace System by 2025. In an effort to satisfy air traffic demand through the increase of airspace capacity, air navigation service providers are considering the inclusion of advisory conflict-detection and resolution systems. In a human-in-the-loop framework, advisory conflict-detection and resolution decision-support tools identify potential conflicts and propose resolution commands for the air traffic controller to verify and issue to aircraft. A number of researchers and air navigation service providers hypothesize that the inclusion of combined conflict-detection and resolution tools into air traffic control systems will reduce or transform controller workload and enable the required increases in airspace capacity. In an effort to understand the potential workload implications of introducing advisory conflict-detection and resolution tools, this thesis provides a detailed study of the conflict event process and the implementation of conflict-detection and resolution algorithms. Specifically, the research presented here examines a metric of controller taskload: how many resolution commands an air traffic controller issues under the guidance of a conflict-detection and resolution decision-support tool. The goal
NASA Astrophysics Data System (ADS)
Lindegren, L.; Lammers, U.; Hobbs, D.; O'Mullane, W.; Bastian, U.; Hernández, J.
2012-02-01
Context. The Gaia satellite will observe about one billion stars and other point-like sources. The astrometric core solution will determine the astrometric parameters (position, parallax, and proper motion) for a subset of these sources, using a global solution approach which must also include a large number of parameters for the satellite attitude and optical instrument. The accurate and efficient implementation of this solution is an extremely demanding task, but crucial for the outcome of the mission. Aims: We aim to provide a comprehensive overview of the mathematical and physical models applicable to this solution, as well as its numerical and algorithmic framework. Methods: The astrometric core solution is a simultaneous least-squares estimation of about half a billion parameters, including the astrometric parameters for some 100 million well-behaved so-called primary sources. The global nature of the solution requires an iterative approach, which can be broken down into a small number of distinct processing blocks (source, attitude, calibration and global updating) and auxiliary processes (including the frame rotator and selection of primary sources). We describe each of these processes in some detail, formulate the underlying models, from which the observation equations are derived, and outline the adopted numerical solution methods with due consideration of robustness and the structure of the resulting system of equations. Appendices provide brief introductions to some important mathematical tools (quaternions and B-splines for the attitude representation, and a modified Cholesky algorithm for positive semidefinite problems) and discuss some complications expected in the real mission data. Results: A complete software system called AGIS (Astrometric Global Iterative Solution) is being built according to the methods described in the paper. Based on simulated data for 2 million primary sources we present some initial results, demonstrating the basic
NASA Astrophysics Data System (ADS)
Gutiérrez, Rebeca; Just, Dieter
2014-10-01
The Meteosat Third Generation (MTG) Programme is the next generation of European geostationary meteorological systems. The first MTG satellite, which is scheduled for launch at the end of 2018/early 2019, will host two imaging instruments: the Flexible Combined Imager (FCI) and the Lightning Imager. The FCI will continue the operation of the SEVIRI imager on the current Meteosat Second Generation satellites (MSG), but with an improved spatial, temporal and spectral resolution, not dissimilar to GOES-R (of NASA/NOAA). The transition from spinner to 3-axis stabilised platform, a 2-axis tapered scan pattern with overlaps between adjacent scan swaths, and the more stringent geometric, radiometric and timeliness requirements, make the rectification process for MTG FCI more challenging than for MSG SEVIRI. The effect of non-uniform sampling in the image rectification process was analysed in an earlier paper. The use of classical interpolation methods, such as truncated Shannon interpolation or cubic convolution interpolation, was shown to cause significant errors when applied to non-uniform samples. Moreover, cubic splines and Lagrange interpolation were selected as candidate resampling algorithms for the FCI rectification that can cope with irregularities in the sampling acquisition process. This paper extends the study for the two-dimensional case focusing on practical 2D interpolation methods and its feasibility for an operational implementation. Candidate kernels are described and assessed with respect to MTG requirements. The operational constraints of the Level 1 processor have been considered to develop an early image rectification prototype, including the impact of the potential curvature of the FCI scan swaths. The implementation follows a swath-based approach, uses parallel processing to speed up computation time and allows the selection of a number of resampling functions. Due to the tight time constraints of the FCI level 1 processing chain, focus is both on
NASA Astrophysics Data System (ADS)
Pietrzyk, Mariusz W.; Rannou, Didier; Brennan, Patrick C.
2012-02-01
This pilot study examines the effect of a novel decision support system in medical image interpretation. This system is based on combining image spatial frequency properties and eye-tracking data in order to recognize over and under calling errors. Thus, before it can be implemented as a detection aided schema, training is required during which SVMbased algorithm learns to recognize FP from all reported outcomes, and, FN from all unreported prolonged dwelled regions. Eight radiologists inspected 50 PA chest radiographs with the specific task of identifying lung nodules. Twentyfive cases contained CT proven subtle malignant lesions (5-20mm), but prevalence was not known by the subjects, who took part in two sequential reading sessions, the second, without and with support system feedback. MCMR ROC DBM and JAFROC analyses were conducted and demonstrated significantly higher scores following feedback with p values of 0.04, and 0.03 respectively, highlighting significant improvements in radiology performance once feedback was used. This positive effect on radiologists' performance might have important implications for future CAD-system development.
Liu, Ju-Chi; Chou, Hung-Chyun; Chen, Chien-Hsiu; Lin, Yi-Tseng
2016-01-01
A high efficient time-shift correlation algorithm was proposed to deal with the peak time uncertainty of P300 evoked potential for a P300-based brain-computer interface (BCI). The time-shift correlation series data were collected as the input nodes of an artificial neural network (ANN), and the classification of four LED visual stimuli was selected as the output node. Two operating modes, including fast-recognition mode (FM) and accuracy-recognition mode (AM), were realized. The proposed BCI system was implemented on an embedded system for commanding an adult-size humanoid robot to evaluate the performance from investigating the ground truth trajectories of the humanoid robot. When the humanoid robot walked in a spacious area, the FM was used to control the robot with a higher information transfer rate (ITR). When the robot walked in a crowded area, the AM was used for high accuracy of recognition to reduce the risk of collision. The experimental results showed that, in 100 trials, the accuracy rate of FM was 87.8% and the average ITR was 52.73 bits/min. In addition, the accuracy rate was improved to 92% for the AM, and the average ITR decreased to 31.27 bits/min. due to strict recognition constraints. PMID:27579033
NASA Astrophysics Data System (ADS)
Perez, L.; Autrique, L.; Gillet, M.
2008-11-01
The aim of this paper is to investigate the thermal diffusivity identification of a multilayered material dedicated to fire protection. In a military framework, fire protection needs to meet specific requirements, and operational protective systems must be constantly improved in order to keep up with the development of new weapons. In the specific domain of passive fire protections, intumescent coatings can be an effective solution on the battlefield. Intumescent materials have the ability to swell up when they are heated, building a thick multi-layered coating which provides efficient thermal insulation to the underlying material. Due to the heat aggressions (fire or explosion) leading to the intumescent phenomena, high temperatures are considered and prevent from linearization of the mathematical model describing the system state evolution. Previous sensitivity analysis has shown that the thermal diffusivity of the multilayered intumescent coating is a key parameter in order to validate the predictive numerical tool and therefore for thermal protection optimisation. A conjugate gradient method is implemented in order to minimise the quadratic cost function related to the error between predicted temperature and measured temperature. This regularisation algorithm is well adapted for a large number of unknown parameters.
Albert, Carlo; Vogel, Sören
2016-01-01
The General Unified Threshold model of Survival (GUTS) provides a consistent mathematical framework for survival analysis. However, the calibration of GUTS models is computationally challenging. We present a novel algorithm and its fast implementation in our R package, GUTS, that help to overcome these challenges. We show a step-by-step application example consisting of model calibration and uncertainty estimation as well as making probabilistic predictions and validating the model with new data. Using self-defined wrapper functions, we show how to produce informative text printouts and plots without effort, for the inexperienced as well as the advanced user. The complete ready-to-run script is available as supplemental material. We expect that our software facilitates novel re-analysis of existing survival data as well as asking new research questions in a wide range of sciences. In particular the ability to quickly quantify stressor thresholds in conjunction with dynamic compensating processes, and their uncertainty, is an improvement that complements current survival analysis methods. PMID:27340823
Position Accuracy Improvement by Implementing the DGNSS-CP Algorithm in Smartphones
Yoon, Donghwan; Kee, Changdon; Seo, Jiwon; Park, Byungwoon
2016-01-01
The position accuracy of Global Navigation Satellite System (GNSS) modules is one of the most significant factors in determining the feasibility of new location-based services for smartphones. Considering the structure of current smartphones, it is impossible to apply the ordinary range-domain Differential GNSS (DGNSS) method. Therefore, this paper describes and applies a DGNSS-correction projection method to a commercial smartphone. First, the local line-of-sight unit vector is calculated using the elevation and azimuth angle provided in the position-related output of Android’s LocationManager, and this is transformed to Earth-centered, Earth-fixed coordinates for use. To achieve position-domain correction for satellite systems other than GPS, such as GLONASS and BeiDou, the relevant line-of-sight unit vectors are used to construct an observation matrix suitable for multiple constellations. The results of static and dynamic tests show that the standalone GNSS accuracy is improved by about 30%–60%, thereby reducing the existing error of 3–4 m to just 1 m. The proposed algorithm enables the position error to be directly corrected via software, without the need to alter the hardware and infrastructure of the smartphone. This method of implementation and the subsequent improvement in performance are expected to be highly effective to portability and cost saving. PMID:27322284
Position Accuracy Improvement by Implementing the DGNSS-CP Algorithm in Smartphones.
Yoon, Donghwan; Kee, Changdon; Seo, Jiwon; Park, Byungwoon
2016-01-01
The position accuracy of Global Navigation Satellite System (GNSS) modules is one of the most significant factors in determining the feasibility of new location-based services for smartphones. Considering the structure of current smartphones, it is impossible to apply the ordinary range-domain Differential GNSS (DGNSS) method. Therefore, this paper describes and applies a DGNSS-correction projection method to a commercial smartphone. First, the local line-of-sight unit vector is calculated using the elevation and azimuth angle provided in the position-related output of Android's LocationManager, and this is transformed to Earth-centered, Earth-fixed coordinates for use. To achieve position-domain correction for satellite systems other than GPS, such as GLONASS and BeiDou, the relevant line-of-sight unit vectors are used to construct an observation matrix suitable for multiple constellations. The results of static and dynamic tests show that the standalone GNSS accuracy is improved by about 30%-60%, thereby reducing the existing error of 3-4 m to just 1 m. The proposed algorithm enables the position error to be directly corrected via software, without the need to alter the hardware and infrastructure of the smartphone. This method of implementation and the subsequent improvement in performance are expected to be highly effective to portability and cost saving. PMID:27322284
Liu, Ju-Chi; Chou, Hung-Chyun; Chen, Chien-Hsiu; Lin, Yi-Tseng; Kuo, Chung-Hsien
2016-01-01
A high efficient time-shift correlation algorithm was proposed to deal with the peak time uncertainty of P300 evoked potential for a P300-based brain-computer interface (BCI). The time-shift correlation series data were collected as the input nodes of an artificial neural network (ANN), and the classification of four LED visual stimuli was selected as the output node. Two operating modes, including fast-recognition mode (FM) and accuracy-recognition mode (AM), were realized. The proposed BCI system was implemented on an embedded system for commanding an adult-size humanoid robot to evaluate the performance from investigating the ground truth trajectories of the humanoid robot. When the humanoid robot walked in a spacious area, the FM was used to control the robot with a higher information transfer rate (ITR). When the robot walked in a crowded area, the AM was used for high accuracy of recognition to reduce the risk of collision. The experimental results showed that, in 100 trials, the accuracy rate of FM was 87.8% and the average ITR was 52.73 bits/min. In addition, the accuracy rate was improved to 92% for the AM, and the average ITR decreased to 31.27 bits/min. due to strict recognition constraints. PMID:27579033
Massanes, Francesc; Cadennes, Marie; Brankov, Jovan G
2011-07-01
In this paper we describe and evaluate a fast implementation of a classical block matching motion estimation algorithm for multiple Graphical Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block matching algorithm (BMA) uses summed absolute difference (SAD) error criterion and full grid search (FS) for finding optimal block displacement. In this evaluation we compared the execution time of a GPU and CPU implementation for images of various sizes, using integer and non-integer search grids.The results show that use of a GPU card can shorten computation time by a factor of 200 times for integer and 1000 times for a non-integer search grid. The additional speedup for non-integer search grid comes from the fact that GPU has built-in hardware for image interpolation. Further, when using multiple GPU cards, the presented evaluation shows the importance of the data splitting method across multiple cards, but an almost linear speedup with a number of cards is achievable.In addition we compared execution time of the proposed FS GPU implementation with two existing, highly optimized non-full grid search CPU based motion estimations methods, namely implementation of the Pyramidal Lucas Kanade Optical flow algorithm in OpenCV and Simplified Unsymmetrical multi-Hexagon search in H.264/AVC standard. In these comparisons, FS GPU implementation still showed modest improvement even though the computational complexity of FS GPU implementation is substantially higher than non-FS CPU implementation.We also demonstrated that for an image sequence of 720×480 pixels in resolution, commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames-per-second using two NVIDIA C1060 Tesla GPU cards. PMID:22347787
NASA Astrophysics Data System (ADS)
Massanes, Francesc; Cadennes, Marie; Brankov, Jovan G.
2011-07-01
We describe and evaluate a fast implementation of a classical block-matching motion estimation algorithm for multiple graphical processing units (GPUs) using the compute unified device architecture computing engine. The implemented block-matching algorithm uses summed absolute difference error criterion and full grid search (FS) for finding optimal block displacement. In this evaluation, we compared the execution time of a GPU and CPU implementation for images of various sizes, using integer and noninteger search grids. The results show that use of a GPU card can shorten computation time by a factor of 200 times for integer and 1000 times for a noninteger search grid. The additional speedup for a noninteger search grid comes from the fact that GPU has built-in hardware for image interpolation. Further, when using multiple GPU cards, the presented evaluation shows the importance of the data splitting method across multiple cards, but an almost linear speedup with a number of cards is achievable. In addition, we compared the execution time of the proposed FS GPU implementation with two existing, highly optimized nonfull grid search CPU-based motion estimations methods, namely implementation of the Pyramidal Lucas Kanade Optical flow algorithm in OpenCV and simplified unsymmetrical multi-hexagon search in H.264/AVC standard. In these comparisons, FS GPU implementation still showed modest improvement even though the computational complexity of FS GPU implementation is substantially higher than non-FS CPU implementation. We also demonstrated that for an image sequence of 720 × 480 pixels in resolution commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames-per-second using two NVIDIA C1060 Tesla GPU cards.
Implementation of EPID transit dosimetry based on a through-air dosimetry algorithm
Berry, Sean L.; Sheu, Ren-Dih; Polvorosa, Cynthia S.; Wuu, Cheng-Shie
2012-01-15
Purpose: A method to perform transit dosimetry with an electronic portal imaging device (EPID) by extending the commercial implementation of a published through-air portal dose image (PDI) prediction algorithm Van Esch et al.[Radiother. Oncol. 71, 223-234 (2004)] is proposed and validated. A detailed characterization of the attenuation, scattering, and EPID response behind objects in the beam path is used to convert through-air PDIs into transit PDIs. Methods: The EPID detector response beyond a range of water equivalent thicknesses (0-35 cm) and field sizes (3x3 to 22.2x29.6 cm{sup 2}) was analyzed. A constant air gap between the phantom exit surface and the EPID was utilized. A model was constructed that accounts for the beam's attenuation along the central axis, the presence of phantom scattered radiation, the detector's energy dependent response, and the difference in EPID off-axis pixel response relative to the central pixel. The efficacy of the algorithm was verified by comparing predicted and measured PDIs for IMRT fields delivered through phantoms of increasing complexity. Results: The expression that converts a through-air PDI to a transit PDI is dependent on the object's thickness, the irradiated field size, and the EPID pixel position. Monte Carlo derived narrow-beam linear attenuation coefficients are used to model the decrease in primary fluence incident upon the EPID due to the object's presence in the beam. This term is multiplied by a factor that accounts for the broad beam scatter geometry of the linac-phantom-EPID system and the detector's response to the incident beam quality. A 2D Gaussian function that models the nonuniformity of pixel response across the EPID detector plane is developed. For algorithmic verification, 49 IMRT fields were repeatedly delivered to homogeneous slab phantoms in 5 cm increments. Over the entire set of measurements, the average area passing a 3%/3mm gamma criteria slowly decreased from 98% for no material in the beam
Gong, Zongyi; Klanian, Kelly; Patel, Tushita; Sullivan, Olivia; Williams, Mark B.
2012-01-01
Purpose: We are developing a dual modality tomosynthesis breast scanner in which x-ray transmission tomosynthesis and gamma emission tomosynthesis are performed sequentially with the breast in a common configuration. In both modalities projection data are obtained over an angular range of less than 180° from one side of the mildly compressed breast resulting in incomplete and asymmetrical sampling. The objective of this work is to implement and evaluate a maximum likelihood expectation maximization (MLEM) reconstruction algorithm for gamma emission breast tomosynthesis (GEBT). Methods: A combination of Monte Carlo simulations and phantom experiments was used to test the MLEM algorithm for GEBT. The algorithm utilizes prior information obtained from the x-ray breast tomosynthesis scan to partially compensate for the incomplete angular sampling and to perform attenuation correction (AC) and resolution recovery (RR). System spatial resolution, image artifacts, lesion contrast, and signal to noise ratio (SNR) were measured as image quality figures of merit. To test the robustness of the reconstruction algorithm and to assess the relative impacts of correction techniques with changing angular range, simulations and experiments were both performed using acquisition angular ranges of 45°, 90° and 135°. For comparison, a single projection containing the same total number of counts as the full GEBT scan was also obtained to simulate planar breast scintigraphy. Results: The in-plane spatial resolution of the reconstructed GEBT images is independent of source position within the reconstructed volume and independent of acquisition angular range. For 45° acquisitions, spatial resolution in the depth dimension (the direction of breast compression) is degraded with increasing source depth (increasing distance from the collimator surface). Increasing the acquisition angular range from 45° to 135° both greatly reduces this depth dependence and improves the average depth
Schneider, Nadine; Sayle, Roger A; Landrum, Gregory A
2015-10-26
Finding a canonical ordering of the atoms in a molecule is a prerequisite for generating a unique representation of the molecule. The canonicalization of a molecule is usually accomplished by applying some sort of graph relaxation algorithm, the most common of which is the Morgan algorithm. There are known issues with that algorithm that lead to noncanonical atom orderings as well as problems when it is applied to large molecules like proteins. Furthermore, each cheminformatics toolkit or software provides its own version of a canonical ordering, most based on unpublished algorithms, which also complicates the generation of a universal unique identifier for molecules. We present an alternative canonicalization approach that uses a standard stable-sorting algorithm instead of a Morgan-like index. Two new invariants that allow canonical ordering of molecules with dependent chirality as well as those with highly symmetrical cyclic graphs have been developed. The new approach proved to be robust and fast when tested on the 1.45 million compounds of the ChEMBL 20 data set in different scenarios like random renumbering of input atoms or SMILES round tripping. Our new algorithm is able to generate a canonical order of the atoms of protein molecules within a few milliseconds. The novel algorithm is implemented in the open-source cheminformatics toolkit RDKit. With this paper, we provide a reference Python implementation of the algorithm that could easily be integrated in any cheminformatics toolkit. This provides a first step toward a common standard for canonical atom ordering to generate a universal unique identifier for molecules other than InChI. PMID:26441310
Chlorophyll fluorescence: implementation in the full physics RemoTeC algorithm
NASA Astrophysics Data System (ADS)
Hahne, Philipp; Frankenberg, Christian; Hasekamp, Otto; Landgraf, Jochen; Butz, André
2014-05-01
Several operating and future satellite missions are dedicated to enhancing our understanding of the carbon cycle. They infer the atmospheric concentrations of carbon dioxide and methane from shortwave infrared absorption spectra of sunlight backscattered from Earth's atmosphere and surface. Exhibiting high spatial and temporal resolution, the inferred gas concentration databases provide valuable information for inverse modelling of source and sink processes at the Earth's surface. However, the inversion of sources and sinks requires highly accurate total column CO2 (XCO2) and CH4 (XCH4) measurements, which remains a challenge. Recently, Frankenberg et al., 2012, showed that - beside XCO2 and XCH4 - chlorophyll fluorescence can be retrieved from sounders such as GOSAT exploiting Fraunhofer lines in the vicinity of the O2 A-band. This has two implications: a) chlorophyll fluorescence itself being a proxy for photosynthetic activity yields new information on carbon cycle processes and b) the neglect of the fluorescence signal can induce errors in the retrieved greenhouse gas concentrations. Our RemoTeC full physics algorithm iteratively retrieves the target gas concentrations XCO2 and XCH4 along with atmospheric scattering properties and other auxiliary parameters. The radiative transfer model (RTM) LINTRAN provides RemoTeC with the single and multiple scattered intensity field and its analytically calculated derivatives. Here, we report on the implementation of a fluorescence light source at the lower boundary of our RTM. Processing three years of GOSAT data, we evaluate the performance of the refined retrieval method. To this end, we compare different retrieval configurations, using the s- and p-polarization detectors independently and combined, and validate to independent data sources.
Implementation of FFT Algorithm using DSP TMS320F28335 for Shunt Active Power Filter
NASA Astrophysics Data System (ADS)
Patel, Pinkal Jashvantbhai; Patel, Rajesh M.; Patel, Vinod
2016-07-01
This work presents simulation, analysis and experimental verification of Fast Fourier Transform (FFT) algorithm for shunt active power filter based on three-level inverter. Different types of filters can be used for elimination of harmonics in the power system. In this work, FFT algorithm for reference current generation is discussed. FFT control algorithm is verified using PSIM simulation results with DLL block and C-code. Simulation results are compared with experimental results for FFT algorithm using DSP TMS320F28335 for shunt active power filter application.
NASA Astrophysics Data System (ADS)
Jo, Sunhwan; Jiang, Wei
2015-12-01
Replica Exchange with Solute Tempering (REST2) is a powerful sampling enhancement algorithm of molecular dynamics (MD) in that it needs significantly smaller number of replicas but achieves higher sampling efficiency relative to standard temperature exchange algorithm. In this paper, we extend the applicability of REST2 for quantitative biophysical simulations through a robust and generic implementation in greatly scalable MD software NAMD. The rescaling procedure of force field parameters controlling REST2 "hot region" is implemented into NAMD at the source code level. A user can conveniently select hot region through VMD and write the selection information into a PDB file. The rescaling keyword/parameter is written in NAMD Tcl script interface that enables an on-the-fly simulation parameter change. Our implementation of REST2 is within communication-enabled Tcl script built on top of Charm++, thus communication overhead of an exchange attempt is vanishingly small. Such a generic implementation facilitates seamless cooperation between REST2 and other modules of NAMD to provide enhanced sampling for complex biomolecular simulations. Three challenging applications including native REST2 simulation for peptide folding-unfolding transition, free energy perturbation/REST2 for absolute binding affinity of protein-ligand complex and umbrella sampling/REST2 Hamiltonian exchange for free energy landscape calculation were carried out on IBM Blue Gene/Q supercomputer to demonstrate efficacy of REST2 based on the present implementation.
Strout, Michelle
2015-08-15
Programming parallel machines is fraught with difficulties: the obfuscation of algorithms due to implementation details such as communication and synchronization, the need for transparency between language constructs and performance, the difficulty of performing program analysis to enable automatic parallelization techniques, and the existence of important "dusty deck" codes. The SAIMI project developed abstractions that enable the orthogonal specification of algorithms and implementation details within the context of existing DOE applications. The main idea is to enable the injection of small programming models such as expressions involving transcendental functions, polyhedral iteration spaces with sparse constraints, and task graphs into full programs through the use of pragmas. These smaller, more restricted programming models enable orthogonal specification of many implementation details such as how to map the computation on to parallel processors, how to schedule the computation, and how to allocation storage for the computation. At the same time, these small programming models enable the expression of the most computationally intense and communication heavy portions in many scientific simulations. The ability to orthogonally manipulate the implementation for such computations will significantly ease performance programming efforts and expose transformation possibilities and parameter to automated approaches such as autotuning. At Colorado State University, the SAIMI project was supported through DOE grant DE-SC3956 from April 2010 through August 2015. The SAIMI project has contributed a number of important results to programming abstractions that enable the orthogonal specification of implementation details in scientific codes. This final report summarizes the research that was funded by the SAIMI project.
Design and Implementation of High-Speed Input-Queued Switches Based on a Fair Scheduling Algorithm
NASA Astrophysics Data System (ADS)
Hu, Qingsheng; Zhao, Hua-An
To increase both the capacity and the processing speed for input-queued (IQ) switches, we proposed a fair scalable scheduling architecture (FSSA). By employing FSSA comprised of several cascaded sub-schedulers, a large-scale high performance switches or routers can be realized without the capacity limitation of monolithic device. In this paper, we present a fair scheduling algorithm named FSSA_DI based on an improved FSSA where a distributed iteration scheme is employed, the scheduler performance can be improved and the processing time can be reduced as well. Simulation results show that FSSA_DI achieves better performance on average delay and throughput under heavy loads compared to other existing algorithms. Moreover, a practical 64 × 64 FSSA using FSSA_DI algorithm is implemented by four Xilinx Vertex-4 FPGAs. Measurement results show that the data rates of our solution can be up to 800Mbps and the tradeoff between performance and hardware complexity has been solved peacefully.
Optical pattern recognition architecture implementing the mean-square error correlation algorithm
Molley, Perry A.
1991-01-01
An optical architecture implementing the mean-square error correlation algorithm, MSE=.SIGMA.[I-R].sup.2 for discriminating the presence of a reference image R in an input image scene I by computing the mean-square-error between a time-varying reference image signal s.sub.1 (t) and a time-varying input image signal s.sub.2 (t) includes a laser diode light source which is temporally modulated by a double-sideband suppressed-carrier source modulation signal I.sub.1 (t) having the form I.sub.1 (t)=A.sub.1 [1+.sqroot.2m.sub.1 s.sub.1 (t)cos (2.pi.f.sub.o t)] and the modulated light output from the laser diode source is diffracted by an acousto-optic deflector. The resultant intensity of the +1 diffracted order from the acousto-optic device is given by: I.sub.2 (t)=A.sub.2 [+2m.sub.2.sup.2 s.sub.2.sup.2 (t)-2.sqroot.2m.sub.2 (t) cos (2.pi.f.sub.o t] The time integration of the two signals I.sub.1 (t) and I.sub.2 (t) on the CCD deflector plane produces the result R(.tau.) of the mean-square error having the form: R(.tau.)=A.sub.1 A.sub.2 {[T]+[2m.sub.2.sup.2.multidot..intg.s.sub.2.sup.2 (t-.tau.)dt]-[2m.sub.1 m.sub.2 cos (2.tau.f.sub.o .tau.).multidot..intg.s.sub.1 (t)s.sub.2 (t-.tau.)dt]} where: s.sub.1 (t) is the signal input to the diode modulation source: s.sub.2 (t) is the signal input to the AOD modulation source; A.sub.1 is the light intensity; A.sub.2 is the diffraction efficiency; m.sub.1 and m.sub.2 are constants that determine the signal-to-bias ratio; f.sub.o is the frequency offset between the oscillator at f.sub.c and the modulation at f.sub.c +f.sub.o ; and a.sub.o and a.sub.1 are constant chosen to bias the diode source and the acousto-optic deflector into their respective linear operating regions so that the diode source exhibits a linear intensity characteristic and the AOD exhibits a linear amplitude characteristic.
Terra, Ricardo Mingarini; Waisberg, Daniel Reis; de Almeida, José Luiz Jesus; Devido, Marcela Santana; Pêgo-Fernandes, Paulo Manuel; Jatene, Fabio Biscegli
2012-01-01
OBJECTIVE: We aimed to evaluate whether the inclusion of videothoracoscopy in a pleural empyema treatment algorithm would change the clinical outcome of such patients. METHODS: This study performed quality-improvement research. We conducted a retrospective review of patients who underwent pleural decortication for pleural empyema at our institution from 2002 to 2008. With the old algorithm (January 2002 to September 2005), open decortication was the procedure of choice, and videothoracoscopy was only performed in certain sporadic mid-stage cases. With the new algorithm (October 2005 to December 2008), videothoracoscopy became the first-line treatment option, whereas open decortication was only performed in patients with a thick pleural peel (>2 cm) observed by chest scan. The patients were divided into an old algorithm (n = 93) and new algorithm (n = 113) group and compared. The main outcome variables assessed included treatment failure (pleural space reintervention or death up to 60 days after medical discharge) and the occurrence of complications. RESULTS: Videothoracoscopy and open decortication were performed in 13 and 80 patients from the old algorithm group and in 81 and 32 patients from the new algorithm group, respectively (p<0.01). The patients in the new algorithm group were older (41±1 vs. 46.3±16.7 years, p = 0.014) and had higher Charlson Comorbidity Index scores [0(0-3) vs. 2(0-4), p = 0.032]. The occurrence of treatment failure was similar in both groups (19.35% vs. 24.77%, p = 0.35), although the complication rate was lower in the new algorithm group (48.3% vs. 33.6%, p = 0.04). CONCLUSIONS: The wider use of videothoracoscopy in pleural empyema treatment was associated with fewer complications and unaltered rates of mortality and reoperation even though more severely ill patients were subjected to videothoracoscopic surgery. PMID:22760892
On implementation of EM-type algorithms in the stochastic models for a matrix computing on GPU
Gorshenin, Andrey K.
2015-03-10
The paper discusses the main ideas of an implementation of EM-type algorithms for computing on the graphics processors and the application for the probabilistic models based on the Cox processes. An example of the GPU’s adapted MATLAB source code for the finite normal mixtures with the expectation-maximization matrix formulas is given. The testing of computational efficiency for GPU vs CPU is illustrated for the different sample sizes.
Scemama, Anthony; Renon, Nicolas; Rapacioli, Mathias
2014-06-10
We present an algorithm and its parallel implementation for solving a self-consistent problem as encountered in Hartree-Fock or density functional theory. The algorithm takes advantage of the sparsity of matrices through the use of local molecular orbitals. The implementation allows one to exploit efficiently modern symmetric multiprocessing (SMP) computer architectures. As a first application, the algorithm is used within the density-functional-based tight binding method, for which most of the computational time is spent in the linear algebra routines (diagonalization of the Fock/Kohn-Sham matrix). We show that with this algorithm (i) single point calculations on very large systems (millions of atoms) can be performed on large SMP machines, (ii) calculations involving intermediate size systems (1000-100 000 atoms) are also strongly accelerated and can run efficiently on standard servers, and (iii) the error on the total energy due to the use of a cutoff in the molecular orbital coefficients can be controlled such that it remains smaller than the SCF convergence criterion. PMID:26580754
Implementation of genetic algorithm for distribution systems loss minimum re-configuration
Nara, K.; Shiose, A. ); Kitagawa, M.; Ishihara, T. )
1992-08-01
In this paper, a distribution systems loss minimum reconfiguration method by genetic algorithm is proposed. The problem is a complex mixed integer programming problem and is very difficult to solve by a mathematical programming approach. A genetic algorithm (GA) is a search or optimization algorithm based on the mechanics of natural selection and natural genetics. Since GA is suitable to solve combinatorial optimization problems, it can be successfully applied to problems of loss minimum in distribution systems. Numerical examples demonstrate the validity and effectiveness of the proposed methodology.
Multi-Core Parallel Implementation of Data Filtering Algorithm for Multi-Beam Bathymetry Data
NASA Astrophysics Data System (ADS)
Liu, Tianyang; Xu, Weiming; Yin, Xiaodong; Zhao, Xiliang
In order to improve the multi-beam bathymetry data processing speed, we propose a parallel filtering algorithm based on multi thread technology. The algorithm consists of two parts. The first is the parallel data re-order step, in which the surveying area is divided into a regular grid, and the discrete bathymetry data is arranged into each grid by parallel method. The second part is the parallel filtering step, which involves dividing the grid into blocks and parallel executing filtering process in each block. In the experiment, the speedup of the proposed algorithm reaches to about 3.67 with an 8 core computer. The result shows the method can improve computing efficiency significantly comparing to the traditional algorithm.
Study on algorithm and real-time implementation of infrared image processing based on FPGA
NASA Astrophysics Data System (ADS)
Pang, Yulin; Ding, Ruijun; Liu, Shanshan; Chen, Zhe
2010-10-01
With the fast development of Infrared Focal Plane Arrays (IRFPA) detectors, high quality real-time image processing becomes more important in infrared imaging system. Facing the demand of better visual effect and good performance, we find FPGA is an ideal choice of hardware to realize image processing algorithm that fully taking advantage of its high speed, high reliability and processing a great amount of data in parallel. In this paper, a new idea of dynamic linear extension algorithm is introduced, which has the function of automatically finding the proper extension range. This image enhancement algorithm is designed in Verilog HDL and realized on FPGA. It works on higher speed than serial processing device like CPU and DSP. Experiment shows that this hardware unit of dynamic linear extension algorithm enhances the visual effect of infrared image effectively.
NASA Technical Reports Server (NTRS)
Vicroy, D. D.
1984-01-01
A simplified flight management descent algorithm was developed and programmed on a small programmable calculator. It was designed to aid the pilot in planning and executing a fuel conservative descent to arrive at a metering fix at a time designated by the air traffic control system. The algorithm may also be used for planning fuel conservative descents when time is not a consideration. The descent path was calculated for a constant Mach/airspeed schedule from linear approximations of airplane performance with considerations given for gross weight, wind, and nonstandard temperature effects. An explanation and examples of how the algorithm is used, as well as a detailed flow chart and listing of the algorithm are contained.
The VLSI implementation of a Reed-Solomon encoder using Berlekamp's bit-serial multiplier algorithm
NASA Technical Reports Server (NTRS)
Hsu, I.-S.; Reed, I. S.; Wang, K.; Yeh, C.-S.; Truong, T. K.; Deutsch, L. J.
1984-01-01
Realization of a bit-serial multiplication algorithm for the encoding of Reed-Solomon (RS) codes on a single VLSI chip using NMOS technology is demonstrated to be feasible. A dual basis (255, 223) over a Galois field is used. The conventional RS encoder for long codes often requires look-up tables to perform the multiplication of two field elements. Berlekamp's algorithm requires only shifting and exclusive-OR operations.
NASA Astrophysics Data System (ADS)
Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David
2006-05-01
The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has soon introduced new processing challenges. The price paid for the wealth spatial and spectral information available from hyperspectral sensors is the enormous amounts of data that they generate. Several applications exist, however, where having the desired information calculated quickly enough for practical use is highly desirable. High computing performance of algorithm analysis is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed-up computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques. Techniques include four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, and (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts in selection of parallel hyperspectral algorithms for specific applications.
NASA Astrophysics Data System (ADS)
Yang, Eunjung; Lee, Jonghyun; Jung, Byungwook; Chun, Joohwan
2005-09-01
We describe a flexible hardware system of the Doppler radar which is designed to verify various baseband array signal processing algorithms. In this work we design the Doppler radar system simulator for baseband signal processing in laboratory level. Based on this baseband signal processor, a PN-code pulse doppler radar simulator is developed. More specifically, this simulator consists of an echo signal generation part and a signal processing part. For the echo signal generation part, we use active array structure with 4 elements, and adopt baker coded PCM signal in transmission and reception for digital pulse compression. In the signal processing part, we first transform RF radar pulse to the baseband signal because we use the basebands algorithms using IF sampling. Various digital beamforming algorithms can be adopted as a baseband algorithm in our simulator. We mainly use Multiple Sidelobe Canceller (MSC) with main array antenna elements and auxiliary antenna elements as beamforming and sidelobe canceller algorithm. For Doppler filtering algorithms, we use the FFT. A control set is necessary to control overall system and to manage the timing schedule for the operation.
Implementation on Landsat Data of a Simple Cloud Mask Algorithm Developed for MODIS Land Bands
NASA Technical Reports Server (NTRS)
Oreopoulos, Lazaros; Wilson, Michael J.; Varnai, Tamas
2010-01-01
This letter assesses the performance on Landsat-7 images of a modified version of a cloud masking algorithm originally developed for clear-sky compositing of Moderate Resolution Imaging Spectroradiometer (MODIS) images at northern mid-latitudes. While data from recent Landsat missions include measurements at thermal wavelengths, and such measurements are also planned for the next mission, thermal tests are not included in the suggested algorithm in its present form to maintain greater versatility and ease of use. To evaluate the masking algorithm we take advantage of the availability of manual (visual) cloud masks developed at USGS for the collection of Landsat scenes used here. As part of our evaluation we also include the Automated Cloud Cover Assesment (ACCA) algorithm that includes thermal tests and is used operationally by the Landsat-7 mission to provide scene cloud fractions, but no cloud masks. We show that the suggested algorithm can perform about as well as ACCA both in terms of scene cloud fraction and pixel-level cloud identification. Specifically, we find that the algorithm gives an error of 1.3% for the scene cloud fraction of 156 scenes, and a root mean square error of 7.2%, while it agrees with the manual mask for 93% of the pixels, figures very similar to those from ACCA (1.2%, 7.1%, 93.7%).
Xie, Dexuan; Dash, Ranjan K.; Beard, Daniel A.
2009-01-01
Fast algorithms for simulating mathematical models of coupled blood-tissue transport and metabolism are critical for the analysis of data on transport and reaction in tissues. Here, by combining the method of characteristics with the standard grid discretization technique, a novel algorithm is introduced for solving a general blood-tissue transport and metabolism model governed by a large system of one-dimensional semilinear first order partial differential equations. The key part of the algorithm is to approximate the model as a group of independent ordinary differential equation (ODE) systems such that each ODE system has the same size as the model and can be integrated independently. Thus the method can be easily implemented in parallel on a large scale multiprocessor computer. The accuracy of the algorithm is demonstrated for solving a simple blood-tissue exchange model introduced by Sangren and Sheppard (Bull. Math. Biophys. 15:387–394, 1953), which has an analytical solution. Numerical experiments made on a distributed-memory parallel computer (an HP Linux cluster) and a shared-memory parallel computer (a SGI Origin 2000) demonstrate the parallel efficiency of the algorithm. PMID:20161089
Xie, Dexuan; Dash, Ranjan K; Beard, Daniel A
2009-11-01
Fast algorithms for simulating mathematical models of coupled blood-tissue transport and metabolism are critical for the analysis of data on transport and reaction in tissues. Here, by combining the method of characteristics with the standard grid discretization technique, a novel algorithm is introduced for solving a general blood-tissue transport and metabolism model governed by a large system of one-dimensional semilinear first order partial differential equations. The key part of the algorithm is to approximate the model as a group of independent ordinary differential equation (ODE) systems such that each ODE system has the same size as the model and can be integrated independently. Thus the method can be easily implemented in parallel on a large scale multiprocessor computer. The accuracy of the algorithm is demonstrated for solving a simple blood-tissue exchange model introduced by Sangren and Sheppard (Bull. Math. Biophys. 15:387-394, 1953), which has an analytical solution. Numerical experiments made on a distributed-memory parallel computer (an HP Linux cluster) and a shared-memory parallel computer (a SGI Origin 2000) demonstrate the parallel efficiency of the algorithm. PMID:20161089
Kendziorra, Carsten; Meyer, Henning; Dewey, Marc
2014-01-01
This paper presents a phase detection algorithm for four-dimensional (4D) cardiac computed tomography (CT) analysis. The algorithm detects a phase, i.e. a specific three-dimensional (3D) image out of several time-distributed 3D images, with high contrast in the left ventricle and low contrast in the right ventricle. The purpose is to use the automatically detected phase in an existing algorithm that automatically aligns the images along the heart axis. Decision making is based on the contrast agent distribution over time. It was implemented in KardioPerfusion – a software framework currently being developed for 4D CT myocardial perfusion analysis. Agreement of the phase detection algorithm with two reference readers was 97% (95% CI: 82–100%). Mean duration for detection was 0.020 s (95% CI: 0.018–0.022 s), which was times less than the readers needed (s, ). Thus, this algorithm is an accurate and fast tool that can improve work flow of clinical examinations. PMID:25545863
Fu, Haohao; Shao, Xueguang; Chipot, Christophe; Cai, Wensheng
2016-08-01
Proper use of the adaptive biasing force (ABF) algorithm in free-energy calculations needs certain prerequisites to be met, namely, that the Jacobian for the metric transformation and its first derivative be available and the coarse variables be independent and fully decoupled from any holonomic constraint or geometric restraint, thereby limiting singularly the field of application of the approach. The extended ABF (eABF) algorithm circumvents these intrinsic limitations by applying the time-dependent bias onto a fictitious particle coupled to the coarse variable of interest by means of a stiff spring. However, with the current implementation of eABF in the popular molecular dynamics engine NAMD, a trajectory-based post-treatment is necessary to derive the underlying free-energy change. Usually, such a posthoc analysis leads to a decrease in the reliability of the free-energy estimates due to the inevitable loss of information, as well as to a drop in efficiency, which stems from substantial read-write accesses to file systems. We have developed a user-friendly, on-the-fly code for performing eABF simulations within NAMD. In the present contribution, this code is probed in eight illustrative examples. The performance of the algorithm is compared with traditional ABF, on the one hand, and the original eABF implementation combined with a posthoc analysis, on the other hand. Our results indicate that the on-the-fly eABF algorithm (i) supplies the correct free-energy landscape in those critical cases where the coarse variables at play are coupled to either each other or to geometric restraints or holonomic constraints, (ii) greatly improves the reliability of the free-energy change, compared to the outcome of a posthoc analysis, and (iii) represents a negligible additional computational effort compared to regular ABF. Moreover, in the proposed implementation, guidelines for choosing two parameters of the eABF algorithm, namely the stiffness of the spring and the mass
Orio, Patricio; Soudry, Daniel
2012-01-01
Background The phenomena that emerge from the interaction of the stochastic opening and closing of ion channels (channel noise) with the non-linear neural dynamics are essential to our understanding of the operation of the nervous system. The effects that channel noise can have on neural dynamics are generally studied using numerical simulations of stochastic models. Algorithms based on discrete Markov Chains (MC) seem to be the most reliable and trustworthy, but even optimized algorithms come with a non-negligible computational cost. Diffusion Approximation (DA) methods use Stochastic Differential Equations (SDE) to approximate the behavior of a number of MCs, considerably speeding up simulation times. However, model comparisons have suggested that DA methods did not lead to the same results as in MC modeling in terms of channel noise statistics and effects on excitability. Recently, it was shown that the difference arose because MCs were modeled with coupled gating particles, while the DA was modeled using uncoupled gating particles. Implementations of DA with coupled particles, in the context of a specific kinetic scheme, yielded similar results to MC. However, it remained unclear how to generalize these implementations to different kinetic schemes, or whether they were faster than MC algorithms. Additionally, a steady state approximation was used for the stochastic terms, which, as we show here, can introduce significant inaccuracies. Main Contributions We derived the SDE explicitly for any given ion channel kinetic scheme. The resulting generic equations were surprisingly simple and interpretable – allowing an easy, transparent and efficient DA implementation, avoiding unnecessary approximations. The algorithm was tested in a voltage clamp simulation and in two different current clamp simulations, yielding the same results as MC modeling. Also, the simulation efficiency of this DA method demonstrated considerable superiority over MC methods, except when
Egan, A; Laub, W
2014-06-15
Purpose: Several shortcomings of the current implementation of the analytic anisotropic algorithm (AAA) may lead to dose calculation errors in highly modulated treatments delivered to highly heterogeneous geometries. Here we introduce a set of dosimetric error predictors that can be applied to a clinical treatment plan and patient geometry in order to identify high risk plans. Once a problematic plan is identified, the treatment can be recalculated with more accurate algorithm in order to better assess its viability. Methods: Here we focus on three distinct sources dosimetric error in the AAA algorithm. First, due to a combination of discrepancies in smallfield beam modeling as well as volume averaging effects, dose calculated through small MLC apertures can be underestimated, while that behind small MLC blocks can overestimated. Second, due the rectilinear scaling of the Monte Carlo generated pencil beam kernel, energy is not properly transported through heterogeneities near, but not impeding, the central axis of the beamlet. And third, AAA overestimates dose in regions very low density (< 0.2 g/cm{sup 3}). We have developed an algorithm to detect the location and magnitude of each scenario within the patient geometry, namely the field-size index (FSI), the heterogeneous scatter index (HSI), and the lowdensity index (LDI) respectively. Results: Error indices successfully identify deviations between AAA and Monte Carlo dose distributions in simple phantom geometries. Algorithms are currently implemented in the MATLAB computing environment and are able to run on a typical RapidArc head and neck geometry in less than an hour. Conclusion: Because these error indices successfully identify each type of error in contrived cases, with sufficient benchmarking, this method can be developed into a clinical tool that may be able to help estimate AAA dose calculation errors and when it might be advisable to use Monte Carlo calculations.
NASA Astrophysics Data System (ADS)
Rueda, Antonio J.; Noguera, José M.; Luque, Adrián
2016-02-01
In recent years GPU computing has gained wide acceptance as a simple low-cost solution for speeding up computationally expensive processing in many scientific and engineering applications. However, in most cases accelerating a traditional CPU implementation for a GPU is a non-trivial task that requires a thorough refactorization of the code and specific optimizations that depend on the architecture of the device. OpenACC is a promising technology that aims at reducing the effort required to accelerate C/C++/Fortran code on an attached multicore device. Virtually with this technology the CPU code only has to be augmented with a few compiler directives to identify the areas to be accelerated and the way in which data has to be moved between the CPU and GPU. Its potential benefits are multiple: better code readability, less development time, lower risk of errors and less dependency on the underlying architecture and future evolution of the GPU technology. Our aim with this work is to evaluate the pros and cons of using OpenACC against native GPU implementations in computationally expensive hydrological applications, using the classic D8 algorithm of O'Callaghan and Mark for river network extraction as case-study. We implemented the flow accumulation step of this algorithm in CPU, using OpenACC and two different CUDA versions, comparing the length and complexity of the code and its performance with different datasets. We advance that although OpenACC can not match the performance of a CUDA optimized implementation (×3.5 slower in average), it provides a significant performance improvement against a CPU implementation (×2-6) with by far a simpler code and less implementation effort.
NASA Astrophysics Data System (ADS)
von Rudorff, Guido Falk; Wehmeyer, Christoph; Sebastiani, Daniel
2014-06-01
We adapt a swarm-intelligence-based optimization method (the artificial bee colony algorithm, ABC) to enhance its parallel scaling properties and to improve the escaping behavior from deep local minima. Specifically, we apply the approach to the geometry optimization of Lennard-Jones clusters. We illustrate the performance and the scaling properties of the parallelization scheme for several system sizes (5-20 particles). Our main findings are specific recommendations for ranges of the parameters of the ABC algorithm which yield maximal performance for Lennard-Jones clusters and Morse clusters. The suggested parameter ranges for these different interaction potentials turn out to be very similar; thus, we believe that our reported values are fairly general for the ABC algorithm applied to chemical optimization problems.
NASA Technical Reports Server (NTRS)
Whitmore, S. A.
1985-01-01
The dynamics model and data sources used to perform air-data reconstruction are discussed, as well as the Kalman filter. The need for adaptive determination of the noise statistics of the process is indicated. The filter innovations are presented as a means of developing the adaptive criterion, which is based on the true mean and covariance of the filter innovations. A method for the numerical approximation of the mean and covariance of the filter innovations is presented. The algorithm as developed is applied to air-data reconstruction for the space shuttle, and data obtained from the third landing are presented. To verify the performance of the adaptive algorithm, the reconstruction is also performed using a constant covariance Kalman filter. The results of the reconstructions are compared, and the adaptive algorithm exhibits better performance.
NASA Technical Reports Server (NTRS)
Whitmore, S. A.
1985-01-01
The dynamics model and data sources used to perform air-data reconstruction are discussed, as well as the Kalman filter. The need for adaptive determination of the noise statistics of the process is indicated. The filter innovations are presented as a means of developing the adaptive criterion, which is based on the true mean and covariance of the filter innovations. A method for the numerical approximation of the mean and covariance of the filter innovations is presented. The algorithm as developed is applied to air-data reconstruction for the Space Shuttle, and data obtained from the third landing are presented. To verify the performance of the adaptive algorithm, the reconstruction is also performed using a constant covariance Kalman filter. The results of the reconstructions are compared, and the adaptive algorithm exhibits better performance.
Implementation of Future Climate Satellite Cloud Algorithms: Case of the GCOM-C/SGLI
NASA Astrophysics Data System (ADS)
Dim, J. R.; Murakami, H.; Nakajima, T. Y.; Takamura, T.
2012-12-01
The Global Change Observation Mission-Climate/Second Generation GLobal Imager (GCOM-C/SGLI) is a future Earth observation satellite to be launched in 2015. Its major objective is the monitoring of long-term climate changes. A major factor of these changes is the cloud impact. A new cloud algorithm adapted to the spectral characteristics of the GCOM-C/SGLI and the products derived are currently tested. The tests consist of evaluating the performance of the cloud optical thickness (COT) and the cloud particle effective radius (CLER) against simulation data, and equivalent products derived from a compatible satellite, the Terra/MODerate resolution Image Spectrometer (Terra/MODIS). In addition to these tests, the sensitivity of the products derived from this algorithm, to external and internal cloud related parameters, is analyzed. The base-map of the initial data input for this algorithm is made of geometrically corrected radiances of the Advanced Earth Observation Satellite II/GLobal Imager (ADEOS-II/GLI) and the GCOM-C/SGLI simulated radiances. The results of these performance tests, based on timely matching products, show that the GCOM-C/SGLI algorithm performs relatively well for averagely overcast scenes, with an agreement rate of ±20% with the satellite simulation products and the Terra/MODIS COT and CLER. A negative bias is however frequently observed, with the GCOM-C/SGLI retrieved parameters showing higher values at high COT levels. The algorithm also seems less reactive to thin and small particles' clouds mainly in land areas, compared to Terra/MODIS data and the satellite simulation products. Sensitivity to varying ground albedo, cloud phase, cloud structure and cloud location are analyzed to understand the influence of these parameters on the results obtained. Possible consequences of these influences on long-term climate variations and the bases for the improvement of the present algorithm in various cloud types' conditions are discussed.
Transport implementation of the Bernstein–Vazirani algorithm with ion qubits
NASA Astrophysics Data System (ADS)
Fallek, S. D.; Herold, C. D.; McMahon, B. J.; Maller, K. M.; Brown, K. R.; Amini, J. M.
2016-08-01
Using trapped ion quantum bits in a scalable microfabricated surface trap, we perform the Bernstein–Vazirani algorithm. Our architecture takes advantage of the ion transport capabilities of such a trap. The algorithm is demonstrated using two- and three-ion chains. For three ions, an improvement is achieved compared to a classical system using the same number of oracle queries. For two ions and one query, we correctly determine an unknown bit string with probability 97.6(8)%. For three ions, we succeed with probability 80.9(3)%.
Two-chip implementation of the RSA public-key encryption algorithm
Rieden, R.F.; Snyder, J.B.; Widman, R.J.; Barnard, W.J.
1982-01-01
A system has been developed which employs two identical integrated circuits to perform the encryption algorithm developed by Rivest, Shamir, and Adleman (RSA) on a 336-bit message. The integrated circuit used in the system employs the 3-micron polysilicon gate, radiation-hard, CMOS technology developed at Sandia National Laboratories.
NASA Technical Reports Server (NTRS)
Springer, P.
1993-01-01
This paper discusses the method in which the Cascade-Correlation algorithm was parallelized in such a way that it could be run using the Time Warp Operating System (TWOS). TWOS is a special purpose operating system designed to run parellel discrete event simulations with maximum efficiency on parallel or distributed computers.
NASA Technical Reports Server (NTRS)
Soeder, J. F.
1983-01-01
As turbofan engines become more complex, the development of controls necessitate the use of multivariable control techniques. A control developed for the F100-PW-100(3) turbofan engine by using linear quadratic regulator theory and other modern multivariable control synthesis techniques is described. The assembly language implementation of this control on an SEL 810B minicomputer is described. This implementation was then evaluated by using a real-time hybrid simulation of the engine. The control software was modified to run with a real engine. These modifications, in the form of sensor and actuator failure checks and control executive sequencing, are discussed. Finally recommendations for control software implementations are presented.
Gregoretti, Francesco; Belcastro, Vincenzo; di Bernardo, Diego; Oliva, Gennaro
2010-01-01
The reverse engineering of gene regulatory networks using gene expression profile data has become crucial to gain novel biological knowledge. Large amounts of data that need to be analyzed are currently being produced due to advances in microarray technologies. Using current reverse engineering algorithms to analyze large data sets can be very computational-intensive. These emerging computational requirements can be met using parallel computing techniques. It has been shown that the Network Identification by multiple Regression (NIR) algorithm performs better than the other ready-to-use reverse engineering software. However it cannot be used with large networks with thousands of nodes--as is the case in biological networks--due to the high time and space complexity. In this work we overcome this limitation by designing and developing a parallel version of the NIR algorithm. The new implementation of the algorithm reaches a very good accuracy even for large gene networks, improving our understanding of the gene regulatory networks that is crucial for a wide range of biomedical applications. PMID:20422008
Passive microwave remote sensing of rainfall with SSM/I: Algorithm development and implementation
NASA Technical Reports Server (NTRS)
Ferriday, James G.; Avery, Susan K.
1994-01-01
A physically based algorithm sensitive to emission and scattering is used to estimate rainfall using the Special Sensor Microwave/Imager (SSM/I). The algorithm is derived from radiative transfer calculations through an atmospheric cloud model specifying vertical distributions of ice and liquid hydrometeors as a function of rain rate. The algorithm is structured in two parts: SSM/I brightness temperatures are screened to detect rainfall and are then used in rain-rate calculation. The screening process distinguishes between nonraining background conditions and emission and scattering associated with hydrometeors. Thermometric temperature and polarization thresholds determined from the radiative transfer calculations are used to detect rain, whereas the rain-rate calculation is based on a linear function fit to a linear combination of channels. Separate calculations for ocean and land account for different background conditions. The rain-rate calculation is constructed to respond to both emission and scattering, reduce extraneous atmospheric and surface effects, and to correct for beam filling. The resulting SSM/I rain-rate estimates are compared to three precipitation radars as well as to a dynamically simulated rainfall event. Global estimates from the SSM/I algorithm are also compared to continental and shipboard measurements over a 4-month period. The algorithm is found to accurately describe both localized instantaneous rainfall events and global monthly patterns over both land and ovean. Over land the 4-month mean difference between SSM/I and the Global Precipitation Climatology Center continental rain gauge database is less than 10%. Over the ocean, the mean difference between SSM/I and the Legates and Willmott global shipboard rain gauge climatology is less than 20%.
NASA Technical Reports Server (NTRS)
Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.
1989-01-01
Several techniques to perform static and dynamic load balancing techniques for vision systems are presented. These techniques are novel in the sense that they capture the computational requirements of a task by examining the data when it is produced. Furthermore, they can be applied to many vision systems because many algorithms in different systems are either the same, or have similar computational characteristics. These techniques are evaluated by applying them on a parallel implementation of the algorithms in a motion estimation system on a hypercube multiprocessor system. The motion estimation system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from different time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters. It is shown that the performance gains when these data decomposition and load balancing techniques are used are significant and the overhead of using these techniques is minimal.
A well-separated pairs decomposition algorithm for k-d trees implemented on multi-core architectures
NASA Astrophysics Data System (ADS)
Lopes, Raul H. C.; Reid, Ivan D.; Hobson, Peter R.
2014-06-01
Variations of k-d trees represent a fundamental data structure used in Computational Geometry with numerous applications in science. For example particle track fitting in the software of the LHC experiments, and in simulations of N-body systems in the study of dynamics of interacting galaxies, particle beam physics, and molecular dynamics in biochemistry. The many-body tree methods devised by Barnes and Hutt in the 1980s and the Fast Multipole Method introduced in 1987 by Greengard and Rokhlin use variants of k-d trees to reduce the computation time upper bounds to O(n log n) and even O(n) from O(n2). We present an algorithm that uses the principle of well-separated pairs decomposition to always produce compressed trees in O(n log n) work. We present and evaluate parallel implementations for the algorithm that can take advantage of multi-core architectures.
Facchinetti, Andrea; Sparacino, Giovanni; Cobelli, Claudio
2013-01-01
Glucose readings provided by current continuous glucose monitoring (CGM) devices still suffer from accuracy and precision issues. In April 2013, we proposed a new conceptual architecture to deal with these problems and render CGM sensors algorithmically smarter, which consists of three modules for denoising, enhancement, and prediction placed in cascade to a commercial CGM sensor. The architecture was assessed on a data set consisting of 24 type 1 diabetes patients collected in four clinical centers of the AP@home Consortium (a European project of 7th Framework Programme funded by the European Committee). This article, as a companion to our prior publication, illustrates the technical details of the algorithms and of the implementation issues. PMID:24124959
NASA Astrophysics Data System (ADS)
Charlton, Mark Christopher
The development of low-cost sensors has generated a corresponding movement to integrate them into many different applications. One such application is determining the rotational attitude of an object. Since many of these low-cost sensors are less accurate than their more expensive counterparts, their noisy measurements must be filtered to obtain optimum results. This work describes the development, testing, and evaluation of four filtering algorithms for the nonlinear sounding rocket attitude determination problem. Sun sensor, magnetometer, and rate sensor measurements are simulated. A quatenion formulation is used to avoid singularity problems associated with Euler angles and other three-parameter approaches. Prior to filtering, Gauss-Newton error minimization is used to reduce the six reference vector components to four quaternion components that minimize a quadratic error function. Two of the algorithms are based on the traditional extended Kalman filter (EKF) and two are based on the recently developed unscented Kalman filter (UKF). One of each incorporates rate measurements, while the others rely on differencing quaternions. All incorporate a simplified process model for state propagation allowing the algorithms to be applied to rockets with different physical characteristics, or even to other platforms. Simulated data are used to develop and test the algorithms, and each successfully estimates the attitude motion of the rocket, to varying degrees of accuracy. The UKF-based filter that incorporates rate sensor measurements demonstrates a clear performance advantage over both EKFs and the UKF without rate measurements. This is due to its superior mean and covariance propagation characteristics and the fact that differencing generates noisier rates than measuring. For one sample case, the "pointing accuracy" of the rocket spin axis is improved by approximately 39 percent over the EKF that uses rate measurements and by 40 percent over the UKF without rates. The
NASA Technical Reports Server (NTRS)
Schallhorn, Paul; Majumdar, Alok
2012-01-01
This paper describes a finite volume based numerical algorithm that allows multi-dimensional computation of fluid flow within a system level network flow analysis. There are several thermo-fluid engineering problems where higher fidelity solutions are needed that are not within the capacity of system level codes. The proposed algorithm will allow NASA's Generalized Fluid System Simulation Program (GFSSP) to perform multi-dimensional flow calculation within the framework of GFSSP s typical system level flow network consisting of fluid nodes and branches. The paper presents several classical two-dimensional fluid dynamics problems that have been solved by GFSSP's multi-dimensional flow solver. The numerical solutions are compared with the analytical and benchmark solution of Poiseulle, Couette and flow in a driven cavity.
Implemented Wavelet Packet Tree based Denoising Algorithm in Bus Signals of a Wearable Sensorarray
NASA Astrophysics Data System (ADS)
Schimmack, M.; Nguyen, S.; Mercorelli, P.
2015-11-01
This paper introduces a thermosensing embedded system with a sensor bus that uses wavelets for the purposes of noise location and denoising. From the principle of the filter bank the measured signal is separated in two bands, low and high frequency. The proposed algorithm identifies the defined noise in these two bands. With the Wavelet Packet Transform as a method of Discrete Wavelet Transform, it is able to decompose and reconstruct bus input signals of a sensor network. Using a seminorm, the noise of a sequence can be detected and located, so that the wavelet basis can be rearranged. This particularly allows for elimination of any incoherent parts that make up unavoidable measuring noise of bus signals. The proposed method was built based on wavelet algorithms from the WaveLab 850 library of the Stanford University (USA). This work gives an insight to the workings of Wavelet Transformation.
Implementation of the SU(2) Hamiltonian Symmetry for the DMRG Algorithm
Alvarez, Gonzalo
2012-01-01
In the Density Matrix Renormalization Group (DMRG) algorithm (White, 1992, 1993) and Hamiltonian symmetries play an important role. Using symmetries, the matrix representation of the Hamiltonian can be blocked. Diagonalizing each matrix block is more efficient than diagonalizing the original matrix. This paper explains how the the DMRG++ code (Alvarez, 2009) has been extended to handle the non-local SU(2) symmetry in a model independent way. Improvements in CPU times compared to runs with only local symmetries are discussed for the one-orbital Hubbard model, and for a two-orbital Hubbard model for iron-based superconductors. The computational bottleneck of the algorithm and the use of shared memory parallelization are also addressed.
NASA Technical Reports Server (NTRS)
Chen, Shu-Po
1999-01-01
This paper presents software for solving the non-conforming fluid structure interfaces in aeroelastic simulation. It reviews the algorithm of interpolation and integration, highlights the flexibility and the user-friendly feature that allows the user to select the existing structure and fluid package, like NASTRAN and CLF3D, to perform the simulation. The presented software is validated by computing the High Speed Civil Transport model.
Schalkoff, R.J.; Shaaban, K.M.; Carver, A.E.
1996-12-31
The ARIES {number_sign}1 (Autonomous Robotic Inspection Experimental System) vision system is used to acquire drum surface images under controlled conditions and subsequently perform autonomous visual inspection leading to a classification as `acceptable` or `suspect`. Specific topics described include vision system design methodology, algorithmic structure,hardware processing structure, and image acquisition hardware. Most of these capabilities were demonstrated at the ARIES Phase II Demo held on Nov. 30, 1995. Finally, Phase III efforts are briefly addressed.
Implementation of a Landing Footprint Algorithm for the HTV-2 and Trajectory Simulations
NASA Technical Reports Server (NTRS)
Clark, Casie M.
2012-01-01
This presentation details work performed during the Fall 2011 term in the Research Controls and Dynamics Branch at NASA Dryden Flight Research Center. Included is a study on a possible landing footprint algorithm, with direct application to the HTV-2. Also discussed is work in support of the MIPCC effort, which includes optimal trajectory solutions for the F-15A Streak Eagle aircraft and theoretical performance of an F-15A with a MIPCC propulsion system.
Cantor network, control algorithm, two-dimensional compact structure and its optical implementation
NASA Astrophysics Data System (ADS)
Wang, Ning; Liu, Liren; Yin, Yaozu
1995-12-01
A compact integrating module technique for packaging a optical multistage Cantor network with a polarization multiplex technique is suggested. The modules have a unique configuration, which is the solid-state combination of a polarization rotator, double birefringent slabs, and a 2 \\times 2 switch array. The design and the fabrication of an eight-channel optical nonblocking Cantor network are demonstrated, and a fast-setup control algorithm is developed. The network systems are easy to assemble and insensitive to environment disturbance.
Towards a practical implementation of the MLE algorithm for Positron Emission Tomography
Llacer, J.; Andreae, S.; Veklerov, E.; Hoffman, E.J.
1985-10-01
Recognizing that the quality of images obtained by application of the Maximum Likelihood Estimator (MLE) to Positron Emission Tomography (PET) and Single Photon Emission Tomography (SPECT) appears to be substantially better than those obtained by conventional methods, we have started to develop methods that will facilitate the necessary research for a good evaluation of the algorithm and may lead to its practical application for research and routine tomography. We have found that the non-linear MLE algorithm can be used with pixel sizes which are smaller than the sampling distance, without interpolation, obtaining excellent resolution and no noticeable increase in noise. We have studied the role of symmetry in reducing the amount of matrix element storage requirements for full size applications of the algorithm and have used that concept to carry out two reconstructions of the Derenzo phantom with data from the ECAT-III instrument. The results show excellent signal-to-noise (S/N) ratio, particularly for data with low total counts, excellent sharpness, but low contrast at high frequencies when using the Shepp-Vardi model for probability matrices. 12 refs., 7 figs., 4 tabs.
Ravari, Alireza Norouzzadeh; Taghirad, Hamid D
2014-10-01
In this paper the problem of loop closing from depth or camera image information in an unknown environment is investigated. A sparse model is constructed from a parametric dictionary for every range or camera image as mobile robot observations. In contrast to high-dimensional feature-based representations, in this model, the dimension of the sensor measurements' representations is reduced. Considering the loop closure detection as a clustering problem in high-dimensional space, little attention has been paid to the curse of dimensionality in the existing state-of-the-art algorithms. In this paper, a representation is developed from a sparse model of images, with a lower dimension than original sensor observations. Exploiting the algorithmic information theory, the representation is developed such that it has the geometrically transformation invariant property in the sense of Kolmogorov complexity. A universal normalized metric is used for comparison of complexity based representations of image models. Finally, a distinctive property of normalized compression distance is exploited for detecting similar places and rejecting incorrect loop closure candidates. Experimental results show efficiency and accuracy of the proposed method in comparison to the state-of-the-art algorithms and some recently proposed methods. PMID:24968363
Clisby, Nathan
2010-02-01
We introduce a fast implementation of the pivot algorithm for self-avoiding walks, which we use to obtain large samples of walks on the cubic lattice of up to 33x10{6} steps. Consequently the critical exponent nu for three-dimensional self-avoiding walks is determined to great accuracy; the final estimate is nu=0.587 597(7). The method can be adapted to other models of polymers with short-range interactions, on the lattice or in the continuum. PMID:20366773
Massive parallel implementation of JPEG2000 decoding algorithm with multi-GPUs
NASA Astrophysics Data System (ADS)
Wu, Xianyun; Li, Yunsong; Liu, Kai; Wang, Keyan; Wang, Li
2014-05-01
JPEG2000 is an important technique for image compression that has been successfully used in many fields. Due to the increasing spatial, spectral and temporal resolution of remotely sensed imagery data sets, fast decompression of remote sensed data is becoming a very important and challenging object. In this paper, we develop an implementation of the JPEG2000 decompression in graphics processing units (GPUs) for fast decoding of codeblock-based parallel compression stream. We use one CUDA block to decode one frame. Tier-2 is still serial decoded while Tier-1 and IDWT are parallel processed. Since our encode stream are block-based parallel which means each block are independent with other blocks, we parallel process each block in T1 with one thread. For IDWT, we use one CUDA block to execute one line and one CUDA thread to process one pixel. We investigate the speedups that can be gained by using the GPUs implementations with regards to the CPUs-based serial implementations. Experimental result reveals that our implementation can achieve significant speedups compared with serial implementations.
NASA Astrophysics Data System (ADS)
Valencia, David; Plaza, Antonio; Vega-Rodríguez, Miguel A.; Pérez, Rosa M.
2005-11-01
Hyperspectral imagery is a class of image data which is used in many scientific areas, most notably, medical imaging and remote sensing. It is characterized by a wealth of spatial and spectral information. Over the last years, many algorithms have been developed with the purpose of finding "spectral endmembers," which are assumed to be pure signatures in remotely sensed hyperspectral data sets. Such pure signatures can then be used to estimate the abundance or concentration of materials in mixed pixels, thus allowing sub-pixel analysis which is crucial in many remote sensing applications due to current sensor optics and configuration. One of the most popular endmember extraction algorithms has been the pixel purity index (PPI), available from Kodak's Research Systems ENVI software package. This algorithm is very time consuming, a fact that has generally prevented its exploitation in valid response times in a wide range of applications, including environmental monitoring, military applications or hazard and threat assessment/tracking (including wildland fire detection, oil spill mapping and chemical and biological standoff detection). Field programmable gate arrays (FPGAs) are hardware components with millions of gates. Their reprogrammability and high computational power makes them particularly attractive in remote sensing applications which require a response in near real-time. In this paper, we present an FPGA design for implementation of PPI algorithm which takes advantage of a recently developed fast PPI (FPPI) algorithm that relies on software-based optimization. The proposed FPGA design represents our first step toward the development of a new reconfigurable system for fast, onboard analysis of remotely sensed hyperspectral imagery.
A Synchronization Algorithm and Implementation for High-Speed Block Codes Applications. Part 4
NASA Technical Reports Server (NTRS)
Lin, Shu; Zhang, Yu; Nakamura, Eric B.; Uehara, Gregory T.
1998-01-01
Block codes have trellis structures and decoders amenable to high speed CMOS VLSI implementation. For a given CMOS technology, these structures enable operating speeds higher than those achievable using convolutional codes for only modest reductions in coding gain. As a result, block codes have tremendous potential for satellite trunk and other future high-speed communication applications. This paper describes a new approach for implementation of the synchronization function for block codes. The approach utilizes the output of the Viterbi decoder and therefore employs the strength of the decoder. Its operation requires no knowledge of the signal-to-noise ratio of the received signal, has a simple implementation, adds no overhead to the transmitted data, and has been shown to be effective in simulation for received SNR greater than 2 dB.
NASA Astrophysics Data System (ADS)
Zabolotny, W. M.; Byszuk, A.
2016-03-01
The CMS experiment Level-1 trigger system is undergoing an upgrade. In the barrel-endcap transition region, it is necessary to merge data from 3 types of muon detectors—RPC, DT and CSC. The Overlap Muon Track Finder (OMTF) uses the novel approach to concentrate and process those data in a uniform manner to identify muons and their transversal momentum. The paper presents the algorithm and FPGA firmware implementation of the OMTF and its data transmission system in CMS. It is foreseen that the OMTF will be subject to significant changes resulting from optimization which will be done with the aid of physics simulations. Therefore, a special, high-level, parameterized HDL implementation is necessary.
NASA Astrophysics Data System (ADS)
Ham, Woonchul; Song, Chulgyu; Lee, Kangsan; Badarch, Luubaatar
2015-05-01
In this paper, we introduce the mobile embedded system implemented for capturing stereo image based on two CMOS camera module. We use WinCE as an operating system and capture the stereo image by using device driver for CMOS camera interface and Direct Draw API functions. We aslo comments on the GPU hardware and CUDA programming for implementation of 3D exaggeraion algorithm for ROI by adjusting and synthesizing the disparity value of ROI (region of interest) in real time. We comment on the pattern of aperture for deblurring of CMOS camera module based on the Kirchhoff diffraction formula and clarify the reason why we can get more sharp and clear image by blocking some portion of aperture or geometric sampling. Synthesized stereo image is real time monitored on the shutter glass type three-dimensional LCD monitor and disparity values of each segment are analyzed to prove the validness of emphasizing effect of ROI.
Implementation of a Fractional Model-Based Fault Detection Algorithm into a PLC Controller
NASA Astrophysics Data System (ADS)
Kopka, Ryszard
2014-12-01
This paper presents results related to the implementation of model-based fault detection and diagnosis procedures into a typical PLC controller. To construct the mathematical model and to implement the PID regulator, a non-integer order differential/integral calculation was used. Such an approach allows for more exact control of the process and more precise modelling. This is very crucial in model-based diagnostic methods. The theoretical results were verified on a real object in the form of a supercapacitor connected to a PLC controller by a dedicated electronic circuit controlled directly from the PLC outputs.
NASA Astrophysics Data System (ADS)
Giacometto, F. J.; Vilardy, J. M.; Torres, C. O.; Mattos, L.
2011-01-01
Currently addressing problems related to security in access control, as a consequence, have been developed applications that work under unique characteristics in individuals, such as biometric features. In the world becomes important working with biometric images such as the liveliness of the iris which are for both the pattern of retinal images as your blood vessels. This paper presents an implementation of an algorithm for creating templates for biometric authentication with ocular features for FPGA, in which the object of study is that the texture pattern of iris is unique to each individual. The authentication will be based in processes such as edge extraction methods, segmentation principle of John Daugman and Libor Masek's, and standardization to obtain necessary templates for the search of matches in a database and then get the expected results of authentication.
Formiconi, A R; Passeri, A; Guelfi, M R; Masoni, M; Pupi, A; Meldolesi, U; Malfetti, P; Calori, L; Guidazzoli, A
1997-11-01
Data from Single Photon Emission Computed Tomography (SPECT) studies are blurred by inevitable physical phenomena occurring during data acquisition. These errors may be compensated by means of reconstruction algorithms which take into account accurate physical models of the data acquisition procedure. Unfortunately, this approach involves high memory requirements as well as a high computational burden which cannot be afforded by the computer systems of SPECT acquisition devices. In this work the possibility of accessing High Performance Computing and Networking (HPCN) resources through a World Wide Web interface for the advanced reconstruction of SPECT data in a clinical environment was investigated. An iterative algorithm with an accurate model of the variable system response was ported on the Multiple Instruction Multiple Data (MIMD) parallel architecture of a Cray T3D massively parallel computer. The system was accessible even from low cost PC-based workstations through standard TCP/IP networking. A speedup factor of 148 was predicted by the benchmarks run on the Cray T3D. A complete brain study of 30 (64 x 64) slices was reconstructed from a set of 90 (64 x 64) projections with ten iterations of the conjugate gradients algorithm in 9 s which corresponds to an actual speed-up factor of 135. The technique was extended to a more accurate 3D modeling of the system response for a true 3D reconstruction of SPECT data; the reconstruction time of the same data set with this more accurate model was 5 min. This work demonstrates the possibility of exploiting remote HPCN resources from hospital sites by means of low cost workstations using standard communication protocols and an user-friendly WWW interface without particular problems for routine use. PMID:9506406
The SFXC software correlator for very long baseline interferometry: algorithms and implementation
NASA Astrophysics Data System (ADS)
Keimpema, A.; Kettenis, M. M.; Pogrebenko, S. V.; Campbell, R. M.; Cimó, G.; Duev, D. A.; Eldering, B.; Kruithof, N.; van Langevelde, H. J.; Marchal, D.; Molera Calvés, G.; Ozdemir, H.; Paragi, Z.; Pidopryhora, Y.; Szomoru, A.; Yang, J.
2015-06-01
In this paper a description is given of the SFXC software correlator, developed and maintained at the Joint Institute for VLBI in Europe (JIVE). The software is designed to run on generic Linux-based computing clusters. The correlation algorithm is explained in detail, as are some of the novel modes that software correlation has enabled, such as wide-field VLBI imaging through the use of multiple phase centres and pulsar gating and binning. This is followed by an overview of the software architecture. Finally, the performance of the correlator as a function of number of CPU cores, telescopes and spectral channels is shown.
Parrallel Implementation of Fast Randomized Algorithms for Low Rank Matrix Decomposition
Lucas, Andrew J.; Stalizer, Mark; Feo, John T.
2014-03-01
We analyze the parallel performance of randomized interpolative decomposition by de- composing low rank complex-valued Gaussian random matrices larger than 100 GB. We chose a Cray XMT supercomputer as it provides an almost ideal PRAM model permitting quick investigation of parallel algorithms without obfuscation from hardware idiosyncrasies. We obtain that on non-square matrices performance scales almost linearly with runtime about 100 times faster on 128 processors. We also verify that numerically discovered error bounds still hold on matrices two orders of magnitude larger than those previously tested.
Kanters, René P F; Donald, Kelling J
2014-12-01
A new flexible implementation of a genetic algorithm for locating unique low energy minima of isomers of clusters is described and tested. The strategy employed can be applied to molecular or atomic clusters and has a flexible input structure so that a system with several different elements can be built up from a set of individual atoms or from fragments made up of groups of atoms. This cluster program is tested on several systems, and the results are compared to computational and experimental data from previous studies. The quality of the algorithm for locating reliably the most competitive low energy structures of an assembly of atoms is examined for strongly bound Si-Li clusters, and ZnF2 clusters, and the more weakly interacting water trimers. The use of the nuclear repulsion energy as a duplication criterion, an increasing population size, and avoiding mutation steps without loss of efficacy are distinguishing features of the program. For the Si-Li clusters, a few new low energy minima are identified in the testing of the algorithm, and our results for the metal fluorides and water show very good agreement with the literature. PMID:26583254
NASA Astrophysics Data System (ADS)
Miao, J. C.; Zhu, P.; Shi, G. L.; Chen, G. L.
2008-01-01
A sub-cycling integration algorithm (or named multi-time-steps integration algorithm), which has been successfully applied to FEM dynamical analysis, was firstly presented by Belytschko et al. (Comput Methods Appl Mech Eng 17/18:259-275, 1979). However, the problem of how to apply this type of algorithm to flexible multi-body dynamics (FMD) problems still lacks investigation up to now. Similar to the region-partitioning method used in FEM, this paper presents a central-difference-based sub-cycling integral method by decomposing the variables of an FMD equation into several groups and adopting different integral step sizes to each group of the variables. Based on the condensed form of an FMD equation, a group of common update formulae and a sub-step update formula, which constitute the sub-cycling together, are established in the paper. Furthermore, an implementation flowchart of the sub-cycling is presented. Stability of the sub-cycling will be analyzed and numerical examples will be performed to verify availability and precision of the sub-cycling in part II of the paper.
Simulation of Propellant Loading System Senior Design Implement in Computer Algorithm
NASA Technical Reports Server (NTRS)
Bandyopadhyay, Alak
2010-01-01
Propellant loading from the Storage Tank to the External Tank is one of the very important and time consuming pre-launch ground operations for the launch vehicle. The propellant loading system is a complex integrated system involving many physical components such as the storage tank filled with cryogenic fluid at a very low temperature, the long pipe line connecting the storage tank with the external tank, the external tank along with the flare stack, and vent systems for releasing the excess fuel. Some of the very important parameters useful for design purpose are the prediction of pre-chill time, loading time, amount of fuel lost, the maximum pressure rise etc. The physics involved for mathematical modeling is quite complex due to the fact the process is unsteady, there is phase change as some of the fuel changes from liquid to gas state, then conjugate heat transfer in the pipe walls as well as between solid-to-fluid region. The simulation is very tedious and time consuming too. So overall, this is a complex system and the objective of the work is student's involvement and work in the parametric study and optimization of numerical modeling towards the design of such system. The students have to first become familiar and understand the physical process, the related mathematics and the numerical algorithm. The work involves exploring (i) improved algorithm to make the transient simulation computationally effective (reduced CPU time) and (ii) Parametric study to evaluate design parameters by changing the operational conditions
Lavrentieva, A.; Kontou, P.; Soulountsi, V.; Kioumis, J.; Chrysou, O.; Bitzani, M.
2015-01-01
Summary The purpose of this study was to examine the hypothesis that an algorithm based on serial measurements of procalcitonin (PCT) allows reduction in the duration of antibiotic therapy compared with empirical rules, and does not result in more adverse outcomes in burn patients with infectious complications. All burn patients requiring antibiotic therapy based on confirmed or highly suspected bacterial infections were eligible. Patients were assigned to either a procalcitonin-guided (study group) or a standard (control group) antibiotic regimen. The following variables were analyzed and compared in both groups: duration of antibiotic treatment, mortality rate, percentage of patients with relapse or superinfection, maximum SOFA score (days 1-28), length of ICU and hospital stay. A total of 46 Burn ICU patients receiving antibiotic therapy were enrolled in this study. In 24 patients antibiotic therapy was guided by daily procalcitonin and clinical assessment. PCT guidance resulted in a smaller antibiotic exposure (10.1±4 vs. 15.3±8 days, p=0.034) without negative effects on clinical outcome characteristics such as mortality rate, percentage of patients with relapse or superinfection, maximum SOFA score, length of ICU and hospital stay. The findings thus show that use of a procalcitonin-guided algorithm for antibiotic therapy in the burn intensive care unit may contribute to the reduction of antibiotic exposure without compromising clinical outcome parameters. PMID:27279801
NASA Technical Reports Server (NTRS)
Li, Wei; Saleeb, Atef F.
1995-01-01
This two-part report is concerned with the development of a general framework for the implicit time-stepping integrators for the flow and evolution equations in generalized viscoplastic models. The primary goal is to present a complete theoretical formulation, and to address in detail the algorithmic and numerical analysis aspects involved in its finite element implementation, as well as to critically assess the numerical performance of the developed schemes in a comprehensive set of test cases. On the theoretical side, the general framework is developed on the basis of the unconditionally-stable, backward-Euler difference scheme as a starting point. Its mathematical structure is of sufficient generality to allow a unified treatment of different classes of viscoplastic models with internal variables. In particular, two specific models of this type, which are representative of the present start-of-art in metal viscoplasticity, are considered in applications reported here; i.e., fully associative (GVIPS) and non-associative (NAV) models. The matrix forms developed for both these models are directly applicable for both initially isotropic and anisotropic materials, in general (three-dimensional) situations as well as subspace applications (i.e., plane stress/strain, axisymmetric, generalized plane stress in shells). On the computational side, issues related to efficiency and robustness are emphasized in developing the (local) interative algorithm. In particular, closed-form expressions for residual vectors and (consistent) material tangent stiffness arrays are given explicitly for both GVIPS and NAV models, with their maximum sizes 'optimized' to depend only on the number of independent stress components (but independent of the number of viscoplastic internal state parameters). Significant robustness of the local iterative solution is provided by complementing the basic Newton-Raphson scheme with a line-search strategy for convergence. In the present second part of
MPI implementation of a generalized implicit algorithm for multi-dimensional PIC simulations
NASA Astrophysics Data System (ADS)
Petrov, George; Davis, Jack
2012-10-01
The implicit 2D3V particle-in-cell (PIC) code developed to study the interaction of short pulse lasers with matter [G. M. Petrov and J. Davis, Computer Phys. Comm. 179, 868 (2008); Phys. Plasmas 18, 073102 (2011)] has been parallelized using MPI (Message Passing Interface). Performance evaluation has been made on a Linux cluster for two typical regimes of PIC operation: ``particle dominated,'' for which the bulk of the computation time is spent on pushing particles, and ``field dominated,'' for which computing the fields is prevalent. The MPI implementation of the code offers a significant numerical speedup, particularly in the ``particle dominated'' regime, which will allow extension to three dimensions and implementation of atomic physics.
NASA Astrophysics Data System (ADS)
Lohmann, Timo
Electric sector models are powerful tools that guide policy makers and stakeholders. Long-term power generation expansion planning models are a prominent example and determine a capacity expansion for an existing power system over a long planning horizon. With the changes in the power industry away from monopolies and regulation, the focus of these models has shifted to competing electric companies maximizing their profit in a deregulated electricity market. In recent years, consumers have started to participate in demand response programs, actively influencing electricity load and price in the power system. We introduce a model that features investment and retirement decisions over a long planning horizon of more than 20 years, as well as an hourly representation of day-ahead electricity markets in which sellers of electricity face buyers. This combination makes our model both unique and challenging to solve. Decomposition algorithms, and especially Benders decomposition, can exploit the model structure. We present a novel method that can be seen as an alternative to generalized Benders decomposition and relies on dynamic linear overestimation. We prove its finite convergence and present computational results, demonstrating its superiority over traditional approaches. In certain special cases of our model, all necessary solution values in the decomposition algorithms can be directly calculated and solving mathematical programming problems becomes entirely obsolete. This leads to highly efficient algorithms that drastically outperform their programming problem-based counterparts. Furthermore, we discuss the implementation of all tailored algorithms and the challenges from a modeling software developer's standpoint, providing an insider's look into the modeling language GAMS. Finally, we apply our model to the Texas power system and design two electricity policies motivated by the U.S. Environment Protection Agency's recently proposed CO2 emissions targets for the
Algorithmic complexity for psychology: a user-friendly implementation of the coding theorem method.
Gauvrit, Nicolas; Singmann, Henrik; Soler-Toscano, Fernando; Zenil, Hector
2016-03-01
Kolmogorov-Chaitin complexity has long been believed to be impossible to approximate when it comes to short sequences (e.g. of length 5-50). However, with the newly developed coding theorem method the complexity of strings of length 2-11 can now be numerically estimated. We present the theoretical basis of algorithmic complexity for short strings (ACSS) and describe an R-package providing functions based on ACSS that will cover psychologists' needs and improve upon previous methods in three ways: (1) ACSS is now available not only for binary strings, but for strings based on up to 9 different symbols, (2) ACSS no longer requires time-consuming computing, and (3) a new approach based on ACSS gives access to an estimation of the complexity of strings of any length. Finally, three illustrative examples show how these tools can be applied to psychology. PMID:25761393
Biffi, E; Ghezzi, D; Pedrocchi, A; Ferrigno, G
2010-01-01
Neurons cultured in vitro on MicroElectrode Array (MEA) devices connect to each other, forming a network. To study electrophysiological activity and long term plasticity effects, long period recording and spike sorter methods are needed. Therefore, on-line and real time analysis, optimization of memory use and data transmission rate improvement become necessary. We developed an algorithm for amplitude-threshold spikes detection, whose performances were verified with (a) statistical analysis on both simulated and real signal and (b) Big O Notation. Moreover, we developed a PCA-hierarchical classifier, evaluated on simulated and real signal. Finally we proposed a spike detection hardware design on FPGA, whose feasibility was verified in terms of CLBs number, memory occupation and temporal requirements; once realized, it will be able to execute on-line detection and real time waveform analysis, reducing data storage problems. PMID:20300592
VLSI implementation of a new LMS-based algorithm for noise removal in ECG signal
NASA Astrophysics Data System (ADS)
Satheeskumaran, S.; Sabrigiriraj, M.
2016-06-01
Least mean square (LMS)-based adaptive filters are widely deployed for removing artefacts in electrocardiogram (ECG) due to less number of computations. But they posses high mean square error (MSE) under noisy environment. The transform domain variable step-size LMS algorithm reduces the MSE at the cost of computational complexity. In this paper, a variable step-size delayed LMS adaptive filter is used to remove the artefacts from the ECG signal for improved feature extraction. The dedicated digital Signal processors provide fast processing, but they are not flexible. By using field programmable gate arrays, the pipelined architectures can be used to enhance the system performance. The pipelined architecture can enhance the operation efficiency of the adaptive filter and save the power consumption. This technique provides high signal-to-noise ratio and low MSE with reduced computational complexity; hence, it is a useful method for monitoring patients with heart-related problem.
NASA Technical Reports Server (NTRS)
Mandy, Christophe P.; Sakamoto, Hiraku; Saenz-Otero, Alvar; Miller, David W.
2007-01-01
The MIT's Space Systems Laboratory developed the Synchronized Position Hold Engage and Reorient Experimental Satellites (SPHERES) as a risk-tolerant spaceborne facility to develop and mature control, estimation, and autonomy algorithms for distributed satellite systems for applications such as satellite formation flight. Tests performed study interferometric mission-type formation flight maneuvers in deep space. These tests consist of having the satellites trace a coordinated trajectory under tight control that would allow simulated apertures to constructively interfere observed light and measure the resulting increase in angular resolution. This paper focuses on formation initialization (establishment of a formation using limited field of view relative sensors), formation coordination (synchronization of the different satellite s motion) and fuel-balancing among the different satellites.
Pre-Hardware Optimization of Spacecraft Image Processing Algorithms and Hardware Implementation
NASA Technical Reports Server (NTRS)
Kizhner, Semion; Petrick, David J.; Flatley, Thomas P.; Hestnes, Phyllis; Jentoft-Nilsen, Marit; Day, John H. (Technical Monitor)
2002-01-01
Spacecraft telemetry rates and telemetry product complexity have steadily increased over the last decade presenting a problem for real-time processing by ground facilities. This paper proposes a solution to a related problem for the Geostationary Operational Environmental Spacecraft (GOES-8) image data processing and color picture generation application. Although large super-computer facilities are the obvious heritage solution, they are very costly, making it imperative to seek a feasible alternative engineering solution at a fraction of the cost. The proposed solution is based on a Personal Computer (PC) platform and synergy of optimized software algorithms, and reconfigurable computing hardware (RC) technologies, such as Field Programmable Gate Arrays (FPGA) and Digital Signal Processors (DSP). It has been shown that this approach can provide superior inexpensive performance for a chosen application on the ground station or on-board a spacecraft.
Alteration of Box-Jenkins methodology by implementing genetic algorithm method
NASA Astrophysics Data System (ADS)
Ismail, Zuhaimy; Maarof, Mohd Zulariffin Md; Fadzli, Mohammad
2015-02-01
A time series is a set of values sequentially observed through time. The Box-Jenkins methodology is a systematic method of identifying, fitting, checking and using integrated autoregressive moving average time series model for forecasting. Box-Jenkins method is an appropriate for a medium to a long length (at least 50) time series data observation. When modeling a medium to a long length (at least 50), the difficulty arose in choosing the accurate order of model identification level and to discover the right parameter estimation. This presents the development of Genetic Algorithm heuristic method in solving the identification and estimation models problems in Box-Jenkins. Data on International Tourist arrivals to Malaysia were used to illustrate the effectiveness of this proposed method. The forecast results that generated from this proposed model outperformed single traditional Box-Jenkins model.
Implementation and optimization of ultrasound signal processing algorithms on mobile GPU
NASA Astrophysics Data System (ADS)
Kong, Woo Kyu; Lee, Wooyoul; Kim, Kyu Cheol; Yoo, Yangmo; Song, Tai-Kyong
2014-03-01
A general-purpose graphics processing unit (GPGPU) has been used for improving computing power in medical ultrasound imaging systems. Recently, a mobile GPU becomes powerful to deal with 3D games and videos at high frame rates on Full HD or HD resolution displays. This paper proposes the method to implement ultrasound signal processing on a mobile GPU available in the high-end smartphone (Galaxy S4, Samsung Electronics, Seoul, Korea) with programmable shaders on the OpenGL ES 2.0 platform. To maximize the performance of the mobile GPU, the optimization of shader design and load sharing between vertex and fragment shader was performed. The beamformed data were captured from a tissue mimicking phantom (Model 539 Multipurpose Phantom, ATS Laboratories, Inc., Bridgeport, CT, USA) by using a commercial ultrasound imaging system equipped with a research package (Ultrasonix Touch, Ultrasonix, Richmond, BC, Canada). The real-time performance is evaluated by frame rates while varying the range of signal processing blocks. The implementation method of ultrasound signal processing on OpenGL ES 2.0 was verified by analyzing PSNR with MATLAB gold standard that has the same signal path. CNR was also analyzed to verify the method. From the evaluations, the proposed mobile GPU-based processing method has no significant difference with the processing using MATLAB (i.e., PSNR<52.51 dB). The comparable results of CNR were obtained from both processing methods (i.e., 11.31). From the mobile GPU implementation, the frame rates of 57.6 Hz were achieved. The total execution time was 17.4 ms that was faster than the acquisition time (i.e., 34.4 ms). These results indicate that the mobile GPU-based processing method can support real-time ultrasound B-mode processing on the smartphone.
NASA Technical Reports Server (NTRS)
Maine, R. E.; Iliff, K. W.
1980-01-01
A new formulation is proposed for the problem of parameter estimation of dynamic systems with both process and measurement noise. The formulation gives estimates that are maximum likelihood asymptotically in time. The means used to overcome the difficulties encountered by previous formulations are discussed. It is then shown how the proposed formulation can be efficiently implemented in a computer program. A computer program using the proposed formulation is available in a form suitable for routine application. Examples with simulated and real data are given to illustrate that the program works well.
Implementation of ILLIAC 4 algorithms for multispectral image interpretation. [earth resources data
NASA Technical Reports Server (NTRS)
Ray, R. M.; Thomas, J. D.; Donovan, W. E.; Swain, P. H.
1974-01-01
Research has focused on the design and partial implementation of a comprehensive ILLIAC software system for computer-assisted interpretation of multispectral earth resources data such as that now collected by the Earth Resources Technology Satellite. Research suggests generally that the ILLIAC 4 should be as much as two orders of magnitude more cost effective than serial processing computers for digital interpretation of ERTS imagery via multivariate statistical classification techniques. The potential of the ARPA Network as a mechanism for interfacing geographically-dispersed users to an ILLIAC 4 image processing facility is discussed.
G0W0 implementation using Lanczos algorithm and Sternheimer equation
NASA Astrophysics Data System (ADS)
Laflamme Janssen, Jonathan; Berube, Nicolas; Antonius, Gabriel; Cote, Michel
2012-02-01
The G0W0 approach is an accurate method to give a physical meaning to the eigenvalues obtained in adensity-functional theory (DFT) calculation.However, the calculation of such corrections with plane wave codes is currently prohibitive for systems with more than a few hundreds of electrons. What limits calculations to this system size is the need in current implementations to invert the dielectric matrix and the need to carry out summations over conduction bands. This talk presents a strategy to avoid both of these bottlenecks. In traditional plane wave implementations of G0W0, the dielectric matrix is expressed in a plane wave basis, which needs to be relatively big to properly describe the matrix. Here, we will explain how a Lanczos basis can be generated to substantially reduce the size of the matrix. Also, the number of conduction bands needed to reach convergence in the summations is usually an order of magnitude larger than the number of valence bands. Here, the calculation of the conduction states is avoided by reformulating the summations into linear equation problems (Sternheimer equations), which also substantially reduces the computation time. Preliminary results will be presented.
NASA Astrophysics Data System (ADS)
Baba, Toshitaka; Takahashi, Narumi; Kaneda, Yoshiyuki; Ando, Kazuto; Matsuoka, Daisuke; Kato, Toshihiro
2015-12-01
Because of improvements in offshore tsunami observation technology, dispersion phenomena during tsunami propagation have often been observed in recent tsunamis, for example the 2004 Indian Ocean and 2011 Tohoku tsunamis. The dispersive propagation of tsunamis can be simulated by use of the Boussinesq model, but the model demands many computational resources. However, rapid progress has been made in parallel computing technology. In this study, we investigated a parallelized approach for dispersive tsunami wave modeling. Our new parallel software solves the nonlinear Boussinesq dispersive equations in spherical coordinates. A variable nested algorithm was used to increase spatial resolution in the target region. The software can also be used to predict tsunami inundation on land. We used the dispersive tsunami model to simulate the 2011 Tohoku earthquake on the Supercomputer K. Good agreement was apparent between the dispersive wave model results and the tsunami waveforms observed offshore. The finest bathymetric grid interval was 2/9 arcsec (approx. 5 m) along longitude and latitude lines. Use of this grid simulated tsunami soliton fission near the Sendai coast. Incorporating the three-dimensional shape of buildings and structures led to improved modeling of tsunami inundation.
Implementation of the Canny Edge Detection algorithm for a stereo vision system
Wang, J.R.; Davis, T.A.; Lee, G.K.
1996-12-31
There exists many applications in which three-dimensional information is necessary. For example, in manufacturing systems, parts inspection may require the extraction of three-dimensional information from two-dimensional images, through the use of a stereo vision system. In medical applications, one may wish to reconstruct a three-dimensional image of a human organ from two or more transducer images. An important component of three-dimensional reconstruction is edge detection, whereby an image boundary is separated from background, for further processing. In this paper, a modification of the Canny Edge Detection approach is suggested to extract an image from a cluttered background. The resulting cleaned image can then be sent to the image matching, interpolation and inverse perspective transformation blocks to reconstruct the 3-D scene. A brief discussion of the stereo vision system that has been developed at the Mars Mission Research Center (MMRC) is also presented. Results of a version of Canny Edge Detection algorithm show promise as an accurate edge extractor which may be used in the edge-pixel based binocular stereo vision system.
NASA Astrophysics Data System (ADS)
Hatzaki, Maria; Flocas, Elena A.; Simmonds, Ian; Kouroutzoglou, John; Keay, Kevin; Rudeva, Irina
2013-04-01
Migratory cyclones and anticyclones mainly account for the short-term weather variations in extra-tropical regions. By contrast to cyclones that have drawn major scientific attention due to their direct link to active weather and precipitation, climatological studies on anticyclones are limited, even though they also are associated with extreme weather phenomena and play an important role in global and regional climate. This is especially true for the Mediterranean, a region particularly vulnerable to climate change, and the little research which has been done is essentially confined to the manual analysis of synoptic charts. For the construction of a comprehensive climatology of migratory anticyclonic systems in the Mediterranean using an objective methodology, the Melbourne University automatic tracking algorithm is applied, based to the ERA-Interim reanalysis mean sea level pressure database. The algorithm's reliability in accurately capturing the weather patterns and synoptic climatology of the transient activity has been widely proven. This algorithm has been extensively applied for cyclone studies worldwide and it has been also successfully applied for the Mediterranean, though its use for anticyclone tracking is limited to the Southern Hemisphere. In this study the performance of the tracking algorithm under different data resolutions and different choices of parameter settings in the scheme is examined. Our focus is on the appropriate modification of the algorithm in order to efficiently capture the individual characteristics of the anticyclonic tracks in the Mediterranean, a closed basin with complex topography. We show that the number of the detected anticyclonic centers and the resulting tracks largely depend upon the data resolution and the search radius. We also find that different scale anticyclones and secondary centers that lie within larger anticyclone structures can be adequately represented; this is important, since the extensions of major
Implementation of the SU(2) Hamiltonian symmetry for the DMRG algorithm
NASA Astrophysics Data System (ADS)
Alvarez, Gonzalo
2012-10-01
In the Density Matrix Renormalization Group (DMRG) algorithm (White, 1992, 1993) [1,2], Hamiltonian symmetries play an important rôle. Using symmetries, the matrix representation of the Hamiltonian can be blocked. Diagonalizing each matrix block is more efficient than diagonalizing the original matrix. This paper explains how the the DMRG++ code (Alvarez, 2009) [3] has been extended to handle the non-local SU(2) symmetry in a model independent way. Improvements in CPU times compared to runs with only local symmetries are discussed for the one-orbital Hubbard model, and for a two-orbital Hubbard model for iron-based superconductors. The computational bottleneck of the algorithm and the use of shared memory parallelization are also addressed. Program summary Program title: DMRG++ Catalog identifier: AEDJ_v2_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEDJ_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Special license. See http://cpc.cs.qub.ac.uk/licence/AEDJ_v2_0.html No. of lines in distributed program, including test data, etc.: 211560 No. of bytes in distributed program, including test data, etc.: 10572185 Distribution format: tar.gz Programming language: C++. Computer: PC. Operating system: Multiplatform, tested on Linux. Has the code been vectorized or parallelized?: Yes. 1 to 8 processors with MPI, 2 to 4 cores with pthreads. RAM: 1GB (256MB is enough to run the included test) Classification: 23. Catalog identifier of previous version: AEDJ_v1_0 Journal reference of previous version: Comput. Phys. Comm. 180(2009)1572 External routines: BLAS and LAPACK Nature of problem: Strongly correlated electrons systems, display a broad range of important phenomena, and their study is a major area of research in condensed matter physics. In this context, model Hamiltonians are used to simulate the relevant interactions of a given compound, and the relevant degrees of freedom. These studies
NASA Astrophysics Data System (ADS)
Karabiber, Fethullah; Grassi, Giuseppe; Vecchio, Pietro; Arik, Sabri; Yalcin, M. Erhan
2011-01-01
Based on the cellular neural network (CNN) paradigm, the bio-inspired (bi-i) cellular vision system is a computing platform consisting of state-of-the-art sensing, cellular sensing-processing and digital signal processing. This paper presents the implementation of a novel CNN-based segmentation algorithm onto the bi-i system. The experimental results, carried out for different benchmark video sequences, highlight the feasibility of the approach, which provides a frame rate of about 26 frame/sec. Comparisons with existing CNN-based methods show that, even though these methods are from two to six times faster than the proposed one, the conceived approach is more accurate and, consequently, represents a satisfying trade-off between real-time requirements and accuracy.
NASA Astrophysics Data System (ADS)
Cua, G. B.; Fischer, M.; Heimers, S.; Clinton, J. F.; Diehl, T.; Kaestli, P.; Becker, J.; Saul, J.
2011-12-01
The Virtual Seismologist (VS) earthquake early warning (EEW) methodology is a Bayesian approach to EEW, wherein the most probable source estimate at any given time is a combination of contributions from a likelihood function that evolves in response to incoming data from the on-going earthquake, and selected prior information, which can include factors such as network topology, the Gutenberg-Richter relationship or previously observed seismicity. The VS algorithm, implemented by the Swiss Seismological Service (SED) at ETH Zurich, is a fundamental component of the California Integrated Seismic Network (CISN) ShakeAlert system, and has been thoroughly tested in real-time the Southern California Seismic Network since July 2008, and at the Northern California Seismic Network since February 2009. SeisComP3 (SC3) is a fully featured automated real-time earthquake monitoring software developed by GeoForschungZentrum Potsdam in collaboration with commercial partner, gempa GmbH. It is becoming a community standard software for earthquake detection and waveform processing for regional and global networks across the globe, including at the SED. As part of efforts in the development of real-time seismology tools supported by the Network of European Research Infrastructures for Earthquake Risk Assessment and Mitigation (NERA), the VS EEW algorithm has been implemented within the SeisComP3 framework. We discuss the software design and development, as well as testing and performance evaluation on real-time and archived waveform data from the SED. The "VS in SC3" effort facilitates the seamless integration of real-time EEW within standard network processing at SED, as well as the distribution, installation, and real-time testing of the VS codes at various European networks, in particular, real-time test sites at Naples, Istanbul, Patras, and Iceland planned as part of FP7 project REAKT "Strategies and Tools for Real-Time Earthquake Risk Mitigation".
Komorkiewicz, Mateusz; Kryjak, Tomasz; Gorgon, Marek
2014-01-01
This article presents an efficient hardware implementation of the Horn-Schunck algorithm that can be used in an embedded optical flow sensor. An architecture is proposed, that realises the iterative Horn-Schunck algorithm in a pipelined manner. This modification allows to achieve data throughput of 175 MPixels/s and makes processing of Full HD video stream (1, 920 × 1, 080 @ 60 fps) possible. The structure of the optical flow module as well as pre- and post-filtering blocks and a flow reliability computation unit is described in details. Three versions of optical flow modules, with different numerical precision, working frequency and obtained results accuracy are proposed. The errors caused by switching from floating- to fixed-point computations are also evaluated. The described architecture was tested on popular sequences from an optical flow dataset of the Middlebury University. It achieves state-of-the-art results among hardware implementations of single scale methods. The designed fixed-point architecture achieves performance of 418 GOPS with power efficiency of 34 GOPS/W. The proposed floating-point module achieves 103 GFLOPS, with power efficiency of 24 GFLOPS/W. Moreover, a 100 times speedup compared to a modern CPU with SIMD support is reported. A complete, working vision system realized on Xilinx VC707 evaluation board is also presented. It is able to compute optical flow for Full HD video stream received from an HDMI camera in real-time. The obtained results prove that FPGA devices are an ideal platform for embedded vision systems. PMID:24526303
Komorkiewicz, Mateusz; Kryjak, Tomasz; Gorgon, Marek
2014-01-01
This article presents an efficient hardware implementation of the Horn-Schunck algorithm that can be used in an embedded optical flow sensor. An architecture is proposed, that realises the iterative Horn-Schunck algorithm in a pipelined manner. This modification allows to achieve data throughput of 175 MPixels/s and makes processing of Full HD video stream (1; 920 × 1; 080 @ 60 fps) possible. The structure of the optical flow module as well as pre- and post-filtering blocks and a flow reliability computation unit is described in details. Three versions of optical flow modules, with different numerical precision, working frequency and obtained results accuracy are proposed. The errors caused by switching from floating- to fixed-point computations are also evaluated. The described architecture was tested on popular sequences from an optical flow dataset of the Middlebury University. It achieves state-of-the-art results among hardware implementations of single scale methods. The designed fixed-point architecture achieves performance of 418 GOPS with power efficiency of 34 GOPS/W. The proposed floating-point module achieves 103 GFLOPS, with power efficiency of 24 GFLOPS/W. Moreover, a 100 times speedup compared to a modern CPU with SIMD support is reported. A complete, working vision system realized on Xilinx VC707 evaluation board is also presented. It is able to compute optical flow for Full HD video stream received from an HDMI camera in real-time. The obtained results prove that FPGA devices are an ideal platform for embedded vision systems. PMID:24526303
Allen, William J; Rizzo, Robert C
2014-02-24
False negative docking outcomes for highly symmetric molecules are a barrier to the accurate evaluation of docking programs, scoring functions, and protocols. This work describes an implementation of a symmetry-corrected root-mean-square deviation (RMSD) method into the program DOCK based on the Hungarian algorithm for solving the minimum assignment problem, which dynamically assigns atom correspondence in molecules with symmetry. The algorithm adds only a trivial amount of computation time to the RMSD calculations and is shown to increase the reported overall docking success rate by approximately 5% when tested over 1043 receptor-ligand systems. For some families of protein systems the results are even more dramatic, with success rate increases up to 16.7%. Several additional applications of the method are also presented including as a pairwise similarity metric to compare molecules during de novo design, as a scoring function to rank-order virtual screening results, and for the analysis of trajectories from molecular dynamics simulation. The new method, including source code, is available to registered users of DOCK6 ( http://dock.compbio.ucsf.edu ). PMID:24410429
NASA Astrophysics Data System (ADS)
Baker, B. I.; Friberg, P. A.
2014-12-01
Modern seismic networks typically deploy three component (3C) sensors, but still fail to utilize all of the information available in the seismograms when performing automated phase picking for real-time event location. In most cases a variation on a short term over long term average threshold detector is used for picking and then an association program is used to assign phase types to the picks. However, the 3C waveforms from an earthquake contain an abundance of information related to the P and S phases in both their polarization and energy partitioning. An approach that has been overlooked and has demonstrated encouraging results is the Component Energy Comparison Method (CECM) by Nagano et al. as published in Geophysics 1989. CECM is well suited to being used in real-time because the calculation is not computationally intensive. Furthermore, the CECM method has fewer tuning variables (3) than traditional pickers in Earthworm such as the Rex Allen algorithm (N=18) or even the Anthony Lomax Filter Picker module (N=5). In addition to computing the CECM detector we study the detector sensitivity by rotating the signal into principle components as well as estimating the P phase onset from a curvature function describing the CECM as opposed to the CECM itself. We present our results implementing this algorithm in a real-time module for Earthworm and show the improved phase picks as compared to the traditional single component pickers using Earthworm.
2015-01-01
False negative docking outcomes for highly symmetric molecules are a barrier to the accurate evaluation of docking programs, scoring functions, and protocols. This work describes an implementation of a symmetry-corrected root-mean-square deviation (RMSD) method into the program DOCK based on the Hungarian algorithm for solving the minimum assignment problem, which dynamically assigns atom correspondence in molecules with symmetry. The algorithm adds only a trivial amount of computation time to the RMSD calculations and is shown to increase the reported overall docking success rate by approximately 5% when tested over 1043 receptor–ligand systems. For some families of protein systems the results are even more dramatic, with success rate increases up to 16.7%. Several additional applications of the method are also presented including as a pairwise similarity metric to compare molecules during de novo design, as a scoring function to rank-order virtual screening results, and for the analysis of trajectories from molecular dynamics simulation. The new method, including source code, is available to registered users of DOCK6 (http://dock.compbio.ucsf.edu). PMID:24410429
An implementation of the TFQMR - algorithm on a distributed memory machine
Buecker, M.
1994-12-31
A fundamental task of numerical computing is to solve linear systems A{rvec x} = {rvec b}, which can be labeled as (1). Such systems are essential parts of many scientific problems, e.g. finite element methods contain linear systems to approximate partial differential equations. Linear systems can be solved by direct as well as iterative methods. Using a direct method means to factorize the coefficient matrix. A typical and well known direct method is the Gaussian elimination. Direct methods work well as long as the systems remain small. Unfortunately, there is need for the solution of large linear systems. When systems get large direct methods result in enormous computing time because of their complexity. As an example take (1) with a N x N matrix A. Then the solution of (1) can be computed by Gaussian elimination requiring O(N{sup 3}) operations. Additionally, an implementation of a direct method has to take care of the high storage requirement. In contrast to direct methods, iterative methods use successive approximations to obtain more accurate solutions to a linear system at each step. The iteration process generates a sequence of iterates x{sub n} converging to the solution x of (1). The iteration process ends if either x{sub n} fulfills a chosen convergence criterion or breakdowns, i.e. division by zero, occur. Possible breakdowns can be avoided by look-ahead techniques.
Automated infrasound signal detection algorithms implemented in MatSeis - Infra Tool.
Hart, Darren
2004-07-01
MatSeis's infrasound analysis tool, Infra Tool, uses frequency slowness processing to deconstruct the array data into three outputs per processing step: correlation, azimuth and slowness. Until now, an experienced analyst trained to recognize a pattern observed in outputs from signal processing manually accomplished infrasound signal detection. Our goal was to automate the process of infrasound signal detection. The critical aspect of infrasound signal detection is to identify consecutive processing steps where the azimuth is constant (flat) while the time-lag correlation of the windowed waveform is above background value. These two statements describe the arrival of a correlated set of wavefronts at an array. The Hough Transform and Inverse Slope methods are used to determine the representative slope for a specified number of azimuth data points. The representative slope is then used in conjunction with associated correlation value and azimuth data variance to determine if and when an infrasound signal was detected. A format for an infrasound signal detection output file is also proposed. The detection output file will list the processed array element names, followed by detection characteristics for each method. Each detection is supplied with a listing of frequency slowness processing characteristics: human time (YYYY/MM/DD HH:MM:SS.SSS), epochal time, correlation, fstat, azimuth (deg) and trace velocity (km/s). As an example, a ground truth event was processed using the four-element DLIAR infrasound array located in New Mexico. The event is known as the Watusi chemical explosion, which occurred on 2002/09/28 at 21:25:17 with an explosive yield of 38,000 lb TNT equivalent. Knowing the source and array location, the array-to-event distance was computed to be approximately 890 km. This test determined the station-to-event azimuth (281.8 and 282.1 degrees) to within 1.6 and 1.4 degrees for the Inverse Slope and Hough Transform detection algorithms, respectively, and
Soares, Thereza A.; Alexov, Emil
2014-01-01
Four chemotypes of the Rough lipopolysaccharides (LPS) membrane from Pseudomonas aeruginosa were investigated by a combined approach of explicit water molecular dynamics (MD) simulations and Poisson-Boltzmann continuum electrostatics with the goal to deliver the distribution of the electrostatic potential across the membrane. For the purpose of this investigation, a new tool for modeling the electrostatic potential profile along the axis normal to the membrane, MEMPOT, was developed and implemented in DelPhi. Applying MEMPOT on the snapshots obtained by MD simulations, two observations were made: (a) the average electrostatic potential has a complex profile, but is mostly positive inside the membrane due to the presence of Ca2+ ions which overcompensate for the negative potential created by lipid phosphate groups; and (b) correct modeling of the electrostatic potential profile across the membrane requires taking into account the water phase, while neglecting it (vacuum calculations) results in dramatic changes including a reversal of the sign of the potential inside the membrane. Furthermore, using DelPhi to assign different dielectric constants for different regions of the LPS membranes, it was investigated whether a single frame structure before MD simulations with appropriate dielectric constants for the lipid tails, inner, and the external leaflet regions, can deliver the same average electrostatic potential distribution as obtained from the MD-generated ensemble of structures. Indeed, this can be attained by using smaller dielectric constant for the tail and inner leaflet regions (mostly hydrophobic) than for the external leaflet region (hydrophilic) and the optimal dielectric constant values are chemotype-specific. PMID:24799021
Implementation of a parallel algorithm for thermo-chemical nonequilibrium flow simulations
Wong, C.C.; Blottner, F.G.; Payne, J.L.; Soetrisno, M.
1995-01-01
Massively parallel (MP) computing is considered to be the future direction of high performance computing. When engineers apply this new MP computing technology to solve large-scale problems, one major interest is what is the maximum problem size that a MP computer can handle. To determine the maximum size, it is important to address the code scalability issue. Scalability implies whether the code can provide an increase in performance proportional to an increase in problem size. If the size of the problem increases, by utilizing more computer nodes, the ideal elapsed time to simulate a problem should not increase much. Hence one important task in the development of the MP computing technology is to ensure scalability. A scalable code is an efficient code. In order to obtain good scaled performance, it is necessary to first have the code optimized for a single node performance before proceeding to a large-scale simulation with a large number of computer nodes. This paper will discuss the implementation of a massively parallel computing strategy and the process of optimization to improve the scaled performance. Specifically, we will look at domain decomposition, resource management in the code, communication overhead, and problem mapping. By incorporating these improvements and adopting an efficient MP computing strategy, an efficiency of about 85% and 96%, respectively, has been achieved using 64 nodes on MP computers for both perfect gas and chemically reactive gas problems. A comparison of the performance between MP computers and a vectorized computer, such as Cray-YMP, will also be presented.
NASA Astrophysics Data System (ADS)
Hohil, Myron E.; Desai, Sachi; Morcos, Amir
2006-05-01
(PAWSS) Limited Objective Experiment (LOE) conducted by Joint Project Manager for Nuclear Biological Contamination Avoidance (JPM NBC CA) and a matrixed team from Edgewood Chemical and Biological Center (ECBC) at ranges exceeding 3km. The details of the fieldtest experiment and real-time implementation/integration of the standalone acoustic sensor system are discussed herein.
NASA Astrophysics Data System (ADS)
Hohil, Myron E.; Desai, Sachi; Morcos, Amir
2006-09-01
(PAWSS) Limited Objective Experiment (LOE) conducted by Joint Project Manager for Nuclear Biological Contamination Avoidance (JPM NBC CA) and a matrixed team from Edgewood Chemical and Biological Center (ECBC) at ranges exceeding 3km. The details of the field-test experiment and real-time implementation/integration of the stand-alone acoustic sensor system are discussed herein.
Ariyawansa, K.A.; Hudson, D.D.
1989-01-01
We describe a benchmark parallel version of the Van Slyke and Wets algorithm for two-stage stochastic programs and an implementation of that algorithm on the Sequent/Balance. We also report results of a numerical experiment using random test problems and our implementation. These performance results, to the best of our knowledge, are the first available for the Van Slyke and Wets algorithm on a parallel processor. They indicate that the benchmark implementation parallelizes well, and that even with the use of parallel processing, problems with random variables having large numbers of realizations can take prohibitively large amounts of computation for solution. Thus, they demonstrate the need for exploiting both parallelization and approximation for the solution of stochastic programs. 15 refs., 18 tabs.
Gan, Yixiang; Kamlah, Marc
2008-07-01
In this investigation, a thermo-mechanical model of pebble beds is adopted and developed based on experiments by Dr. Reimann at Forschungszentrum Karlsruhe (FZK). The framework of the present material model is composed of a non-linear elastic law, the Drucker-Prager-Cap theory, and a modified creep law. Furthermore, the volumetric inelastic strain dependent thermal conductivity of beryllium pebble beds is taken into account and full thermo-mechanical coupling is considered. Investigation showed that the Drucker-Prager-Cap model implemented in ABAQUS can not fulfill the requirements of both the prediction of large creep strains and the hardening behaviour caused by creep, which are of importance with respect to the application of pebble beds in fusion blankets. Therefore, UMAT (user defined material's mechanical behaviour) and UMATHT (user defined material's thermal behaviour) routines are used to re-implement the present thermo-mechanical model in ABAQUS. An elastic predictor radial return mapping algorithm is used to solve the non-associated plasticity iteratively, and a proper tangent stiffness matrix is obtained for cost-efficiency in the calculation. An explicit creep mechanism is adopted for the prediction of time-dependent behaviour in order to represent large creep strains in high temperature. Finally, the thermo-mechanical interactions are implemented in a UMATHT routine for the coupled analysis. The oedometric compression tests and creep tests of pebble beds at different temperatures are simulated with the help of the present UMAT and UMATHT routines, and the comparison between the simulation and the experiments is made. (authors)
Kanade, T.; Webb, J.A.
1987-06-01
The parallel-vision algorithm design and implementation project was established to facilitate vision programming on parallel architectures, particularly low-level vision and robot vehicle-control algorithms on the Carnegie Mellon Warp machine. To this end, the author have (1) demonstrated the use of the Warp machine in several different algorithms; (2) developed a specialized programming language, called Apply, for low-level vision programming on parallel architectures in general, and Warp in particular; (3) used Warp as a research tool in vision, as opposed to using it only for research in parallel vision; and (4) developed a significant library of low-level vision programs for use on Warp.
Sebastiani, Giada
2009-01-01
Chronic hepatitis B and C together with alcoholic and non-alcoholic fatty liver diseases represent the major causes of progressive liver disease that can eventually evolve into cirrhosis and its end-stage complications, including decompensation, bleeding and liver cancer. Formation and accumulation of fibrosis in the liver is the common pathway that leads to an evolutive liver disease. Precise definition of liver fibrosis stage is essential for management of the patient in clinical practice since the presence of bridging fibrosis represents a strong indication for antiviral therapy for chronic viral hepatitis, while cirrhosis requires a specific follow-up including screening for esophageal varices and hepatocellular carcinoma. Liver biopsy has always represented the standard of reference for assessment of hepatic fibrosis but it has some limitations being invasive, costly and prone to sampling errors. Recently, blood markers and instrumental methods have been proposed for the non-invasive assessment of liver fibrosis. However, there are still some doubts as to their implementation in clinical practice and a real consensus on how and when to use them is not still available. This is due to an unsatisfactory accuracy for some of them, and to an incomplete validation for others. Some studies suggest that performance of non-invasive methods for liver fibrosis assessment may increase when they are combined. Combination algorithms of non-invasive methods for assessing liver fibrosis may represent a rational and reliable approach to implement non-invasive assessment of liver fibrosis in clinical practice and to reduce rather than abolish liver biopsies. PMID:19437558
Breuer, Christian; Lucas, Martin; Schütze, Frank-Walter; Claus, Peter
2007-01-01
A multi-criteria optimisation procedure based on genetic algorithms is carried out in search of advanced heterogeneous catalysts for total oxidation. Simple but flexible software routines have been created to be applied within a search space of more then 150,000 individuals. The general catalyst design includes mono-, bi- and trimetallic compositions assembled out of 49 different metals and depleted on an Al2O3 support in up to nine amount levels. As an efficient tool for high-throughput screening and perfectly matched to the requirements of heterogeneous gas phase catalysis - especially for applications technically run in honeycomb structures - the multi-channel monolith reactor is implemented to evaluate the catalyst performances. Out of a multi-component feed-gas, the conversion rates of carbon monoxide (CO) and a model hydrocarbon (HC) are monitored in parallel. In combination with further restrictions to preparation and pre-treatment a primary screening can be conducted, promising to provide results close to technically applied catalysts. Presented are the resulting performances of the optimisation process for the first catalyst generations and the prospect of its auto-adaptation to specified optimisation goals. PMID:17266517
NASA Astrophysics Data System (ADS)
Bouallegue, Kais; Chaari, Abdessattar
In this study, one propose to study a numeric type strategy permitting the generation of any shape of path in view of the scheduling of the trajectories for a car-like mobile robot where the planned motions considered are continuous sequences in the space of the robot. These paths are programmed in order to have some types of closed or open trajectories. One is interested in the motion control of the robot from an initial position to a final position while optimizing the consumed energy in its alternated circular motion on both sides of the segment joining these two points. In this study, one presents a new method based on a numeric approach conceived from the kinematics equations of the robot. This new technique of numeric, adaptive and dynamic control of the robot is implemented on DSP21065L of the SHARC family. This algorithm assures the robot control of an initial position of departure to a final position of arrival without the existence of obstacles.
Ezra, Elishai; Maor, Idan; Bavli, Danny; Shalom, Itai; Levy, Gahl; Prill, Sebastian; Jaeger, Magnus S; Nahmias, Yaakov
2015-08-01
Microfluidic applications range from combinatorial synthesis to high throughput screening, with platforms integrating analog perfusion components, digitally controlled micro-valves and a range of sensors that demand a variety of communication protocols. Currently, discrete control units are used to regulate and monitor each component, resulting in scattered control interfaces that limit data integration and synchronization. Here, we present a microprocessor-based control unit, utilizing the MS Gadgeteer open framework that integrates all aspects of microfluidics through a high-current electronic circuit that supports and synchronizes digital and analog signals for perfusion components, pressure elements, and arbitrary sensor communication protocols using a plug-and-play interface. The control unit supports an integrated touch screen and TCP/IP interface that provides local and remote control of flow and data acquisition. To establish the ability of our control unit to integrate and synchronize complex microfluidic circuits we developed an equi-pressure combinatorial mixer. We demonstrate the generation of complex perfusion sequences, allowing the automated sampling, washing, and calibrating of an electrochemical lactate sensor continuously monitoring hepatocyte viability following exposure to the pesticide rotenone. Importantly, integration of an optical sensor allowed us to implement automated optimization protocols that require different computational challenges including: prioritized data structures in a genetic algorithm, distributed computational efforts in multiple-hill climbing searches and real-time realization of probabilistic models in simulated annealing. Our system offers a comprehensive solution for establishing optimization protocols and perfusion sequences in complex microfluidic circuits. PMID:26227212
Implementation of a New DTSTEP Algorithm for use in RELAP5-3D and PVMEXEC Completion Report
Dr. George L Mesina
2010-12-01
The PVM Coupling methodology for decomposing a complex model into domains onto which individual programs may be applied has proven effective for solving many multi-physics problems. There have been, from the outset, some detailed and/or long-running models that cause the process to fail. This project addressed the PVM coupling issues surrounding the DTSTEP subroutines on RELAP5-3D and PVMEXEC. Some 25 errors are listed in Tables 1 and 18 and in Section 11. These arise from deficiencies in the floating point calculation and testing of time steps, cumulative time, and time targets. The algorithmic replacement of floating point control of these items with integer based timestepping was developed and implemented. The result of the first phase, undertaken by the INL was that all but three of these issues were resolved. Moreover, two conceptual errors in DTSTEP that were not PVM coupling related were discovered and solved. The final, and most difficult three PVM Bettis User Problems, were solved during the Bettis phase of development and debugging. In 8 months since the conclusion of the project, no further DTSTEP related PVM Coupling errors have been reported.
NASA Astrophysics Data System (ADS)
Pandey, P.; De Ridder, K.; van Lipzig, N.
2009-04-01
Clouds play a very important role in the Earth's climate system, as they form an intermediate layer between Sun and the Earth. Satellite remote sensing systems are the only means to provide information about clouds on large scales. The geostationary satellite, Meteosat Second Generation (MSG) has onboard an imaging radiometer, the Spinning Enhanced Visible and Infrared Imager (SEVIRI). SEVIRI is a 12 channel imager, with 11 channels observing the earth's full disk with a temporal resolution of 15 min and spatial resolution of 3 km at nadir, and a high resolution visible (HRV) channel. The visible channels (0.6 µm and 0.81 µm) and near infrared channel (1.6µm) of SEVIRI are being used to retrieve the cloud optical thickness (COT). The study domain is over Europe covering the region between 35°N - 70°N and 10°W - 30°E. SEVIRI level 1.5 images over this domain are being acquired from the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT) archive. The processing of this imagery, involves a number of steps before estimating the COT. The steps involved in pre-processing are as follows. First, the digital count number is acquired from the imagery. Image geo-coding is performed in order to relate the pixel positions to the corresponding longitude and latitude. Solar zenith angle is determined as a function of latitude and time. The radiometric conversion is done using the values of offsets and slopes of each band. The values of radiance obtained are then used to calculate the reflectance for channels in the visible spectrum using the information of solar zenith angle. An attempt is made to estimate the COT from the observed radiances. A semi analytical algorithm [Kokhanovsky et al., 2003] is implemented for the estimation of cloud optical thickness from the visible spectrum of light intensity reflected from clouds. The asymptotical solution of the radiative transfer equation, for clouds with large optical thickness, is the basis of
NASA Astrophysics Data System (ADS)
Abdelazim, S.; Santoro, D.; Arend, M.; Moshary, F.; Ahmed, S.
2015-05-01
In this paper, we present two signal processing algorithms implemented using the FPGA. The first algorithm involves explicate time gating of received signals that correspond to a desired spatial resolution, performing a Fast Fourier Transform (FFT) calculation on each individual time gate, taking the square modulus of the FFT to form a power spectrum and then accumulating these power spectra for 10k return signals. The second algorithm involves calculating the autocorrelation of the backscattered signals and then accumulating the autocorrelation for 10k pulses. Efficient implementation of each of these two signal processing algorithms on an FPGA is challenging because it requires there to be tradeoffs between retaining the full data word width, managing the amount of on chip memory used and respecting the constraints imposed by the data width of the FPGA. A description of the approach used to manage these tradeoffs for each of the two signal processing algorithms are presented and explained in this article. Results of atmospheric measurements obtained through these two embedded programming techniques are also presented.
NASA Astrophysics Data System (ADS)
Kubica, Jacek M.
2000-10-01
The use of the downhill Simplex algorithm in reconstruction of optical parameters of planar silica waveguides is described. The original Nelder-Mead approach has been modified to include physical constraints of the waveguide system. Numerical results are provided to illustrate the behavior of the modified algorithm.
NASA Astrophysics Data System (ADS)
Smith, D. E.; Felizardo, C.; Minson, S. E.; Boese, M.; Langbein, J. O.; Guillemot, C.; Murray, J. R.
2015-12-01
The earthquake early warning (EEW) systems in California and elsewhere can greatly benefit from algorithms that generate estimates of finite-fault parameters. These estimates could significantly improve real-time shaking calculations and yield important information for immediate disaster response. Minson et al. (2015) determined that combining FinDer's seismic-based algorithm (Böse et al., 2012) with BEFORES' geodetic-based algorithm (Minson et al., 2014) yields a more robust and informative joint solution than using either algorithm alone. FinDer examines the distribution of peak ground accelerations from seismic stations and determines the best finite-fault extent and strike from template matching. BEFORES employs a Bayesian framework to search for the best slip inversion over all possible fault geometries in terms of strike and dip. Using FinDer and BEFORES together generates estimates of finite-fault extent, strike, dip, preferred slip, and magnitude. To yield the quickest, most flexible, and open-source version of the joint algorithm, we translated BEFORES and FinDer from Matlab into C++. We are now developing a C++ Application Protocol Interface for these two algorithms to be connected to the seismic and geodetic data flowing from the EEW system. The interface that is being developed will also enable communication between the two algorithms to generate the joint solution of finite-fault parameters. Once this interface is developed and implemented, the next step will be to run test seismic and geodetic data through the system via the Earthworm module, Tank Player. This will allow us to examine algorithm performance on simulated data and past real events.
NASA Astrophysics Data System (ADS)
Frydendall, J.; Brandt, J.; Christensen, J. H.
2009-08-01
A simple data assimilation algorithm based on statistical interpolation has been developed and coupled to a long-range chemistry transport model, the Danish Eulerian Operational Model (DEOM), applied for air pollution forecasting at the National Environmental Research Institute (NERI), Denmark. In this paper, the algorithm and the results from experiments designed to find the optimal setup of the algorithm are described. The algorithm has been developed and optimized via eight different experiments where the results from different model setups have been tested against measurements from the EMEP (European Monitoring and Evaluation Programme) network covering a half-year period, April-September 1999. The best performing setup of the data assimilation algorithm for surface ozone concentrations has been found, including the combination of determining the covariances using the Hollingsworth method, varying the correlation length according to the number of adjacent observation stations and applying the assimilation routine at three successive hours during the morning. Improvements in the correlation coefficient in the range of 0.1 to 0.21 between the results from the reference and the optimal configuration of the data assimilation algorithm, were found. The data assimilation algorithm will in the future be used in the operational THOR integrated air pollution forecast system, which includes the DEOM.
NASA Astrophysics Data System (ADS)
Frydendall, J.; Brandt, J.; Christensen, J. H.
2009-03-01
A simple data assimilation algorithm based on statistical interpolation has been developed and coupled to a long-range chemistry transport model, the Danish Eulerian Operational Model (DEOM), applied for air pollution forecasting at the National Environmental Research Institute (NERI), Denmark. In this paper, the algorithm and the results from experiments designed to find the optimal setup of the algorithm are described. The algorithm has been developed and optimized via eight different experiments where the results from different model setups have been tested against measurements from the EMEP (European Monitoring and Evaluation Programme) network covering a half-year period, April-September 1999. The best performing setup of the data assimilation algorithm for surface ozone concentrations has been found, including the combination of determining the covariances using the Hollingsworth method, varying the correlation length according to the number of adjacent observation stations and applying the assimilation routine at three successive hours during the morning. Improvements in the correlation coefficient in the range of 0.1 to 0.21 between the results from the reference and the optimal configuration of the data assimilation algorithm, were found. The data assimilation algorithm will in the future be used in the operational THOR integrated air pollution forecast system, which includes the DEOM.
NASA Astrophysics Data System (ADS)
Kress, R. L.; Jansen, J. F.; Noakes, M. W.
When suspended payloads are moved with an overhead crane, pendulum like oscillations are naturally introduced. This presents a problem any time a crane is used, especially when expensive and/or delicate objects are moved, when moving in a cluttered and/or hazardous environment, and when objects are to be placed in tight locations. Damped-oscillation control algorithms have been demonstrated over the past several years for laboratory-scale robotic systems on dc motor-driven overhead cranes. Most overhead cranes presently in use in industry are driven by ac induction motors; consequently, Oak Ridge National Laboratory has implemented damped-oscillation crane control on one of its existing facility ac induction motor-driven overhead cranes. The purpose of this test was to determine feasibility, to work out control and interfacing specifications, and to establish the capability of newly available ac motor control hardware with respect to use in damped-oscillation-controlled systems. Flux vector inverter drives are used to investigate their acceptability for damped-oscillation crane control. The purpose of this paper is to describe the experimental implementation of a control algorithm on a full-sized, two-degree-of-freedom, industrial crane; describe the experimental evaluation of the controller including robustness to payload length changes; explain the results of experiments designed to determine the hardware required for implementation of the control algorithms; and to provide a theoretical description of the controller.
Kress, R.L.; Jansen, J.F.; Noakes, M.W.
1994-05-01
When suspended payloads are moved with an overhead crane, pendulum like oscillations are naturally introduced. This presents a problem any time a crane is used, especially when expensive and/or delicate objects are moved, when moving in a cluttered an or hazardous environment, and when objects are to be placed in tight locations. Damped-oscillation control algorithms have been demonstrated over the past several years for laboratory-scale robotic systems on dc motor-driven overhead cranes. Most overhead cranes presently in use in industry are driven by ac induction motors; consequently, Oak Ridge National Laboratory has implemented damped-oscillation crane control on one of its existing facility ac induction motor-driven overhead cranes. The purpose of this test was to determine feasibility, to work out control and interfacing specifications, and to establish the capability of newly available ac motor control hardware with respect to use in damped-oscillation-controlled systems. Flux vector inverter drives are used to investigate their acceptability for damped-oscillation crane control. The purpose of this paper is to describe the experimental implementation of a control algorithm on a full-sized, two-degree-of-freedom, industrial crane; describe the experimental evaluation of the controller including robustness to payload length changes; explain the results of experiments designed to determine the hardware required for implementation of the control algorithms; and to provide a theoretical description of the controller.
NASA Technical Reports Server (NTRS)
Nyangweso, Emmanuel; Bole, Brian
2014-01-01
Successful prediction and management of battery life using prognostic algorithms through ground and flight tests is important for performance evaluation of electrical systems. This paper details the design of test beds suitable for replicating loading profiles that would be encountered in deployed electrical systems. The test bed data will be used to develop and validate prognostic algorithms for predicting battery discharge time and battery failure time. Online battery prognostic algorithms will enable health management strategies. The platform used for algorithm demonstration is the EDGE 540T electric unmanned aerial vehicle (UAV). The fully designed test beds developed and detailed in this paper can be used to conduct battery life tests by controlling current and recording voltage and temperature to develop a model that makes a prediction of end-of-charge and end-of-life of the system based on rapid state of health (SOH) assessment.
Fevotte, F.; Lathuiliere, B.
2013-07-01
The large increase in computing power over the past few years now makes it possible to consider developing 3D full-core heterogeneous deterministic neutron transport solvers for reference calculations. Among all approaches presented in the literature, the method first introduced in [1] seems very promising. It consists in iterating over resolutions of 2D and ID MOC problems by taking advantage of prismatic geometries without introducing approximations of a low order operator such as diffusion. However, before developing a solver with all industrial options at EDF, several points needed to be clarified. In this work, we first prove the convergence of this iterative process, under some assumptions. We then present our high-performance, parallel implementation of this algorithm in the MICADO solver. Benchmarking the solver against the Takeda case shows that the 2D-1D coupling algorithm does not seem to affect the spatial convergence order of the MOC solver. As for performance issues, our study shows that even though the data distribution is suited to the 2D solver part, the efficiency of the ID part is sufficient to ensure a good parallel efficiency of the global algorithm. After this study, the main remaining difficulty implementation-wise is about the memory requirement of a vector used for initialization. An efficient acceleration operator will also need to be developed. (authors)
NASA Astrophysics Data System (ADS)
Paz, Abel; Plaza, Antonio; Plaza, Javier
2009-08-01
Automatic target detection in hyperspectral images is a task that has attracted a lot of attention recently. In the last few years, several algoritms have been developed for this purpose, including the well-known RX algorithm for anomaly detection, or the automatic target detection and classification algorithm (ATDCA), which uses an orthogonal subspace projection (OSP) approach to extract a set of spectrally distinct targets automatically from the input hyperspectral data. Depending on the complexity and dimensionality of the analyzed image scene, the target/anomaly detection process may be computationally very expensive, a fact that limits the possibility of utilizing this process in time-critical applications. In this paper, we develop computationally efficient parallel versions of both the RX and ATDCA algorithms for near real-time exploitation of these algorithms. In the case of ATGP, we use several distance metrics in addition to the OSP approach. The parallel versions are quantitatively compared in terms of target detection accuracy, using hyperspectral data collected by NASA's Airborne Visible Infra-Red Imaging Spectrometer (AVIRIS) over the World Trade Center in New York, five days after the terrorist attack of September 11th, 2001, and also in terms of parallel performance, using a massively Beowulf cluster available at NASA's Goddard Space Flight Center in Maryland.
Ewing, R.E.; Saevareid, O.; Shen, J.
1994-12-31
A multigrid algorithm for the cell-centered finite difference on equilateral triangular grids for solving second-order elliptic problems is proposed. This finite difference is a four-point star stencil in a two-dimensional domain and a five-point star stencil in a three dimensional domain. According to the authors analysis, the advantages of this finite difference are that it is an O(h{sup 2})-order accurate numerical scheme for both the solution and derivatives on equilateral triangular grids, the structure of the scheme is perhaps the simplest, and its corresponding multigrid algorithm is easily constructed with an optimal convergence rate. They are interested in relaxation of the equilateral triangular grid condition to certain general triangular grids and the application of this multigrid algorithm as a numerically reasonable preconditioner for the lowest-order Raviart-Thomas mixed triangular finite element method. Numerical test results are presented to demonstrate their analytical results and to investigate the applications of this multigrid algorithm on general triangular grids.
Dragas, Jelena; Jackel, David; Hierlemann, Andreas; Franke, Felix
2015-03-01
Reliable real-time low-latency spike sorting with large data throughput is essential for studies of neural network dynamics and for brain-machine interfaces (BMIs), in which the stimulation of neural networks is based on the networks' most recent activity. However, the majority of existing multi-electrode spike-sorting algorithms are unsuited for processing high quantities of simultaneously recorded data. Recording from large neuronal networks using large high-density electrode sets (thousands of electrodes) imposes high demands on the data-processing hardware regarding computational complexity and data transmission bandwidth; this, in turn, entails demanding requirements in terms of chip area, memory resources and processing latency. This paper presents computational complexity optimization techniques, which facilitate the use of spike-sorting algorithms in large multi-electrode-based recording systems. The techniques are then applied to a previously published algorithm, on its own, unsuited for large electrode set recordings. Further, a real-time low-latency high-performance VLSI hardware architecture of the modified algorithm is presented, featuring a folded structure capable of processing the activity of hundreds of neurons simultaneously. The hardware is reconfigurable “on-the-fly” and adaptable to the nonstationarities of neuronal recordings. By transmitting exclusively spike time stamps and/or spike waveforms, its real-time processing offers the possibility of data bandwidth and data storage reduction. PMID:25415989
Link, W.A.; Barker, R.J.
2008-01-01
Judicious choice of candidate generating distributions improves efficiency of the Metropolis-Hastings algorithm. In Bayesian applications, it is sometimes possible to identify an approximation to the target posterior distribution; this approximate posterior distribution is a good choice for candidate generation. These observations are applied to analysis of the Cormack-Jolly-Seber model and its extensions. ?? Springer Science+Business Media, LLC 2007.
Link, W.A.; Barker, R.J.
2008-01-01
Judicious choice of candidate generating distributions improves efficiency of the Metropolis-Hastings algorithm. In Bayesian applications, it is sometimes possible to identify an approximation to the target posterior distribution; this approximate posterior distribution is a good choice for candidate generation. These observations are applied to analysis of the Cormack?Jolly?Seber model and its extensions.
Continuous-Variable Quantum Computation of Oracle Decision Problems
NASA Astrophysics Data System (ADS)
Adcock, Mark R. A.
Quantum information processing is appealing due its ability to solve certain problems quantitatively faster than classical information processing. Most quantum algorithms have been studied in discretely parameterized systems, but many quantum systems are continuously parameterized. The field of quantum optics in particular has sophisticated techniques for manipulating continuously parameterized quantum states of light, but the lack of a code-state formalism has hindered the study of quantum algorithms in these systems. To address this situation, a code-state formalism for the solution of oracle decision problems in continuously-parameterized quantum systems is developed. Quantum information processing is appealing due its ability to solve certain problems quantitatively faster than classical information processing. Most quantum algorithms have been studied in discretely parameterized systems, but many quantum systems are continuously parameterized. The field of quantum optics in particular has sophisticated techniques for manipulating continuously parameterized quantum states of light, but the lack of a code-state formalism has hindered the study of quantum algorithms in these systems. To address this situation, a code-state formalism for the solution of oracle decision problems in continuously-parameterized quantum systems is developed. In the infinite-dimensional case, we study continuous-variable quantum algorithms for the solution of the Deutsch--Jozsa oracle decision problem implemented within a single harmonic-oscillator. Orthogonal states are used as the computational bases, and we show that, contrary to a previous claim in the literature, this implementation of quantum information processing has limitations due to a position-momentum trade-off of the Fourier transform. We further demonstrate that orthogonal encoding bases are not unique, and using the coherent states of the harmonic oscillator as the computational bases, our formalism enables quantifying
NASA Astrophysics Data System (ADS)
Li, Zhenping; Grotenhuis, Michael; Wu, Xiangqian; Schmit, Timothy J.; Schmidt, Chris; Schreiner, Anthony J.; Nelson, James P.; Yu, Fangfang; Bysal, Hyre
2014-01-01
Channel-to-channel co-registration is an important performance metric for the Geostationary Operational Environmental Satellite (GOES) Imager, and large co-registration errors can have a significant impact on the reliability of derived products that rely on combinations of multiple infrared (IR) channels. Affected products include the cloud mask, fog and fire detection. This is especially the case for GOES-13, in which the co-registration error between channels 2 (3.9 μm) and 4 (10.7 μm) can be as large as 1 pixel (or ˜4 km) in the east-west direction. The GOES Imager IR channel-to-channel co-registration characterization (GII4C) algorithm is presented, which allows a systematic calculation of the co-registration error between GOES IR channel image pairs. The procedure for determining the co-registration error as a function of time is presented. The algorithm characterizes the co-registration error between corresponding images from two channels by spatially transforming one image using the fast Fourier transformation resampling algorithm and determining the distance of the transformation that yields the maximum correlation in brightness temperature. The GII4C algorithm is an area-based approach which does not depend on a fixed set of control points that may be impacted by the presence of clouds. In fact, clouds are a feature that enhances the correlations. The results presented show very large correlations over the majority of Earth-viewing pixels, with stable algorithm results. Verification of the algorithm output is discussed, and a global spatial-spectral gradient asymmetry parameter is defined. The results show that the spatial-spectral gradient asymmetry is strongly correlated to the co-registration error and can be an effective global metric for the quality of the channel-to-channel co-registration characterization algorithm. Implementation of the algorithm in the GOES ground system is presented. This includes an offline component to determine the time
NASA Technical Reports Server (NTRS)
Whiffen, Gregory J.
2006-01-01
Mystic software is designed to compute, analyze, and visualize optimal high-fidelity, low-thrust trajectories, The software can be used to analyze inter-planetary, planetocentric, and combination trajectories, Mystic also provides utilities to assist in the operation and navigation of low-thrust spacecraft. Mystic will be used to design and navigate the NASA's Dawn Discovery mission to orbit the two largest asteroids, The underlying optimization algorithm used in the Mystic software is called Static/Dynamic Optimal Control (SDC). SDC is a nonlinear optimal control method designed to optimize both 'static variables' (parameters) and dynamic variables (functions of time) simultaneously. SDC is a general nonlinear optimal control algorithm based on Bellman's principal.
NASA Astrophysics Data System (ADS)
Liu, Qianshun; Liu, Yan; Yu, Feihong
2013-08-01
As a kind of film device, band-pass filter is widely used in pattern recognition, infrared detection, optical fiber communication, etc. In this paper, an algorithm for automatic measurement of band-pass filter quality criterion is proposed based on the proven theory calculation of derivate spectral transmittance of filter formula. Firstly, wavelet transform to reduce spectrum data noises is used. Secondly, combining with the Gaussian curve fitting and least squares method, the algorithm fits spectrum curve and searches the peak. Finally, some parameters for judging band-pass filter quality are figure out. Based on the algorithm, a pipeline for band-pass filters automatic measurement system has been designed that can scan the filter array automatically and display spectral transmittance of each filter. At the same time, the system compares the measuring result with the user defined standards to determine if the filter is qualified or not. The qualified product will be market with green color, and the unqualified product will be marked with red color. With the experiments verification, the automatic measurement system basically realized comprehensive, accurate and rapid measurement of band-pass filter quality and achieved the expected results.
A 0.13-µm implementation of 5 Gb/s and 3-mW folded parallel architecture for AES algorithm
NASA Astrophysics Data System (ADS)
Rahimunnisa, K.; Karthigaikumar, P.; Kirubavathy, J.; Jayakumar, J.; Kumar, S. Suresh
2014-02-01
A new architecture for encrypting and decrypting the confidential data using Advanced Encryption Standard algorithm is presented in this article. This structure combines the folded structure with parallel architecture to increase the throughput. The whole architecture achieved high throughput with less power. The proposed architecture is implemented in 0.13-µm Complementary metal-oxide-semiconductor (CMOS) technology. The proposed structure is compared with different existing structures, and from the result it is proved that the proposed structure gives higher throughput and less power compared to existing works.
NASA Astrophysics Data System (ADS)
Giacometto, F. J.; Vilardy, J. M.; Torres, C. O.; Mattos, L.
2011-01-01
Among the most used biometric signals to set personal security permissions, taker increasingly importance biometric iris recognition based on their textures and images of blood vessels due to the rich in these two unique characteristics that are unique to each individual. This paper presents an implementation of an algorithm characterization and correlation of templates created for biometric authentication based on iris texture analysis programmed on a FPGA (Field Programmable Gate Array), authentication is based on processes like characterization methods based on frequency analysis of the sample, and frequency correlation to obtain the expected results of authentication.
Fox, Andrew; Williams, Mathew; Richardson, Andrew D.; Cameron, David; Gove, Jeffrey H.; Quaife, Tristan; Ricciuto, Daniel M; Reichstein, Markus; Tomelleri, Enrico; Trudinger, Cathy; Van Wijk, Mark T.
2009-10-01
We describe a model-data fusion (MDF) inter-comparison project (REFLEX), which compared various algorithms for estimating carbon (C) model parameters consistent with both measured carbon fluxes and states and a simple C model. Participants were provided with the model and with both synthetic net ecosystem exchange (NEE) ofCO2 and leaf area index (LAI) data, generated from the model with added noise, and observed NEE and LAI data from two eddy covariance sites. Participants endeavoured to estimate model parameters and states consistent with the model for all cases over the two years for which data were provided, and generate predictions for one additional year without observations. Nine participants contributed results using Metropolis algorithms, Kalman filters and a genetic algorithm. For the synthetic data case, parameter estimates compared well with the true values. The results of the analyses indicated that parameters linked directly to gross primary production (GPP) and ecosystem respiration, such as those related to foliage allocation and turnover, or temperature sensitivity of heterotrophic respiration,were best constrained and characterised. Poorly estimated parameters were those related to the allocation to and turnover of fine root/wood pools. Estimates of confidence intervals varied among algorithms, but several algorithms successfully located the true values of annual fluxes from synthetic experiments within relatively narrow 90% confidence intervals, achieving>80% success rate and mean NEE confidence intervals <110 gCm-2 year-1 for the synthetic case. Annual C flux estimates generated by participants generally agreed with gap-filling approaches using half-hourly data. The estimation of ecosystem respiration and GPP through MDF agreed well with outputs from partitioning studies using half-hourly data. Confidence limits on annual NEE increased by an average of 88% in the prediction year compared to the previous year, when data were available. Confidence
NASA Astrophysics Data System (ADS)
Bujnowski, K.; Pucyk, P.; Pozniak, K. T.; Romaniuk, R. S.
2008-01-01
The European XFEL project uses the LLRF system for stabilization of a vector sum of the RF field in 32 superconducting cavities. A dedicated, high performance photonics and electronics and software was built. To provide high system availability an appropriate test environment as well as diagnostics was designed. A real time simulation subsystem was designed which is based on dedicated electronics using FPGA technology and robust simulation models implemented in VHDL. The paper presents an architecture of the system framework which allows for easy and flexible conversion of MATLAB language structures directly into FPGA implementable grid of parameterized and simple DSP processors. The decomposition of MATLAB grammar was described as well as optimization process and FPGA implementation issues.
Gonen, Yotam; Rytwo, Giora
2009-01-01
A Matlab implemented computer code for spectral resolution is presented. The code enables the user to resolve the UV-visible absorption spectrum of a mixture of up to 3 previously known components, to the individual components, thus, evaluating their quantities. The resolving procedure is based on searching the combination of the components which yields the spectrum which is the most similar (minimal RMSE) to the measured spectrum of the mixture. Examples of using the software for pKa value estimation and multicomponent analysis are presented and other implementations are suggested. PMID:20072668
NASA Astrophysics Data System (ADS)
Lee, Dah-Jye; Anderson, Jonathan D.; Archibald, James K.
2008-12-01
Many image and signal processing techniques have been applied to medical and health care applications in recent years. In this paper, we present a robust signal processing approach that can be used to solve the correspondence problem for an embedded stereo vision sensor to provide real-time visual guidance to the visually impaired. This approach is based on our new one-dimensional (1D) spline-based genetic algorithm to match signals. The algorithm processes image data lines as 1D signals to generate a dense disparity map, from which 3D information can be extracted. With recent advances in electronics technology, this 1D signal matching technique can be implemented and executed in parallel in hardware such as field-programmable gate arrays (FPGAs) to provide real-time feedback about the environment to the user. In order to complement (not replace) traditional aids for the visually impaired such as canes and Seeing Eyes dogs, vision systems that provide guidance to the visually impaired must be affordable, easy to use, compact, and free from attributes that are awkward or embarrassing to the user. "Seeing Eye Glasses," an embedded stereo vision system utilizing our new algorithm, meets all these requirements.
Miller, L.F.
1980-11-01
A brief description of the Oak Ridge Reactor Pool Side Facility (ORR-PSF) and of the associated control system is given. The ORR-PSF capsule temperatures are controlled by a digital computer which regulates the percent power delivered to electrical heaters. The total electrical power which can be input to a particular heater is determined by the setting of an associated variac. This report concentrates on the description of the ORR-PSF irradiation experiment computer control algorithm. The algorithm is an implementation of a discrete-time, state variable, optimal control approach. The Riccati equation is solved for a discretized system model to determine the control law. Experiments performed to obtain system model parameters are described. Results of performance evaluation experiments are also presented. The control algorithm maintains both capsule temperatures within a 288/sup 0/C +-10/sup 0/C band as required. The pressure vessel capsule temperatures are effectively maintained within a 288/sup 0/C +-5/sup 0/C band.
Chen, Guangye; Chacon, Luis; Barnes, Daniel C
2012-01-01
Recently, a fully implicit, energy- and charge-conserving particle-in-cell method has been developed for multi-scale, full-f kinetic simulations [G. Chen, et al., J. Comput. Phys. 230, 18 (2011)]. The method employs a Jacobian-free Newton-Krylov (JFNK) solver and is capable of using very large timesteps without loss of numerical stability or accuracy. A fundamental feature of the method is the segregation of particle orbit integrations from the field solver, while remaining fully self-consistent. This provides great flexibility, and dramatically improves the solver efficiency by reducing the degrees of freedom of the associated nonlinear system. However, it requires a particle push per nonlinear residual evaluation, which makes the particle push the most time-consuming operation in the algorithm. This paper describes a very efficient mixed-precision, hybrid CPU-GPU implementation of the implicit PIC algorithm. The JFNK solver is kept on the CPU (in double precision), while the inherent data parallelism of the particle mover is exploited by implementing it in single-precision on a graphics processing unit (GPU) using CUDA. Performance-oriented optimizations, with the aid of an analytical performance model, the roofline model, are employed. Despite being highly dynamic, the adaptive, charge-conserving particle mover algorithm achieves up to 300 400 GOp/s (including single-precision floating-point, integer, and logic operations) on a Nvidia GeForce GTX580, corresponding to 20 25% absolute GPU efficiency (against the peak theoretical performance) and 50-70% intrinsic efficiency (against the algorithm s maximum operational throughput, which neglects all latencies). This is about 200-300 times faster than an equivalent serial CPU implementation. When the single-precision GPU particle mover is combined with a double-precision CPU JFNK field solver, overall performance gains 100 vs. the double-precision CPU-only serial version are obtained, with no apparent loss of
Frolov, Vladimir; Backhaus, Scott N.; Chertkov, Michael
2014-01-14
We explore optimization methods for planning the placement, sizing and operations of Flexible Alternating Current Transmission System (FACTS) devices installed to relieve transmission grid congestion. We limit our selection of FACTS devices to Series Compensation (SC) devices that can be represented by modification of the inductance of transmission lines. Our master optimization problem minimizes the l_{1} norm of the inductance modification subject to the usual line thermal-limit constraints. We develop heuristics that reduce this non-convex optimization to a succession of Linear Programs (LP) which are accelerated further using cutting plane methods. The algorithm solves an instance of the MatPower Polish Grid model (3299 lines and 2746 nodes) in 40 seconds per iteration on a standard laptop—a speed up that allows the sizing and placement of a family of SC devices to correct a large set of anticipated congestions. We observe that our algorithm finds feasible solutions that are always sparse, i.e., SC devices are placed on only a few lines. In a companion manuscript, we demonstrate our approach on realistically-sized networks that suffer congestion from a range of causes including generator retirement. In this manuscript, we focus on the development of our approach, investigate its structure on a small test system subject to congestion from uniform load growth, and demonstrate computational efficiency on a realistically-sized network.
Frolov, Vladimir; Backhaus, Scott; Chertkov, Misha
2014-10-24
We explore optimization methods for planning the placement, sizing and operations of Flexible Alternating Current Transmission System (FACTS) devices installed to relieve transmission grid congestion. We limit our selection of FACTS devices to Series Compensation (SC) devices that can be represented by modification of the inductance of transmission lines. Our master optimization problem minimizes the l_{1} norm of the inductance modification subject to the usual line thermal-limit constraints. We develop heuristics that reduce this non-convex optimization to a succession of Linear Programs (LP) which are accelerated further using cutting plane methods. The algorithm solves an instance of the MatPower Polish Grid model (3299 lines and 2746 nodes) in 40 seconds per iteration on a standard laptop—a speed up that allows the sizing and placement of a family of SC devices to correct a large set of anticipated congestions. We observe that our algorithm finds feasible solutions that are always sparse, i.e., SC devices are placed on only a few lines. In a companion manuscript, we demonstrate our approach on realistically-sized networks that suffer congestion from a range of causes including generator retirement. In this manuscript, we focus on the development of our approach, investigate its structure on a small test system subject to congestion from uniform load growth, and demonstrate computational efficiency on a realistically-sized network.
Frolov, Vladimir; Backhaus, Scott; Chertkov, Misha
2014-10-24
We explore optimization methods for planning the placement, sizing and operations of Flexible Alternating Current Transmission System (FACTS) devices installed to relieve transmission grid congestion. We limit our selection of FACTS devices to Series Compensation (SC) devices that can be represented by modification of the inductance of transmission lines. Our master optimization problem minimizes the l1 norm of the inductance modification subject to the usual line thermal-limit constraints. We develop heuristics that reduce this non-convex optimization to a succession of Linear Programs (LP) which are accelerated further using cutting plane methods. The algorithm solves an instance of the MatPower Polishmore » Grid model (3299 lines and 2746 nodes) in 40 seconds per iteration on a standard laptop—a speed up that allows the sizing and placement of a family of SC devices to correct a large set of anticipated congestions. We observe that our algorithm finds feasible solutions that are always sparse, i.e., SC devices are placed on only a few lines. In a companion manuscript, we demonstrate our approach on realistically-sized networks that suffer congestion from a range of causes including generator retirement. In this manuscript, we focus on the development of our approach, investigate its structure on a small test system subject to congestion from uniform load growth, and demonstrate computational efficiency on a realistically-sized network.« less
NASA Astrophysics Data System (ADS)
Frolov, Vladimir; Backhaus, Scott; Chertkov, Misha
2014-10-01
We explore optimization methods for planning the placement, sizing and operations of flexible alternating current transmission system (FACTS) devices installed to relieve transmission grid congestion. We limit our selection of FACTS devices to series compensation (SC) devices that can be represented by modification of the inductance of transmission lines. Our master optimization problem minimizes the l1 norm of the inductance modification subject to the usual line thermal-limit constraints. We develop heuristics that reduce this non-convex optimization to a succession of linear programs (LP) that are accelerated further using cutting plane methods. The algorithm solves an instance of the MatPower Polish Grid model (3299 lines and 2746 nodes) in 40 seconds per iteration on a standard laptop—a speed that allows the sizing and placement of a family of SC devices to correct a large set of anticipated congestions. We observe that our algorithm finds feasible solutions that are always sparse, i.e., SC devices are placed on only a few lines. In a companion manuscript, we demonstrate our approach on realistically sized networks that suffer congestion from a range of causes, including generator retirement. In this manuscript, we focus on the development of our approach, investigate its structure on a small test system subject to congestion from uniform load growth, and demonstrate computational efficiency on a realistically sized network.
NASA Technical Reports Server (NTRS)
Hinton, David A.
2001-01-01
A ground-based system has been developed to demonstrate the feasibility of automating the process of collecting relevant weather data, predicting wake vortex behavior from a data base of aircraft, prescribing safe wake vortex spacing criteria, estimating system benefit, and comparing predicted and observed wake vortex behavior. This report describes many of the system algorithms, features, limitations, and lessons learned, as well as suggested system improvements. The system has demonstrated concept feasibility and the potential for airport benefit. Significant opportunities exist however for improved system robustness and optimization. A condensed version of the development lab book is provided along with samples of key input and output file types. This report is intended to document the technical development process and system architecture, and to augment archived internal documents that provide detailed descriptions of software and file formats.
NASA Astrophysics Data System (ADS)
Renner, A.; Furtado, H.; Seppenwoolde, Y.; Birkfellner, W.; Georg, D.
2016-03-01
A radiotherapy (RT) treatment can last for several weeks. In that time organ motion and shape changes introduce uncertainty in dose application. Monitoring and quantifying the change can yield a more precise irradiation margin definition and thereby reduce dose delivery to healthy tissue and adjust tumor targeting. Deformable image registration (DIR) has the potential to fulfill this task by calculating a deformation field (DF) between a planning CT and a repeated CT of the altered anatomy. Application of the DF on the original contours yields new contours that can be used for an adapted treatment plan. DIR is a challenging method and therefore needs careful user interaction. Without a proper graphical user interface (GUI) a misregistration cannot be easily detected by visual inspection and the results cannot be fine-tuned by changing registration parameters. To provide a DIR algorithm with such a GUI available for everyone, we created the extension Featurelet-Registration for the open source software platform 3D Slicer. The registration logic is an upgrade of an in-house-developed DIR method, which is a featurelet-based piecewise rigid registration. The so called "featurelets" are equally sized rectangular subvolumes of the moving image which are rigidly registered to rectangular search regions on the fixed image. The output is a deformed image and a deformation field. Both can be visualized directly in 3D Slicer facilitating the interpretation and quantification of the results. For validation of the registration accuracy two deformable phantoms were used. The performance was benchmarked against a demons algorithm with comparable results.
NASA Astrophysics Data System (ADS)
Anderson, Richard P.
An algorithm for precision approach guidance using GPS and a MicroElectroMechanical Systems/Inertial Navigation System (MEMS/INS) has been developed to meet the Required Navigational Performance (RNP) at a cost that is suitable for General Aviation (GA) applications. This scheme allows for accurate approach guidance (Category I) using Wide Area Augmentation System (WAAS) at locations not served by ILS, MLS or other types of precision landing guidance, thereby greatly expanding the number of useable airports in poor weather. At locations served by a Local Area Augmentation System (LAAS), Category III-like navigation is possible with the novel idea of a Missed Approach Time (MAT) that is similar to a Missed Approach Point (MAP) but not fixed in space. Though certain augmented types of GPS have sufficient precision for approach navigation, its use alone is insufficient to meet RNP due to an inability to monitor loss, degradation or intentional spoofing and meaconing of the GPS signal. A redundant navigation system and a health monitoring system must be added to acquire sufficient reliability, safety and time-to-alert as stated by required navigation performance. An inertial navigation system is the best choice, as it requires no external radio signals and its errors are complementary to GPS. An aiding Kalman filter is used to derive parameters that monitor the correlation between the GPS and MEMS/INS. These approach guidance parameters determines the MAT for a given RNP and provide the pilot or autopilot with proceed/do-not-proceed decision in real time. The enabling technology used to derive the guidance program is a MEMS gyroscope and accelerometer package in conjunction with a single-antenna pseudo-attitude algorithm. To be viable for most GA applications, the hardware must be reasonably priced. The MEMS gyros allows for the first cost-effective INS package to be developed. With lower cost, however, comes higher drift rates and a more dependence on GPS aiding. In
NASA Astrophysics Data System (ADS)
Hong, S.; Yu, X.; Park, S. K.; Choi, Y.-S.; Myoung, B.
2014-10-01
Optimization of land surface models has been challenging due to the model complexity and uncertainty. In this study, we performed scheme-based model optimizations by designing a framework for coupling "the micro-genetic algorithm" (micro-GA) and "the Noah land surface model with multiple physics options" (Noah-MP). Micro-GA controls the scheme selections among eight different land surface parameterization categories, each containing 2-4 schemes, in Noah-MP in order to extract the optimal scheme combination that achieves the best skill score. This coupling framework was successfully applied to the optimizations of evapotranspiration and runoff simulations in terms of surface water balance over the Han River basin in Korea, showing outstanding speeds in searching for the optimal scheme combination. Taking advantage of the natural selection mechanism in micro-GA, we explored the model sensitivity to scheme selections and the scheme interrelationship during the micro-GA evolution process. This information is helpful for better understanding physical parameterizations and hence it is expected to be effectively used for further optimizations with uncertain parameters in a specific set of schemes.
Garcia-Pareja, S.; Galan, P.; Manzano, F.; Brualla, L.; Lallena, A. M.
2010-07-15
Purpose: In this work, the authors describe an approach which has been developed to drive the application of different variance-reduction techniques to the Monte Carlo simulation of photon and electron transport in clinical accelerators. Methods: The new approach considers the following techniques: Russian roulette, splitting, a modified version of the directional bremsstrahlung splitting, and the azimuthal particle redistribution. Their application is controlled by an ant colony algorithm based on an importance map. Results: The procedure has been applied to radiosurgery beams. Specifically, the authors have calculated depth-dose profiles, off-axis ratios, and output factors, quantities usually considered in the commissioning of these beams. The agreement between Monte Carlo results and the corresponding measurements is within {approx}3%/0.3 mm for the central axis percentage depth dose and the dose profiles. The importance map generated in the calculation can be used to discuss simulation details in the different parts of the geometry in a simple way. The simulation CPU times are comparable to those needed within other approaches common in this field. Conclusions: The new approach is competitive with those previously used in this kind of problems (PSF generation or source models) and has some practical advantages that make it to be a good tool to simulate the radiation transport in problems where the quantities of interest are difficult to obtain because of low statistics.
Sung, Wen-Tsai; Lin, Jia-Syun
2013-01-01
This work aims to develop a smart LED lighting system, which is remotely controlled by Android apps via handheld devices, e.g., smartphones, tablets, and so forth. The status of energy use is reflected by readings displayed on a handheld device, and it is treated as a criterion in the lighting mode design of a system. A multimeter, a wireless light dimmer, an IR learning remote module, etc. are connected to a server by means of RS 232/485 and a human computer interface on a touch screen. The wireless data communication is designed to operate in compliance with the ZigBee standard, and signal processing on sensed data is made through a self adaptive weighted data fusion algorithm. A low variation in data fusion together with a high stability is experimentally demonstrated in this work. The wireless light dimmer as well as the IR learning remote module can be instructed directly by command given on the human computer interface, and the reading on a multimeter can be displayed thereon via the server. This proposed smart LED lighting system can be remotely controlled and self learning mode can be enabled by a single handheld device via WiFi transmission. Hence, this proposal is validated as an approach to power monitoring for home appliances, and is demonstrated as a digital home network in consideration of energy efficiency.
Bostrom, G; Atkinson, D; Rice, A
2015-04-01
Cavity ringdown spectroscopy (CRDS) uses the exponential decay constant of light exiting a high-finesse resonance cavity to determine analyte concentration, typically via absorption. We present a high-throughput data acquisition system that determines the decay constant in near real time using the discrete Fourier transform algorithm on a field programmable gate array (FPGA). A commercially available, high-speed, high-resolution, analog-to-digital converter evaluation board system is used as the platform for the system, after minor hardware and software modifications. The system outputs decay constants at maximum rate of 4.4 kHz using an 8192-point fast Fourier transform by processing the intensity decay signal between ringdown events. We present the details of the system, including the modifications required to adapt the evaluation board to accurately process the exponential waveform. We also demonstrate the performance of the system, both stand-alone and incorporated into our existing CRDS system. Details of FPGA, microcontroller, and circuitry modifications are provided in the Appendix and computer code is available upon request from the authors. PMID:25933840
Shin, J.; Malusare, P.; Hovland, P. D.; Mathematics and Computer Science
2008-01-01
Automatic differentiation (AD) has been expanding its role in scientific computing. While several AD tools have been actively developed and used, a wide range of problems remain to be solved. Activity analysis allows AD tools to generate derivative code for fewer variables, leading to a faster run time of the output code. This paper describes a new context-sensitive, flow-sensitive (CSFS) activity analysis, which is developed by extending an existing context-sensitive, flow-insensitive (CSFI) activity analysis. Our experiments with eight benchmarks show that the new CSFS activity analysis is more than 27 times slower but reduces 8 overestimations for the MIT General Circulation Model (MITgcm) and 1 for an ODE solver (c2) compared with the existing CSFI activity analysis implementation. Although the number of reduced overestimations looks small, the additionally identified passive variables may significantly reduce tedious human effort in maintaining a large code base such as MITgcm.
NASA Astrophysics Data System (ADS)
Korneev, Boris; Levchenko, Vadim
2016-02-01
Interaction between a shock wave and an inhomogeneity in fluid has complicated behavior, including vortex and turbulence generating, mixing, shock wave scattering and reflection. In the present paper we deal with the numerical simulation of the considered process. The Euler equations of unsteady inviscid compressible three-dimensional flow are used into the four-equation model of multicomponent flow. These equations are discretized using the RKDG numerical method. It is implemented with the help of the DiamondTorre algorithm, so the effective GPGPU solver is obtained having outstanding computing properties. With its use we carry out several sets of numerical experiments of shock-bubble interaction problem. The bubble deformation and mixture formation is observed.
Jarvis, Joseph N; Govender, Nelesh; Chiller, Tom; Park, Benjamin J; Longley, Nicky; Meintjes, Graeme; Bekker, Linda-Gail; Wood, Robin; Lawn, Stephen D; Harrison, Thomas S
2012-01-01
HIV-associated cryptococcal meningitis (CM) is estimated to cause over half a million deaths annually in Africa. Many of these deaths are preventable. Screening patients for subclinical cryptococcal infection at the time of entry into antiretroviral therapy programs using cryptococcal antigen (CRAG) immunoassays is highly effective in identifying patients at risk of developing CM, allowing these patients to then be targeted with "preemptive" therapy to prevent the development of severe disease. Such CRAG screening programs are currently being implemented in a number of countries; however, a strong evidence base and clear guidance on how to manage patients with subclinical cryptococcal infection identified by screening are lacking. We review the available evidence and propose a treatment algorithm for the management of patients with asymptomatic cryptococcal antigenemia. PMID:23015379
D'Azevedo, Ed F; Nintcheu Fata, Sylvain
2012-01-01
A collocation boundary element code for solving the three-dimensional Laplace equation, publicly available from \\url{http://www.intetec.org}, has been adapted to run on an Nvidia Tesla general purpose graphics processing unit (GPU). Global matrix assembly and LU factorization of the resulting dense matrix were performed on the GPU. Out-of-core techniques were used to solve problems larger than available GPU memory. The code achieved over eight times speedup in matrix assembly and about 56~Gflops/sec in the LU factorization using only 512~Mbytes of GPU memory. Details of the GPU implementation and comparisons with the standard sequential algorithm are included to illustrate the performance of the GPU code.
Brown, Douglas (Sandia National Laboratories, Livermore, CA); Gray, Genetha Anne (Sandia National Laboratories, Livermore, CA)
2005-10-01
Due to the nature of many infectious agents, such as anthrax, symptoms may either take several days to manifest or resemble those of less serious illnesses leading to misdiagnosis. Thus, bioterrorism attacks that include the release of such agents are particularly dangerous and potentially deadly. For this reason, a system is needed for the quick and correct identification of disease outbreaks. The Real-time Outbreak Disease Surveillance System (RODS), initially developed by Carnegie Mellon University and the University of Pittsburgh, was created to meet this need. The RODS software implements different classifiers for pertinent health surveillance data in order to determine whether or not an outbreak has occurred. In an effort to improve the capability of RODS at detecting outbreaks, we incorporate a data fusion method. Data fusion is used to improve the results of a single classification by combining the output of multiple classifiers. This paper documents the first stages of the development of a data fusion system that can combine the output of the classifiers included in RODS.
2014-01-01
Background Protein-protein docking is an in silico method to predict the formation of protein complexes. Due to limited computational resources, the protein-protein docking approach has been developed under the assumption of rigid docking, in which one of the two protein partners remains rigid during the protein associations and water contribution is ignored or implicitly presented. Despite obtaining a number of acceptable complex predictions, it seems to-date that most initial rigid docking algorithms still find it difficult or even fail to discriminate successfully the correct predictions from the other incorrect or false positive ones. To improve the rigid docking results, re-ranking is one of the effective methods that help re-locate the correct predictions in top high ranks, discriminating them from the other incorrect ones. In this paper, we propose a new re-ranking technique using a new energy-based scoring function, namely IFACEwat - a combined Interface Atomic Contact Energy (IFACE) and water effect. The IFACEwat aims to further improve the discrimination of the near-native structures of the initial rigid docking algorithm ZDOCK3.0.2. Unlike other re-ranking techniques, the IFACEwat explicitly implements interfacial water into the protein interfaces to account for the water-mediated contacts during the protein interactions. Results Our results showed that the IFACEwat increased both the numbers of the near-native structures and improved their ranks as compared to the initial rigid docking ZDOCK3.0.2. In fact, the IFACEwat achieved a success rate of 83.8% for Antigen/Antibody complexes, which is 10% better than ZDOCK3.0.2. As compared to another re-ranking technique ZRANK, the IFACEwat obtains success rates of 92.3% (8% better) and 90% (5% better) respectively for medium and difficult cases. When comparing with the latest published re-ranking method F2Dock, the IFACEwat performed equivalently well or even better for several Antigen/Antibody complexes
Algorithms and Algorithmic Languages.
ERIC Educational Resources Information Center
Veselov, V. M.; Koprov, V. M.
This paper is intended as an introduction to a number of problems connected with the description of algorithms and algorithmic languages, particularly the syntaxes and semantics of algorithmic languages. The terms "letter, word, alphabet" are defined and described. The concept of the algorithm is defined and the relation between the algorithm and…
Staples, P.
1999-04-01
At the request of the KAEA, MAEC and DOE-NN42 analysis algorithms and diagrams to help implement the safe packaging of spent fuel assemblies were developed. These analysis algorithms are based on observable radiation signatures measured with the IAEA`s UnderWater Neutron Coincidence Counter (UWNCC), which was designed at Los Alamos, and shared with the BN-350 facility. The parameters to be determined from the UWNCC measurement station are the assembly mass, the assembly type (Blanket, Driver or Other) and total heat output. The total assembly mass is provided from a load cell located on a crane installed at the facility that is used to position the assembly during measurements in the UWNCC. The load cell readout can be observed on the crane control panel or via the IAEA`s UWNCC control software. The assembly type identification uses a relationship between the measured total gamma-ray output and the measured neutron doubles rate from the assembly at the axial mid-plane position. The neutron doubles rate is proportional to the quantity of fissile mass contained within the assembly and the total gamma-ray output is proportional to the burn-up of the assembly. The heat output is determined from the measured total gamma-ray output of the assemblies. This measured value is converted to heat output based on a correlation to heat calculations performed by ANL for the set of assemblies measured during the calibration and UWNCC verification phase of the project. The figures attached in the appendix are the figures transmitted to the facility for use during packaging.
Reasoning about systolic algorithms
Purushothaman, S.
1986-01-01
Systolic algorithms are a class of parallel algorithms, with small grain concurrency, well suited for implementation in VLSI. They are intended to be implemented as high-performance, computation-bound back-end processors and are characterized by a tesselating interconnection of identical processing elements. This dissertation investigates the problem of providing correctness of systolic algorithms. The following are reported in this dissertation: (1) a methodology for verifying correctness of systolic algorithms based on solving the representation of an algorithm as recurrence equations. The methodology is demonstrated by proving the correctness of a systolic architecture for optimal parenthesization. (2) The implementation of mechanical proofs of correctness of two systolic algorithms, a convolution algorithm and an optimal parenthesization algorithm, using the Boyer-Moore theorem prover. (3) An induction principle for proving correctness of systolic arrays which are modular. Two attendant inference rules, weak equivalence and shift transformation, which capture equivalent behavior of systolic arrays, are also presented.
Vasan, S N Swetadri; Ionita, Ciprian N; Titus, A H; Cartwright, A N; Bednarek, D R; Rudin, S
2012-02-23
We present the image processing upgrades implemented on a Graphics Processing Unit (GPU) in the Control, Acquisition, Processing, and Image Display System (CAPIDS) for the custom Micro-Angiographic Fluoroscope (MAF) detector. Most of the image processing currently implemented in the CAPIDS system is pixel independent; that is, the operation on each pixel is the same and the operation on one does not depend upon the result from the operation on the other, allowing the entire image to be processed in parallel. GPU hardware was developed for this kind of massive parallel processing implementation. Thus for an algorithm which has a high amount of parallelism, a GPU implementation is much faster than a CPU implementation. The image processing algorithm upgrades implemented on the CAPIDS system include flat field correction, temporal filtering, image subtraction, roadmap mask generation and display window and leveling. A comparison between the previous and the upgraded version of CAPIDS has been presented, to demonstrate how the improvement is achieved. By performing the image processing on a GPU, significant improvements (with respect to timing or frame rate) have been achieved, including stable operation of the system at 30 fps during a fluoroscopy run, a DSA run, a roadmap procedure and automatic image windowing and leveling during each frame. PMID:24027619
NASA Astrophysics Data System (ADS)
Swetadri Vasan, S. N.; Ionita, Ciprian N.; Titus, A. H.; Cartwright, A. N.; Bednarek, D. R.; Rudin, S.
2012-03-01
We present the image processing upgrades implemented on a Graphics Processing Unit (GPU) in the Control, Acquisition, Processing, and Image Display System (CAPIDS) for the custom Micro-Angiographic Fluoroscope (MAF) detector. Most of the image processing currently implemented in the CAPIDS system is pixel independent; that is, the operation on each pixel is the same and the operation on one does not depend upon the result from the operation on the other, allowing the entire image to be processed in parallel. GPU hardware was developed for this kind of massive parallel processing implementation. Thus for an algorithm which has a high amount of parallelism, a GPU implementation is much faster than a CPU implementation. The image processing algorithm upgrades implemented on the CAPIDS system include flat field correction, temporal filtering, image subtraction, roadmap mask generation and display window and leveling. A comparison between the previous and the upgraded version of CAPIDS has been presented, to demonstrate how the improvement is achieved. By performing the image processing on a GPU, significant improvements (with respect to timing or frame rate) have been achieved, including stable operation of the system at 30 fps during a fluoroscopy run, a DSA run, a roadmap procedure and automatic image windowing and leveling during each frame.
Vasan, S.N. Swetadri; Ionita, Ciprian N.; Titus, A.H.; Cartwright, A.N.; Bednarek, D.R.; Rudin, S.
2012-01-01
We present the image processing upgrades implemented on a Graphics Processing Unit (GPU) in the Control, Acquisition, Processing, and Image Display System (CAPIDS) for the custom Micro-Angiographic Fluoroscope (MAF) detector. Most of the image processing currently implemented in the CAPIDS system is pixel independent; that is, the operation on each pixel is the same and the operation on one does not depend upon the result from the operation on the other, allowing the entire image to be processed in parallel. GPU hardware was developed for this kind of massive parallel processing implementation. Thus for an algorithm which has a high amount of parallelism, a GPU implementation is much faster than a CPU implementation. The image processing algorithm upgrades implemented on the CAPIDS system include flat field correction, temporal filtering, image subtraction, roadmap mask generation and display window and leveling. A comparison between the previous and the upgraded version of CAPIDS has been presented, to demonstrate how the improvement is achieved. By performing the image processing on a GPU, significant improvements (with respect to timing or frame rate) have been achieved, including stable operation of the system at 30 fps during a fluoroscopy run, a DSA run, a roadmap procedure and automatic image windowing and leveling during each frame. PMID:24027619
Secret Key Crypto Implementations
NASA Astrophysics Data System (ADS)
Bertoni, Guido Marco; Melzani, Filippo
This chapter presents the algorithm selected in 2001 as the Advanced Encryption Standard. This algorithm is the base for implementing security and privacy based on symmetric key solutions in almost all new applications. Secret key algorithms are used in combination with modes of operation to provide different security properties. The most used modes of operation are presented in this chapter. Finally an overview of the different techniques of software and hardware implementations is given.
Riley, D.J.; Turner, C.D.
1991-06-01
Two methods for modeling arbitrary narrow apertures in finite- difference time-domain (FDTD) codes are presented in this paper. The first technique is based on the hybrid thin-slot algorithm (HTSA) which models the aperture physics using an integral equation approach. This method can model slots that are narrow both in width and depth with regard to the FDTD spatial cell, but is restricted to planar apertures. The second method is based on a contour technique that directly modifies the FDTD equations local to the aperture. The contour method is geometrically more flexible than the HTSA, but the depth of the aperture is restricted to the actual FDTD mesh. A technique to incorporate both narrow-aperture algorithms into the FDTD code, TSAR, based on a slot data file'' is presented in this paper. Results for a variety of complex aperture contours are provided, and limitations of the algorithms are discussed.
Cascade Error Projection: A New Learning Algorithm
NASA Technical Reports Server (NTRS)
Duong, T. A.; Stubberud, A. R.; Daud, T.; Thakoor, A. P.
1995-01-01
A new neural network architecture and a hardware implementable learning algorithm is proposed. The algorithm, called cascade error projection (CEP), handles lack of precision and circuit noise better than existing algorithms.
Grant, A. D.; Chihota, V.; Ginindza, S.; Mvusi, L.; Churchyard, G. J.; Fielding, K.L.
2016-01-01
Introduction and Background: Diagnostic tests for tuberculosis (TB) using sputum have suboptimal sensitivity among HIV-positive persons. We assessed health care worker adherence to TB diagnostic algorithms after negative sputum test results. Methods: The XTEND (Xpert for TB—Evaluating a New Diagnostic) trial compared outcomes among people tested for TB in primary care clinics using Xpert MTB/RIF vs. smear microscopy as the initial test. We analyzed data from XTEND participants who were HIV positive or HIV status unknown, whose initial sputum Xpert MTB/RIF or microscopy result was negative. If chest radiography, sputum culture, or hospital referral took place, the algorithm for TB diagnosis was considered followed. Analysis of intervention (Xpert MTB/RIF) effect on algorithm adherence used methods for cluster-randomized trials with small number of clusters. Results: Among 4037 XTEND participants with initial negative test results, 2155 (53%) reported being or testing HIV positive and 540 (14%) had unknown HIV status. Among 2155 HIV-positive participants [684 (32%) male, mean age 37 years (range, 18–79 years)], there was evidence of algorithm adherence among 515 (24%). Adherence was less likely among persons tested initially with Xpert MTB/RIF vs. smear [14% (142/1031) vs. 32% (364/1122), adjusted risk ratio 0.34 (95% CI: 0.17 to 0.65)] and for participants with unknown vs. positive HIV status [59/540 (11%) vs. 507/2155 (24%)]. Conclusions: We observed poorer adherence to TB diagnostic algorithms among HIV-positive persons tested initially with Xpert MTB/RIF vs. microscopy. Poor adherence to TB diagnostic algorithms and incomplete coverage of HIV testing represents a missed opportunity to diagnose TB and HIV, and may contribute to TB mortality. PMID:26966843
Energy Science and Technology Software Center (ESTSC)
2013-07-29
The OpenEIS Algorithm package seeks to provide a low-risk path for building owners, service providers and managers to explore analytical methods for improving building control and operational efficiency. Users of this software can analyze building data, and learn how commercial implementations would provide long-term value. The code also serves as a reference implementation for developers who wish to adapt the algorithms for use in commercial tools or service offerings.
Brunet-Benkhoucha, Malik; Verhaegen, Frank; Lassalle, Stephanie; Beliveau-Nadeau, Dominic; Reniers, Brigitte; Donath, David; Taussky, Daniel; Carrier, Jean-Francois
2009-11-15
Purpose: The low dose rate brachytherapy procedure would benefit from an intraoperative postimplant dosimetry verification technique to identify possible suboptimal dose coverage and suggest a potential reimplantation. The main objective of this project is to develop an efficient, operator-free, intraoperative seed detection technique using the imaging modalities available in a low dose rate brachytherapy treatment room. Methods: This intraoperative detection allows a complete dosimetry calculation that can be performed right after an I-125 prostate seed implantation, while the patient is still under anesthesia. To accomplish this, a digital tomosynthesis-based algorithm was developed. This automatic filtered reconstruction of the 3D volume requires seven projections acquired over a total angle of 60 deg. with an isocentric imaging system. Results: A phantom study was performed to validate the technique that was used in a retrospective clinical study involving 23 patients. In the patient study, the automatic tomosynthesis-based reconstruction yielded seed detection rates of 96.7% and 2.6% false positives. The seed localization error obtained with a phantom study is 0.4{+-}0.4 mm. The average time needed for reconstruction is below 1 min. The reconstruction algorithm also provides the seed orientation with an uncertainty of 10 deg. {+-}8 deg. The seed detection algorithm presented here is reliable and was efficiently used in the clinic. Conclusions: When combined with an appropriate coregistration technique to identify the organs in the seed coordinate system, this algorithm will offer new possibilities for a next generation of clinical brachytherapy systems.
NASA Astrophysics Data System (ADS)
Rodebaugh, Raymond Francis, Jr.
2000-11-01
In this project we applied modifications of the Fermi- Eyges multiple scattering theory to attempt to achieve the goals of a fast, accurate electron dose calculation algorithm. The dose was first calculated for an ``average configuration'' based on the patient's anatomy using a modification of the Hogstrom algorithm. It was split into a measured central axis depth dose component based on the material between the source and the dose calculation point, and an off-axis component based on the physics of multiple coulomb scattering for the average configuration. The former provided the general depth dose characteristics along the beam fan lines, while the latter provided the effects of collimation. The Gaussian localized heterogeneities theory of Jette provided the lateral redistribution of the electron fluence by heterogeneities. Here we terminated Jette's infinite series of fluence redistribution terms after the second term. Experimental comparison data were collected for 1 cm thick x 1 cm diameter air and aluminum pillboxes using the Varian 2100C linear accelerator at Rush-Presbyterian- St. Luke's Medical Center. For an air pillbox, the algorithm results were in reasonable agreement with measured data at both 9 and 20 MeV. For the Aluminum pill box, there were significant discrepancies between the results of this algorithm and experiment. This was particularly apparent for the 9 MeV beam. Of course a one cm thick Aluminum heterogeneity is unlikely to be encountered in a clinical situation; the thickness, linear stopping power, and linear scattering power of Aluminum are all well above what would normally be encountered. We found that the algorithm is highly sensitive to the choice of the average configuration. This is an indication that the series of fluence redistribution terms does not converge fast enough to terminate after the second term. It also makes it difficult to apply the algorithm to cases where there are no a priori means of choosing the best average
Ojala, Jarkko; Kapanen, Mika; Hyödynmaa, Simo
2016-06-01
New version 13.6.23 of the electron Monte Carlo (eMC) algorithm in Varian Eclipse™ treatment planning system has a model for 4MeV electron beam and some general improvements for dose calculation. This study provides the first overall accuracy assessment of this algorithm against full Monte Carlo (MC) simulations for electron beams from 4MeV to 16MeV with most emphasis on the lower energy range. Beams in a homogeneous water phantom and clinical treatment plans were investigated including measurements in the water phantom. Two different material sets were used with full MC: (1) the one applied in the eMC algorithm and (2) the one included in the Eclipse™ for other algorithms. The results of clinical treatment plans were also compared to those of the older eMC version 11.0.31. In the water phantom the dose differences against the full MC were mostly less than 3% with distance-to-agreement (DTA) values within 2mm. Larger discrepancies were obtained in build-up regions, at depths near the maximum electron ranges and with small apertures. For the clinical treatment plans the overall dose differences were mostly within 3% or 2mm with the first material set. Larger differences were observed for a large 4MeV beam entering curved patient surface with extended SSD and also in regions of large dose gradients. Still the DTA values were within 3mm. The discrepancies between the eMC and the full MC were generally larger for the second material set. The version 11.0.31 performed always inferiorly, when compared to the 13.6.23. PMID:27189311
Competing Sudakov veto algorithms
NASA Astrophysics Data System (ADS)
Kleiss, Ronald; Verheyen, Rob
2016-07-01
We present a formalism to analyze the distribution produced by a Monte Carlo algorithm. We perform these analyses on several versions of the Sudakov veto algorithm, adding a cutoff, a second variable and competition between emission channels. The formal analysis allows us to prove that multiple, seemingly different competition algorithms, including those that are currently implemented in most parton showers, lead to the same result. Finally, we test their performance in a semi-realistic setting and show that there are significantly faster alternatives to the commonly used algorithms.
Algorithm Visualization System for Teaching Spatial Data Algorithms
ERIC Educational Resources Information Center
Nikander, Jussi; Helminen, Juha; Korhonen, Ari
2010-01-01
TRAKLA2 is a web-based learning environment for data structures and algorithms. The system delivers automatically assessed algorithm simulation exercises that are solved using a graphical user interface. In this work, we introduce a novel learning environment for spatial data algorithms, SDA-TRAKLA2, which has been implemented on top of the…
NASA Astrophysics Data System (ADS)
Solanki, K.; Hauksson, E.; Kanamori, H.; Friberg, P.; Wu, Y.
2006-12-01
A necessary first step toward the goal of implementing proof-of-concept projects for earthquake early warning (EEW) is the real-time testing of the seismological algorithms. To provide the most appropriate environment, the CISN has designed and implemented a platform for such testing. We are testing the amplitude (Pd) and period (Tau-c) monitor developed for providing on-site earthquake early warning (EEW) using data from a single seismic station. We have designed and implemented a framework generator that can automatically generate code for waveform- processing systems. The framework generator is based on Code Worker software www.codeworker.org, which provides APIs and a scripting language to build parsers and template processing engines. Higher-level description of the waveform processing system is required to generate the waveform-processing framework. We have implemented Domain Specific Language DSL to provide description of the waveform-processing framework. The framework generator allows the developer to focus more on the waveform processing algorithms and frees him/her from repetitive and tedious coding tasks. It also has an automatic gap detector, transparent buffer management, and built in thread management. We have implemented the waveform-processing framework to process real-time waveforms coming from the dataloggers deployed throughout southern California by the Southern California Seismic Network. The system also has the capability of processing data from archived events to facilitate off-line testing. An application feeds data from MiniSEED packets into the Wave Data Area (WDA). The system that grabs the data from the WDA processes each real-time data stream independently. To verify results, sac files are generated at each processing step. Currently, we are processing broadband data streams from 160 stations and determining Pd and Tau-c as local earthquakes occur in southern California. We present the results from this testing and compare the
High-performance combinatorial algorithms
Pinar, Ali
2003-10-31
Combinatorial algorithms have long played an important role in many applications of scientific computing such as sparse matrix computations and parallel computing. The growing importance of combinatorial algorithms in emerging applications like computational biology and scientific data mining calls for development of a high performance library for combinatorial algorithms. Building such a library requires a new structure for combinatorial algorithms research that enables fast implementation of new algorithms. We propose a structure for combinatorial algorithms research that mimics the research structure of numerical algorithms. Numerical algorithms research is nicely complemented with high performance libraries, and this can be attributed to the fact that there are only a small number of fundamental problems that underlie numerical solvers. Furthermore there are only a handful of kernels that enable implementation of algorithms for these fundamental problems. Building a similar structure for combinatorial algorithms will enable efficient implementations for existing algorithms and fast implementation of new algorithms. Our results will promote utilization of combinatorial techniques and will impact research in many scientific computing applications, some of which are listed.
Kannan, Ravishekar; Guo, Peng; Przekwas, Andrzej
2016-06-01
This paper is the first in a series wherein efficient computational methods are developed and implemented to accurately quantify the transport, deposition, and clearance of the microsized particles (range of interest: 2 to 10 µm) in the human respiratory tract. In particular, this paper (part I) deals with (i) development of a detailed 3D computational finite volume mesh comprising of the NOPL (nasal, oral, pharyngeal and larynx), trachea and several airway generations; (ii) use of CFD Research Corporation's finite volume Computational Biology (CoBi) flow solver to obtain the flow physics for an oral inhalation simulation; (iii) implement a novel and accurate nodal inverse distance weighted Eulerian-Lagrangian formulation to accurately obtain the deposition, and (iv) development of Wind-Kessel boundary condition algorithm. This new Wind-Kessel boundary condition algorithm allows the 'escaped' particles to reenter the airway through the outlets, thereby to an extent accounting for the drawbacks of having a finite number of lung generations in the computational mesh. The deposition rates in the NOPL, trachea, the first and second bifurcation were computed, and they were in reasonable accord with the Typical Path Length model. The quantitatively validated results indicate that these developments will be useful for (i) obtaining depositions in diseased lungs (because of asthma and COPD), for which there are no empirical models, and (ii) obtaining the secondary clearance (mucociliary clearance) of the deposited particles. Copyright © 2015 John Wiley & Sons, Ltd. PMID:26317686
Semioptimal practicable algorithmic cooling
Elias, Yuval; Mor, Tal; Weinstein, Yossi
2011-04-15
Algorithmic cooling (AC) of spins applies entropy manipulation algorithms in open spin systems in order to cool spins far beyond Shannon's entropy bound. Algorithmic cooling of nuclear spins was demonstrated experimentally and may contribute to nuclear magnetic resonance spectroscopy. Several cooling algorithms were suggested in recent years, including practicable algorithmic cooling (PAC) and exhaustive AC. Practicable algorithms have simple implementations, yet their level of cooling is far from optimal; exhaustive algorithms, on the other hand, cool much better, and some even reach (asymptotically) an optimal level of cooling, but they are not practicable. We introduce here semioptimal practicable AC (SOPAC), wherein a few cycles (typically two to six) are performed at each recursive level. Two classes of SOPAC algorithms are proposed and analyzed. Both attain cooling levels significantly better than PAC and are much more efficient than the exhaustive algorithms. These algorithms are shown to bridge the gap between PAC and exhaustive AC. In addition, we calculated the number of spins required by SOPAC in order to purify qubits for quantum computation. As few as 12 and 7 spins are required (in an ideal scenario) to yield a mildly pure spin (60% polarized) from initial polarizations of 1% and 10%, respectively. In the latter case, about five more spins are sufficient to produce a highly pure spin (99.99% polarized), which could be relevant for fault-tolerant quantum computing.
NASA Technical Reports Server (NTRS)
Wang, Lui; Bayer, Steven E.
1991-01-01
Genetic algorithms are mathematical, highly parallel, adaptive search procedures (i.e., problem solving methods) based loosely on the processes of natural genetics and Darwinian survival of the fittest. Basic genetic algorithms concepts are introduced, genetic algorithm applications are introduced, and results are presented from a project to develop a software tool that will enable the widespread use of genetic algorithm technology.
Williams, P. T.; Dickson, T. L.; Yin, S.
2007-12-01
The current regulations to insure that nuclear reactor pressure vessels (RPVs) maintain their structural integrity when subjected to transients such as pressurized thermal shock (PTS) events were derived from computational models developed in the early-to-mid 1980s. Since that time, advancements and refinements in relevant technologies that impact RPV integrity assessment have led to an effort by the NRC to re-evaluate its PTS regulations. Updated computational methodologies have been developed through interactions between experts in the relevant disciplines of thermal hydraulics, probabilistic risk assessment, materials embrittlement, fracture mechanics, and inspection (flaw characterization). Contributors to the development of these methodologies include the NRC staff, their contractors, and representatives from the nuclear industry. These updated methodologies have been integrated into the Fracture Analysis of Vessels -- Oak Ridge (FAVOR, v06.1) computer code developed for the NRC by the Heavy Section Steel Technology (HSST) program at Oak Ridge National Laboratory (ORNL). The FAVOR, v04.1, code represents the baseline NRC-selected applications tool for re-assessing the current PTS regulations. This report is intended to document the technical bases for the assumptions, algorithms, methods, and correlations employed in the development of the FAVOR, v06.1, code.
NASA Astrophysics Data System (ADS)
Gentry, R. W.
2002-12-01
The Shelby Farms test site in Shelby County, Tennessee is being developed to better understand recharge hydraulics to the Memphis aquifer in areas where leakage through an overlying aquitard occurs. The site is unique in that it demonstrates many opportunities for interdisciplinary research regarding environmental tracers, anthropogenic impacts and inverse modeling. The objective of the research funding the development of the test site is to better understand the groundwater hydrology and hydraulics between a shallow alluvial aquifer and the Memphis aquifer given an area of leakage, defined as an aquitard window. The site is situated in an area on the boundary of a highly developed urban area and is currently being used by an agricultural research agency and a local recreational park authority. Also, an abandoned landfill is situated to the immediate south of the window location. Previous research by the USGS determined the location of the aquitard window subsequent to the landfill closure. Inverse modeling using a genetic algorithm approach has identified the likely extents of the area of the window given an interaquifer accretion rate. These results, coupled with additional fieldwork, have been used to guide the direction of the field studies and the overall design of the research project. This additional work has encompassed the drilling of additional monitoring wells in nested groups by rotasonic drilling methods. The core collected during the drilling will provide additional constraints to the physics of the problem that may provide additional help in redefining the conceptual model. The problem is non-unique with respect to the leakage area and accretion rate and further research is being performed to provide some idea of the advective flow paths using a combination of tritium and 3He analyses and geochemistry. The outcomes of the research will result in a set of benchmark data and physical infrastructure that can be used to evaluate other environmental
Old And New Algorithms For Toeplitz Systems
NASA Astrophysics Data System (ADS)
Brent, Richard P.
1988-02-01
Toeplitz linear systems and Toeplitz least squares problems commonly arise in digital signal processing. In this paper we survey some old, "well known" algorithms and some recent algorithms for solving these problems. We concentrate our attention on algorithms which can be implemented efficiently on a variety of parallel machines (including pipelined vector processors and systolic arrays). We distinguish between algorithms which require inner products, and algorithms which avoid inner products, and thus are better suited to parallel implementation on some parallel architectures. Finally, we mention some "asymptotically fast" 0(n(log n)2) algorithms and compare them with 0(n2) algorithms.
Jara, Maximilian; Reese, Tim; Malinowski, Maciej; Valle, Erika; Seehofer, Daniel; Puhl, Gero; Neuhaus, Peter; Pratschke, Johann; Stockmann, Martin
2015-01-01
Objectives Post-hepatectomy liver failure has a major impact on patient outcome. This study aims to explore the impact of the integration of a novel patient-centred evaluation, the LiMAx algorithm, on perioperative patient outcome after hepatectomy. Methods Trends in perioperative variables and morbidity and mortality rates in 1170 consecutive patients undergoing elective hepatectomy between January 2006 and December 2011 were analysed retrospectively. Propensity score matching was used to compare the effects on morbidity and mortality of the integration of the LiMAx algorithm into clinical practice. Results Over the study period, the proportion of complex hepatectomies increased from 29.1% in 2006 to 37.7% in 2011 (P = 0.034). Similarly, the proportion of patients with liver cirrhosis selected for hepatic surgery rose from 6.9% in 2006 to 11.3% in 2011 (P = 0.039). Despite these increases, rates of post-hepatectomy liver failure fell from 24.7% in 2006 to 9.0% in 2011 (P < 0.001) and liver failure-related postoperative mortality decreased from 4.0% in 2006 to 0.9% in 2011 (P = 0.014). Propensity score matching was associated with reduced rates of post-hepatectomy liver failure [24.7% (n = 77) versus 11.2% (n = 35); P < 0.001] and related mortality [3.8% (n = 12) versus 1.0% (n = 3); P = 0.035]. Conclusions Postoperative liver failure and postoperative liver failure-related mortality decreased in patients undergoing hepatectomy following the implementation of the LiMAx algorithm. PMID:26058324
Long-Boyle, Janel; Savic, Rada; Yan, Shirley; Bartelink, Imke; Musick, Lisa; French, Deborah; Law, Jason; Horn, Biljana; Cowan, Morton J.; Dvorak, Christopher C.
2014-01-01
Background Population pharmacokinetic (PK) studies of busulfan in children have shown that individualized model-based algorithms provide improved targeted busulfan therapy when compared to conventional dosing. The adoption of population PK models into routine clinical practice has been hampered by the tendency of pharmacologists to develop complex models too impractical for clinicians to use. The authors aimed to develop a population PK model for busulfan in children that can reliably achieve therapeutic exposure (concentration-at-steady-state, Css) and implement a simple, model-based tool for the initial dosing of busulfan in children undergoing HCT. Patients and Methods Model development was conducted using retrospective data available in 90 pediatric and young adult patients who had undergone HCT with busulfan conditioning. Busulfan drug levels and potential covariates influencing drug exposure were analyzed using the non-linear mixed effects modeling software, NONMEM. The final population PK model was implemented into a clinician-friendly, Microsoft Excel-based tool and used to recommend initial doses of busulfan in a group of 21 pediatric patients prospectively dosed based on the population PK model. Results Modeling of busulfan time-concentration data indicates busulfan CL displays non-linearity in children, decreasing up to approximately 20% between the concentrations of 250–2000 ng/mL. Important patient-specific covariates found to significantly impact busulfan CL were actual body weight and age. The percentage of individuals achieving a therapeutic Css was significantly higher in subjects receiving initial doses based on the population PK model (81%) versus historical controls dosed on conventional guidelines (52%) (p = 0.02). Conclusion When compared to the conventional dosing guidelines, the model-based algorithm demonstrates significant improvement for providing targeted busulfan therapy in children and young adults. PMID:25162216
NASA Technical Reports Server (NTRS)
Gregory, Kyle J.; Hill, Joanne E. (Editor); Black, J. Kevin; Baumgartner, Wayne H.; Jahoda, Keith
2016-01-01
A fundamental challenge in a spaceborne application of a gas-based Time Projection Chamber (TPC) for observation of X-ray polarization is handling the large amount of data collected. The TPC polarimeter described uses the APV-25 Application Specific Integrated Circuit (ASIC) to readout a strip detector. Two dimensional photoelectron track images are created with a time projection technique and used to determine the polarization of the incident X-rays. The detector produces a 128x30 pixel image per photon interaction with each pixel registering 12 bits of collected charge. This creates challenging requirements for data storage and downlink bandwidth with only a modest incidence of photons and can have a significant impact on the overall mission cost. An approach is described for locating and isolating the photoelectron track within the detector image, yielding a much smaller data product, typically between 8x8 pixels and 20x20 pixels. This approach is implemented using a Microsemi RT-ProASIC3-3000 Field-Programmable Gate Array (FPGA), clocked at 20 MHz and utilizing 10.7k logic gates (14% of FPGA), 20 Block RAMs (17% of FPGA), and no external RAM. Results will be presented, demonstrating successful photoelectron track cluster detection with minimal impact to detector dead-time.
NASA Astrophysics Data System (ADS)
Gregory, Kyle J.; Hill, Joanne E.; Black, J. Kevin; Baumgartner, Wayne H.; Jahoda, Keith
2016-05-01
A fundamental challenge in a spaceborne application of a gas-based Time Projection Chamber (TPC) for observation of X-ray polarization is handling the large amount of data collected. The TPC polarimeter described uses the APV-25 Application Specific Integrated Circuit (ASIC) to readout a strip detector. Two dimensional photo- electron track images are created with a time projection technique and used to determine the polarization of the incident X-rays. The detector produces a 128x30 pixel image per photon interaction with each pixel registering 12 bits of collected charge. This creates challenging requirements for data storage and downlink bandwidth with only a modest incidence of photons and can have a significant impact on the overall mission cost. An approach is described for locating and isolating the photoelectron track within the detector image, yielding a much smaller data product, typically between 8x8 pixels and 20x20 pixels. This approach is implemented using a Microsemi RT-ProASIC3-3000 Field-Programmable Gate Array (FPGA), clocked at 20 MHz and utilizing 10.7k logic gates (14% of FPGA), 20 Block RAMs (17% of FPGA), and no external RAM. Results will be presented, demonstrating successful photoelectron track cluster detection with minimal impact to detector dead-time.
Seamless Merging of Hypertext and Algorithm Animation
ERIC Educational Resources Information Center
Karavirta, Ville
2009-01-01
Online learning material that students use by themselves is one of the typical usages of algorithm animation (AA). Thus, the integration of algorithm animations into hypertext is seen as an important topic today to promote the usage of algorithm animation in teaching. This article presents an algorithm animation viewer implemented purely using…
An algorithm for generating abstract syntax trees
NASA Technical Reports Server (NTRS)
Noonan, R. E.
1985-01-01
The notion of an abstract syntax is discussed. An algorithm is presented for automatically deriving an abstract syntax directly from a BNF grammar. The implementation of this algorithm and its application to the grammar for Modula are discussed.
Two Algorithms for Processing Electronic Nose Data
NASA Technical Reports Server (NTRS)
Young, Rebecca; Linnell, Bruce
2007-01-01
Two algorithms for processing the digitized readings of electronic noses, and computer programs to implement the algorithms, have been devised in a continuing effort to increase the utility of electronic noses as means of identifying airborne compounds and measuring their concentrations. One algorithm identifies the two vapors in a two-vapor mixture and estimates the concentration of each vapor (in principle, this algorithm could be extended to more than two vapors). The other algorithm identifies a single vapor and estimates its concentration.
Parallel algorithms for unconstrained optimizations by multisplitting
He, Qing
1994-12-31
In this paper a new parallel iterative algorithm for unconstrained optimization using the idea of multisplitting is proposed. This algorithm uses the existing sequential algorithms without any parallelization. Some convergence and numerical results for this algorithm are presented. The experiments are performed on an Intel iPSC/860 Hyper Cube with 64 nodes. It is interesting that the sequential implementation on one node shows that if the problem is split properly, the algorithm converges much faster than one without splitting.
An efficient algorithm for function optimization: modified stem cells algorithm
NASA Astrophysics Data System (ADS)
Taherdangkoo, Mohammad; Paziresh, Mahsa; Yazdi, Mehran; Bagheri, Mohammad
2013-03-01
In this paper, we propose an optimization algorithm based on the intelligent behavior of stem cell swarms in reproduction and self-organization. Optimization algorithms, such as the Genetic Algorithm (GA), Particle Swarm Optimization (PSO) algorithm, Ant Colony Optimization (ACO) algorithm and Artificial Bee Colony (ABC) algorithm, can give solutions to linear and non-linear problems near to the optimum for many applications; however, in some case, they can suffer from becoming trapped in local optima. The Stem Cells Algorithm (SCA) is an optimization algorithm inspired by the natural behavior of stem cells in evolving themselves into new and improved cells. The SCA avoids the local optima problem successfully. In this paper, we have made small changes in the implementation of this algorithm to obtain improved performance over previous versions. Using a series of benchmark functions, we assess the performance of the proposed algorithm and compare it with that of the other aforementioned optimization algorithms. The obtained results prove the superiority of the Modified Stem Cells Algorithm (MSCA).
Linearization algorithms for line transfer
Scott, H.A.
1990-11-06
Complete linearization is a very powerful technique for solving multi-line transfer problems that can be used efficiently with a variety of transfer formalisms. The linearization algorithm we describe is computationally very similar to ETLA, but allows an effective treatment of strongly-interacting lines. This algorithm has been implemented (in several codes) with two different transfer formalisms in all three one-dimensional geometries. We also describe a variation of the algorithm that handles saturable laser transport. Finally, we present a combination of linearization with a local approximate operator formalism, which has been implemented in two dimensions and is being developed in three dimensions. 11 refs.
STAR Algorithm Integration Team - Facilitating operational algorithm development
NASA Astrophysics Data System (ADS)
Mikles, V. J.
2015-12-01
The NOAA/NESDIS Center for Satellite Research and Applications (STAR) provides technical support of the Joint Polar Satellite System (JPSS) algorithm development and integration tasks. Utilizing data from the S-NPP satellite, JPSS generates over thirty Environmental Data Records (EDRs) and Intermediate Products (IPs) spanning atmospheric, ocean, cryosphere, and land weather disciplines. The Algorithm Integration Team (AIT) brings technical expertise and support to product algorithms, specifically in testing and validating science algorithms in a pre-operational environment. The AIT verifies that new and updated algorithms function in the development environment, enforces established software development standards, and ensures that delivered packages are functional and complete. AIT facilitates the development of new JPSS-1 algorithms by implementing a review approach based on the Enterprise Product Lifecycle (EPL) process. Building on relationships established during the S-NPP algorithm development process and coordinating directly with science algorithm developers, the AIT has implemented structured reviews with self-contained document suites. The process has supported algorithm improvements for products such as ozone, active fire, vegetation index, and temperature and moisture profiles.
The Chopthin Algorithm for Resampling
NASA Astrophysics Data System (ADS)
Gandy, Axel; Lau, F. Din-Houn
2016-08-01
Resampling is a standard step in particle filters and more generally sequential Monte Carlo methods. We present an algorithm, called chopthin, for resampling weighted particles. In contrast to standard resampling methods the algorithm does not produce a set of equally weighted particles; instead it merely enforces an upper bound on the ratio between the weights. Simulation studies show that the chopthin algorithm consistently outperforms standard resampling methods. The algorithms chops up particles with large weight and thins out particles with low weight, hence its name. It implicitly guarantees a lower bound on the effective sample size. The algorithm can be implemented efficiently, making it practically useful. We show that the expected computational effort is linear in the number of particles. Implementations for C++, R (on CRAN), Python and Matlab are available.
NASA Astrophysics Data System (ADS)
Abrams, Daniel S.
This thesis describes several new quantum algorithms. These include a polynomial time algorithm that uses a quantum fast Fourier transform to find eigenvalues and eigenvectors of a Hamiltonian operator, and that can be applied in cases (commonly found in ab initio physics and chemistry problems) for which all known classical algorithms require exponential time. Fast algorithms for simulating many body Fermi systems are also provided in both first and second quantized descriptions. An efficient quantum algorithm for anti-symmetrization is given as well as a detailed discussion of a simulation of the Hubbard model. In addition, quantum algorithms that calculate numerical integrals and various characteristics of stochastic processes are described. Two techniques are given, both of which obtain an exponential speed increase in comparison to the fastest known classical deterministic algorithms and a quadratic speed increase in comparison to classical Monte Carlo (probabilistic) methods. I derive a simpler and slightly faster version of Grover's mean algorithm, show how to apply quantum counting to the problem, develop some variations of these algorithms, and show how both (apparently distinct) approaches can be understood from the same unified framework. Finally, the relationship between physics and computation is explored in some more depth, and it is shown that computational complexity theory depends very sensitively on physical laws. In particular, it is shown that nonlinear quantum mechanics allows for the polynomial time solution of NP-complete and #P oracle problems. Using the Weinberg model as a simple example, the explicit construction of the necessary gates is derived from the underlying physics. Nonlinear quantum algorithms are also presented using Polchinski type nonlinearities which do not allow for superluminal communication. (Copies available exclusively from MIT Libraries, Rm. 14- 0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)
A Robustly Stabilizing Model Predictive Control Algorithm
NASA Technical Reports Server (NTRS)
Ackmece, A. Behcet; Carson, John M., III
2007-01-01
A model predictive control (MPC) algorithm that differs from prior MPC algorithms has been developed for controlling an uncertain nonlinear system. This algorithm guarantees the resolvability of an associated finite-horizon optimal-control problem in a receding-horizon implementation.
In-Trail Procedure (ITP) Algorithm Design
NASA Technical Reports Server (NTRS)
Munoz, Cesar A.; Siminiceanu, Radu I.
2007-01-01
The primary objective of this document is to provide a detailed description of the In-Trail Procedure (ITP) algorithm, which is part of the Airborne Traffic Situational Awareness In-Trail Procedure (ATSA-ITP) application. To this end, the document presents a high level description of the ITP Algorithm and a prototype implementation of this algorithm in the programming language C.
Unifying parametrized VLSI Jacobi algorithms and architectures
NASA Astrophysics Data System (ADS)
Deprettere, Ed F. A.; Moonen, Marc
1993-11-01
Implementing Jacobi algorithms in parallel VLSI processor arrays is a non-trivial task, in particular when the algorithms are parametrized with respect to size and the architectures are parametrized with respect to space-time trade-offs. The paper is concerned with an approach to implement several time-adaptive Jacobi-type algorithms on a parallel processor array, using only Cordic arithmetic and asynchronous communications, such that any degree of parallelism, ranging from single-processor up to full-size array implementation, is supported by a `universal' processing unit. This result is attributed to a gracious interplay between algorithmic and architectural engineering.
The Superior Lambert Algorithm
NASA Astrophysics Data System (ADS)
der, G.
2011-09-01
Lambert algorithms are used extensively for initial orbit determination, mission planning, space debris correlation, and missile targeting, just to name a few applications. Due to the significance of the Lambert problem in Astrodynamics, Gauss, Battin, Godal, Lancaster, Gooding, Sun and many others (References 1 to 15) have provided numerous formulations leading to various analytic solutions and iterative methods. Most Lambert algorithms and their computer programs can only work within one revolution, break down or converge slowly when the transfer angle is near zero or 180 degrees, and their multi-revolution limitations are either ignored or barely addressed. Despite claims of robustness, many Lambert algorithms fail without notice, and the users seldom have a clue why. The DerAstrodynamics lambert2 algorithm, which is based on the analytic solution formulated by Sun, works for any number of revolutions and converges rapidly at any transfer angle. It provides significant capability enhancements over every other Lambert algorithm in use today. These include improved speed, accuracy, robustness, and multirevolution capabilities as well as implementation simplicity. Additionally, the lambert2 algorithm provides a powerful tool for solving the angles-only problem without artificial singularities (pointed out by Gooding in Reference 16), which involves 3 lines of sight captured by optical sensors, or systems such as the Air Force Space Surveillance System (AFSSS). The analytic solution is derived from the extended Godal’s time equation by Sun, while the iterative method of solution is that of Laguerre, modified for robustness. The Keplerian solution of a Lambert algorithm can be extended to include the non-Keplerian terms of the Vinti algorithm via a simple targeting technique (References 17 to 19). Accurate analytic non-Keplerian trajectories can be predicted for satellites and ballistic missiles, while performing at least 100 times faster in speed than most
Efficient demultiplexing algorithm for noncontiguous carriers
NASA Technical Reports Server (NTRS)
Thanawala, A. A.; Kwatra, S. C.; Jamali, M. M.; Budinger, J.
1992-01-01
A channel separation algorithm for the frequency division multiple access/time division multiplexing (FDMA/TDM) scheme is presented. It is shown that implementation using this algorithm can be more effective than the fast Fourier transform (FFT) algorithm when only a small number of carriers need to be selected from many, such as satellite Earth terminals. The algorithm is based on polyphase filtering followed by application of a generalized Walsh-Hadamard transform (GWHT). Comparison of the transform technique used in this algorithm with discrete Fourier transform (DFT) and FFT is given. Estimates of the computational rates and power requirements to implement this system are also given.
Sobel, E.; Lange, K.; O`Connell, J.R.
1996-12-31
Haplotyping is the logical process of inferring gene flow in a pedigree based on phenotyping results at a small number of genetic loci. This paper formalizes the haplotyping problem and suggests four algorithms for haplotype reconstruction. These algorithms range from exhaustive enumeration of all haplotype vectors to combinatorial optimization by simulated annealing. Application of the algorithms to published genetic analyses shows that manual haplotyping is often erroneous. Haplotyping is employed in screening pedigrees for phenotyping errors and in positional cloning of disease genes from conserved haplotypes in population isolates. 26 refs., 6 figs., 3 tabs.
Power spectral estimation algorithms
NASA Technical Reports Server (NTRS)
Bhatia, Manjit S.
1989-01-01
Algorithms to estimate the power spectrum using Maximum Entropy Methods were developed. These algorithms were coded in FORTRAN 77 and were implemented on the VAX 780. The important considerations in this analysis are: (1) resolution, i.e., how close in frequency two spectral components can be spaced and still be identified; (2) dynamic range, i.e., how small a spectral peak can be, relative to the largest, and still be observed in the spectra; and (3) variance, i.e., how accurate the estimate of the spectra is to the actual spectra. The application of the algorithms based on Maximum Entropy Methods to a variety of data shows that these criteria are met quite well. Additional work in this direction would help confirm the findings. All of the software developed was turned over to the technical monitor. A copy of a typical program is included. Some of the actual data and graphs used on this data are also included.
NASA Astrophysics Data System (ADS)
Reda, Ibrahim; Andreas, Afshin
2015-04-01
The Solar Position Algorithm (SPA) calculates the solar zenith and azimuth angles in the period from the year -2000 to 6000, with uncertainties of +/- 0.0003 degrees based on the date, time, and location on Earth. SPA is implemented in C; in addition to being available for download, an online calculator using this code is available at http://www.nrel.gov/midc/solpos/spa.html.
CORDIC Algorithms: Theory And Extensions
NASA Astrophysics Data System (ADS)
Delosme, Jean-Marc
1989-11-01
Optimum algorithms for signal processing are notoriously costly to implement since they usually require intensive linear algebra operations to be performed at very high rates. In these cases a cost-effective solution is to design a pipelined or parallel architecture with special-purpose VLSI processors. One may often lower the hardware cost of such a dedicated architecture by using processors that implement CORDIC-like arithmetic algorithms. Indeed, with CORDIC algorithms, the evaluation and the application of an operation, such as determining a rotation that brings a vector onto another one and rotating other vectors by that amount, require the same time on identical processors and can be fully overlapped in most cases, thus leading to highly efficient implementations. We have shown earlier that a necessary condition for a CORDIC-type algorithm to exist is that the function to be implemented can be represented in terms of a matrix exponential. This paper refines this condition to the ability to represent , the desired function in terms of a rational representation of a matrix exponential. This insight gives us a powerful tool for the design of new CORDIC algorithms. This is demonstrated by rederiving classical CORDIC algorithms and introducing several new ones, for Jacobi rotations, three and higher dimensional rotations, etc.
Developing dataflow algorithms
Hiromoto, R.E. ); Bohm, A.P.W. . Dept. of Computer Science)
1991-01-01
Our goal is to study the performance of a collection of numerical algorithms written in Id which is available to users of Motorola's dataflow machine Monsoon. We will study the dataflow performance of these implementations first under the parallel profiling simulator Id World, and second in comparison with actual dataflow execution on the Motorola Monsoon. This approach will allow us to follow the computational and structural details of the parallel algorithms as implemented on dataflow systems. When running our programs on the Id World simulator we will examine the behaviour of algorithms at dataflow graph level, where each instruction takes one timestep and data becomes available at the next. This implies that important machine level phenomena such as the effect that global communication time may have on the computation are not addressed. These phenomena will be addressed when we run our programs on the Monsoon hardware. Potential ramifications for compilation techniques, functional programming style, and program efficiency are significant to this study. In a later stage of our research we will compare the efficiency of Id programs to programs written in other languages. This comparison will be of a rather qualitative nature as there are too many degrees of freedom in a language implementation for a quantitative comparison to be of interest. We begin our study by examining one routine that exhibit different computational characteristics. This routine and its corresponding characteristics is Fast Fourier Transforms; computational parallelism and data dependences between the butterfly shuffles.
Cobweb/3: A portable implementation
NASA Technical Reports Server (NTRS)
Mckusick, Kathleen; Thompson, Kevin
1990-01-01
An algorithm is examined for data clustering and incremental concept formation. An overview is given of the Cobweb/3 system and the algorithm on which it is based, as well as the practical details of obtaining and running the system code. The implementation features a flexible user interface which includes a graphical display of the concept hierarchies that the system constructs.
Algorithms and Requirements for Measuring Network Bandwidth
Jin, Guojun
2002-12-08
This report unveils new algorithms for actively measuring (not estimating) available bandwidths with very low intrusion, computing cross traffic, thus estimating the physical bandwidth, provides mathematical proof that the algorithms are accurate, and addresses conditions, requirements, and limitations for new and existing algorithms for measuring network bandwidths. The paper also discusses a number of important terminologies and issues for network bandwidth measurement, and introduces a fundamental parameter -Maximum Burst Size that is critical for implementing algorithms based on multiple packets.
TVFMCATS. Time Variant Floating Mean Counting Algorithm
Huffman, R.K.
1999-05-01
This software was written to test a time variant floating mean counting algorithm. The algorithm was developed by Westinghouse Savannah River Company and a provisional patent has been filed on the algorithm. The test software was developed to work with the Val Tech model IVB prototype version II count rate meter hardware. The test software was used to verify the algorithm developed by WSRC could be correctly implemented with the vendor`s hardware.
Time Variant Floating Mean Counting Algorithm
Energy Science and Technology Software Center (ESTSC)
1999-06-03
This software was written to test a time variant floating mean counting algorithm. The algorithm was developed by Westinghouse Savannah River Company and a provisional patent has been filed on the algorithm. The test software was developed to work with the Val Tech model IVB prototype version II count rate meter hardware. The test software was used to verify the algorithm developed by WSRC could be correctly implemented with the vendor''s hardware.
Research on Routing Selection Algorithm Based on Genetic Algorithm
NASA Astrophysics Data System (ADS)
Gao, Guohong; Zhang, Baojian; Li, Xueyong; Lv, Jinna
The hereditary algorithm is a kind of random searching and method of optimizing based on living beings natural selection and hereditary mechanism. In recent years, because of the potentiality in solving complicate problems and the successful application in the fields of industrial project, hereditary algorithm has been widely concerned by the domestic and international scholar. Routing Selection communication has been defined a standard communication model of IP version 6.This paper proposes a service model of Routing Selection communication, and designs and implements a new Routing Selection algorithm based on genetic algorithm.The experimental simulation results show that this algorithm can get more resolution at less time and more balanced network load, which enhances search ratio and the availability of network resource, and improves the quality of service.
Protein structure optimization with a "Lamarckian" ant colony algorithm.
Oakley, Mark T; Richardson, E Grace; Carr, Harriet; Johnston, Roy L
2013-01-01
We describe the LamarckiAnt algorithm: a search algorithm that combines the features of a "Lamarckian" genetic algorithm and ant colony optimization. We have implemented this algorithm for the optimization of BLN model proteins, which have frustrated energy landscapes and represent a challenge for global optimization algorithms. We demonstrate that LamarckiAnt performs competitively with other state-of-the-art optimization algorithms. PMID:24407312
Chang, C.Y.
1986-01-01
New results on efficient forms of decoding convolutional codes based on Viterbi and stack algorithms using systolic array architecture are presented. Some theoretical aspects of systolic arrays are also investigated. First, systolic array implementation of Viterbi algorithm is considered, and various properties of convolutional codes are derived. A technique called strongly connected trellis decoding is introduced to increase the efficient utilization of all the systolic array processors. The issues dealing with the composite branch metric generation, survivor updating, overall system architecture, throughput rate, and computations overhead ratio are also investigated. Second, the existing stack algorithm is modified and restated in a more concise version so that it can be efficiently implemented by a special type of systolic array called systolic priority queue. Three general schemes of systolic priority queue based on random access memory, shift register, and ripple register are proposed. Finally, a systematic approach is presented to design systolic arrays for certain general classes of recursively formulated algorithms.
Variable Selection using MM Algorithms
Hunter, David R.; Li, Runze
2009-01-01
Variable selection is fundamental to high-dimensional statistical modeling. Many variable selection techniques may be implemented by maximum penalized likelihood using various penalty functions. Optimizing the penalized likelihood function is often challenging because it may be nondifferentiable and/or nonconcave. This article proposes a new class of algorithms for finding a maximizer of the penalized likelihood for a broad class of penalty functions. These algorithms operate by perturbing the penalty function slightly to render it differentiable, then optimizing this differentiable function using a minorize-maximize (MM) algorithm. MM algorithms are useful extensions of the well-known class of EM algorithms, a fact that allows us to analyze the local and global convergence of the proposed algorithm using some of the techniques employed for EM algorithms. In particular, we prove that when our MM algorithms converge, they must converge to a desirable point; we also discuss conditions under which this convergence may be guaranteed. We exploit the Newton-Raphson-like aspect of these algorithms to propose a sandwich estimator for the standard errors of the estimators. Our method performs well in numerical tests. PMID:19458786
NASA Technical Reports Server (NTRS)
Barth, Timothy J.; Lomax, Harvard
1987-01-01
The past decade has seen considerable activity in algorithm development for the Navier-Stokes equations. This has resulted in a wide variety of useful new techniques. Some examples for the numerical solution of the Navier-Stokes equations are presented, divided into two parts. One is devoted to the incompressible Navier-Stokes equations, and the other to the compressible form.
CORDIC algorithms in four dimensions
NASA Astrophysics Data System (ADS)
Delosme, Jean-Marc; Hsiao, Shen-Fu
1990-11-01
CORDIC algorithms offer an attractive alternative to multiply-and-add based algorithms for the implementation of two-dimensional rotations preserving either norm: (x2 + 2) or (x2 _ y2)/2 Indeed these norms whose computation is a significant part of the evaluation of the two-dimensional rotations are computed much more easily by the CORDIC algorithms. However the part played by norm computations in the evaluation of rotations becomes quickly small as the dimension of the space increases. Thus in spaces of dimension 5 or more there is no practical alternative to multiply-and-add based algorithms. In the intermediate region dimensions 3 and 4 extensions of the CORDIC algorithms are an interesting option. The four-dimensional extensions are particularly elegant and are the main object of this paper.
Tomasz Plawski, J. Hovater
2010-09-01
A digital low level radio frequency (RF) system typically incorporates either a heterodyne or direct sampling technique, followed by fast ADCs, then an FPGA, and finally a transmitting DAC. This universal platform opens up the possibilities for a variety of control algorithm implementations. The foremost concern for an RF control system is cavity field stability, and to meet the required quality of regulation, the chosen control system needs to have sufficient feedback gain. In this paper we will investigate the effectiveness of the regulation for three basic control system algorithms: I&Q (In-phase and Quadrature), Amplitude & Phase and digital SEL (Self Exciting Loop) along with the example of the Jefferson Lab 12 GeV cavity field control system.
NASA Astrophysics Data System (ADS)
Deprit, André; Palacián, Jesúus; Deprit, Etienne
2001-03-01
The relegation algorithm extends the method of normalization by Lie transformations. Given a Hamiltonian that is a power series ℋ = ℋ0+ ɛℋ1+ ... of a small parameter ɛ, normalization constructs a map which converts the principal part ℋ0into an integral of the transformed system — relegation does the same for an arbitrary function ℋ[G]. If the Lie derivative induced by ℋ[G] is semi-simple, a double recursion produces the generator of the relegating transformation. The relegation algorithm is illustrated with an elementary example borrowed from galactic dynamics; the exercise serves as a standard against which to test software implementations. Relegation is also applied to the more substantial example of a Keplerian system perturbed by radiation pressure emanating from a rotating source.
Reactive Collision Avoidance Algorithm
NASA Technical Reports Server (NTRS)
Scharf, Daniel; Acikmese, Behcet; Ploen, Scott; Hadaegh, Fred
2010-01-01
The reactive collision avoidance (RCA) algorithm allows a spacecraft to find a fuel-optimal trajectory for avoiding an arbitrary number of colliding spacecraft in real time while accounting for acceleration limits. In addition to spacecraft, the technology can be used for vehicles that can accelerate in any direction, such as helicopters and submersibles. In contrast to existing, passive algorithms that simultaneously design trajectories for a cluster of vehicles working to achieve a common goal, RCA is implemented onboard spacecraft only when an imminent collision is detected, and then plans a collision avoidance maneuver for only that host vehicle, thus preventing a collision in an off-nominal situation for which passive algorithms cannot. An example scenario for such a situation might be when a spacecraft in the cluster is approaching another one, but enters safe mode and begins to drift. Functionally, the RCA detects colliding spacecraft, plans an evasion trajectory by solving the Evasion Trajectory Problem (ETP), and then recovers after the collision is avoided. A direct optimization approach was used to develop the algorithm so it can run in real time. In this innovation, a parameterized class of avoidance trajectories is specified, and then the optimal trajectory is found by searching over the parameters. The class of trajectories is selected as bang-off-bang as motivated by optimal control theory. That is, an avoiding spacecraft first applies full acceleration in a constant direction, then coasts, and finally applies full acceleration to stop. The parameter optimization problem can be solved offline and stored as a look-up table of values. Using a look-up table allows the algorithm to run in real time. Given a colliding spacecraft, the properties of the collision geometry serve as indices of the look-up table that gives the optimal trajectory. For multiple colliding spacecraft, the set of trajectories that avoid all spacecraft is rapidly searched on
Efficient multicomponent fuel algorithm
NASA Astrophysics Data System (ADS)
Torres, D. J.; O'Rourke, P. J.; Amsden, A. A.
2003-03-01
We derive equations for multicomponent fuel evaporation in airborne fuel droplets and wall films, and implement the model into KIVA-3V. Temporal and spatial variations in liquid droplet composition and temperature are not modelled but solved for by discretizing the interior of the droplet in an implicit and computationally efficient way. We find that an interior discretization is necessary to correctly compute the evolution of the droplet composition. The details of the one-dimensional numerical algorithm are described. Numerical simulations of multicomponent evaporation are performed for single droplets and compared to experimental data.
NASA Technical Reports Server (NTRS)
Vardi, A.
1984-01-01
The representation min t s.t. F(I)(x). - t less than or equal to 0 for all i is examined. An active set strategy is designed of functions: active, semi-active, and non-active. This technique will help in preventing zigzagging which often occurs when an active set strategy is used. Some of the inequality constraints are handled with slack variables. Also a trust region strategy is used in which at each iteration there is a sphere around the current point in which the local approximation of the function is trusted. The algorithm is implemented into a successful computer program. Numerical results are provided.
Myers, Timothy
2006-09-01
The use of protocols or care algorithms in medical facilities has increased in the managed care environment. The definition and application of care algorithms, with a particular focus on the treatment of acute bronchospasm, are explored in this review. The benefits and goals of using protocols, especially in the treatment of asthma, to standardize patient care based on clinical guidelines and evidence-based medicine are explained. Ideally, evidence-based protocols should translate research findings into best medical practices that would serve to better educate patients and their medical providers who are administering these protocols. Protocols should include evaluation components that can monitor, through some mechanism of quality assurance, the success and failure of the instrument so that modifications can be made as necessary. The development and design of an asthma care algorithm can be accomplished by using a four-phase approach: phase 1, identifying demographics, outcomes, and measurement tools; phase 2, reviewing, negotiating, and standardizing best practice; phase 3, testing and implementing the instrument and collecting data; and phase 4, analyzing the data and identifying areas of improvement and future research. The experiences of one medical institution that implemented an asthma care algorithm in the treatment of pediatric asthma are described. Their care algorithms served as tools for decision makers to provide optimal asthma treatment in children. In addition, the studies that used the asthma care algorithm to determine the efficacy and safety of ipratropium bromide and levalbuterol in children with asthma are described. PMID:16945065
Threshold extended ID3 algorithm
NASA Astrophysics Data System (ADS)
Kumar, A. B. Rajesh; Ramesh, C. Phani; Madhusudhan, E.; Padmavathamma, M.
2012-04-01
Information exchange over insecure networks needs to provide authentication and confidentiality to the database in significant problem in datamining. In this paper we propose a novel authenticated multiparty ID3 Algorithm used to construct multiparty secret sharing decision tree for implementation in medical transactions.
SMAP's Radar OBP Algorithm Development
NASA Technical Reports Server (NTRS)
Le, Charles; Spencer, Michael W.; Veilleux, Louise; Chan, Samuel; He, Yutao; Zheng, Jason; Nguyen, Kayla
2009-01-01
An approach for algorithm specifications and development is described for SMAP's radar onboard processor with multi-stage demodulation and decimation bandpass digital filter. Point target simulation is used to verify and validate the filter design with the usual radar performance parameters. Preliminary FPGA implementation is also discussed.
Spatial search algorithms on Hanoi networks
NASA Astrophysics Data System (ADS)
Marquezino, Franklin de Lima; Portugal, Renato; Boettcher, Stefan
2013-01-01
We use the abstract search algorithm and its extension due to Tulsi to analyze a spatial quantum search algorithm that finds a marked vertex in Hanoi networks of degree 4 faster than classical algorithms. We also analyze the effect of using non-Groverian coins that take advantage of the small-world structure of the Hanoi networks. We obtain the scaling of the total cost of the algorithm as a function of the number of vertices. We show that Tulsi's technique plays an important role to speed up the searching algorithm. We can improve the algorithm's efficiency by choosing a non-Groverian coin if we do not implement Tulsi's method. Our conclusions are based on numerical implementations.
Empirical study of parallel LRU simulation algorithms
NASA Technical Reports Server (NTRS)
Carr, Eric; Nicol, David M.
1994-01-01
This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithm are more complex, but have costs that are independent on the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithm implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.
Regier, Michael D; Moodie, Erica E M
2016-05-01
We propose an extension of the EM algorithm that exploits the common assumption of unique parameterization, corrects for biases due to missing data and measurement error, converges for the specified model when standard implementation of the EM algorithm has a low probability of convergence, and reduces a potentially complex algorithm into a sequence of smaller, simpler, self-contained EM algorithms. We use the theory surrounding the EM algorithm to derive the theoretical results of our proposal, showing that an optimal solution over the parameter space is obtained. A simulation study is used to explore the finite sample properties of the proposed extension when there is missing data and measurement error. We observe that partitioning the EM algorithm into simpler steps may provide better bias reduction in the estimation of model parameters. The ability to breakdown a complicated problem in to a series of simpler, more accessible problems will permit a broader implementation of the EM algorithm, permit the use of software packages that now implement and/or automate the EM algorithm, and make the EM algorithm more accessible to a wider and more general audience. PMID:27227718
Study of a global search algorithm for optimal control.
NASA Technical Reports Server (NTRS)
Brocker, D. H.; Kavanaugh, W. P.; Stewart, E. C.
1967-01-01
Adaptive random search algorithm utilizing boundary cost-function hypersurfaces measurement to implement Pontryagin maximum principle, discussing hybrid computer use, iterative solution and convergence properties
Sequential and Parallel Algorithms for Spherical Interpolation
NASA Astrophysics Data System (ADS)
De Rossi, Alessandra
2007-09-01
Given a large set of scattered points on a sphere and their associated real values, we analyze sequential and parallel algorithms for the construction of a function defined on the sphere satisfying the interpolation conditions. The algorithms we implemented are based on a local interpolation method using spherical radial basis functions and the Inverse Distance Weighted method. Several numerical results show accuracy and efficiency of the algorithms.
Algorithms for builder guidelines
Balcomb, J.D.; Lekov, A.B.
1989-06-01
The Builder Guidelines are designed to make simple, appropriate guidelines available to builders for their specific localities. Builders may select from passive solar and conservation strategies with different performance potentials. They can then compare the calculated results for their particular house design with a typical house in the same location. Algorithms used to develop the Builder Guidelines are described. The main algorithms used are the monthly solar ratio (SLR) method for winter heating, the diurnal heat capacity (DHC) method for temperature swing, and a new simplified calculation method (McCool) for summer cooling. This paper applies the algorithms to estimate the performance potential of passive solar strategies, and the annual heating and cooling loads of various combinations of conservation and passive solar strategies. The basis of the McCool method is described. All three methods are implemented in a microcomputer program used to generate the guideline numbers. Guidelines for Denver, Colorado, are used to illustrate the results. The structure of the guidelines and worksheet booklets are also presented. 5 refs., 3 tabs.
Evaluating super resolution algorithms
NASA Astrophysics Data System (ADS)
Kim, Youn Jin; Park, Jong Hyun; Shin, Gun Shik; Lee, Hyun-Seung; Kim, Dong-Hyun; Park, Se Hyeok; Kim, Jaehyun
2011-01-01
This study intends to establish a sound testing and evaluation methodology based upon the human visual characteristics for appreciating the image restoration accuracy; in addition to comparing the subjective results with predictions by some objective evaluation methods. In total, six different super resolution (SR) algorithms - such as iterative back-projection (IBP), robust SR, maximum a posteriori (MAP), projections onto convex sets (POCS), a non-uniform interpolation, and frequency domain approach - were selected. The performance comparison between the SR algorithms in terms of their restoration accuracy was carried out through both subjectively and objectively. The former methodology relies upon the paired comparison method that involves the simultaneous scaling of two stimuli with respect to image restoration accuracy. For the latter, both conventional image quality metrics and color difference methods are implemented. Consequently, POCS and a non-uniform interpolation outperformed the others for an ideal situation, while restoration based methods appear more accurate to the HR image in a real world case where any prior information about the blur kernel is remained unknown. However, the noise-added-image could not be restored successfully by any of those methods. The latest International Commission on Illumination (CIE) standard color difference equation CIEDE2000 was found to predict the subjective results accurately and outperformed conventional methods for evaluating the restoration accuracy of those SR algorithms.
Portable Health Algorithms Test System
NASA Technical Reports Server (NTRS)
Melcher, Kevin J.; Wong, Edmond; Fulton, Christopher E.; Sowers, Thomas S.; Maul, William A.
2010-01-01
A document discusses the Portable Health Algorithms Test (PHALT) System, which has been designed as a means for evolving the maturity and credibility of algorithms developed to assess the health of aerospace systems. Comprising an integrated hardware-software environment, the PHALT system allows systems health management algorithms to be developed in a graphical programming environment, to be tested and refined using system simulation or test data playback, and to be evaluated in a real-time hardware-in-the-loop mode with a live test article. The integrated hardware and software development environment provides a seamless transition from algorithm development to real-time implementation. The portability of the hardware makes it quick and easy to transport between test facilities. This hard ware/software architecture is flexible enough to support a variety of diagnostic applications and test hardware, and the GUI-based rapid prototyping capability is sufficient to support development execution, and testing of custom diagnostic algorithms. The PHALT operating system supports execution of diagnostic algorithms under real-time constraints. PHALT can perform real-time capture and playback of test rig data with the ability to augment/ modify the data stream (e.g. inject simulated faults). It performs algorithm testing using a variety of data input sources, including real-time data acquisition, test data playback, and system simulations, and also provides system feedback to evaluate closed-loop diagnostic response and mitigation control.
Mapping algorithms on regular parallel architectures
Lee, P.
1989-01-01
It is significant that many of time-intensive scientific algorithms are formulated as nested loops, which are inherently regularly structured. In this dissertation the relations between the mathematical structure of nested loop algorithms and the architectural capabilities required for their parallel execution are studied. The architectural model considered in depth is that of an arbitrary dimensional systolic array. The mathematical structure of the algorithm is characterized by classifying its data-dependence vectors according to the new ZERO-ONE-INFINITE property introduced. Using this classification, the first complete set of necessary and sufficient conditions for correct transformation of a nested loop algorithm onto a given systolic array of an arbitrary dimension by means of linear mappings is derived. Practical methods to derive optimal or suboptimal systolic array implementations are also provided. The techniques developed are used constructively to develop families of implementations satisfying various optimization criteria and to design programmable arrays efficiently executing classes of algorithms. In addition, a Computer-Aided Design system running on SUN workstations has been implemented to help in the design. The methodology, which deals with general algorithms, is illustrated by synthesizing linear and planar systolic array algorithms for matrix multiplication, a reindexed Warshall-Floyd transitive closure algorithm, and the longest common subsequence algorithm.
Algorithmic synthesis using Python compiler
NASA Astrophysics Data System (ADS)
Cieszewski, Radoslaw; Romaniuk, Ryszard; Pozniak, Krzysztof; Linczuk, Maciej
2015-09-01
This paper presents a python to VHDL compiler. The compiler interprets an algorithmic description of a desired behavior written in Python and translate it to VHDL. FPGA combines many benefits of both software and ASIC implementations. Like software, the programmed circuit is flexible, and can be reconfigured over the lifetime of the system. FPGAs have the potential to achieve far greater performance than software as a result of bypassing the fetch-decode-execute operations of traditional processors, and possibly exploiting a greater level of parallelism. This can be achieved by using many computational resources at the same time. Creating parallel programs implemented in FPGAs in pure HDL is difficult and time consuming. Using higher level of abstraction and High-Level Synthesis compiler implementation time can be reduced. The compiler has been implemented using the Python language. This article describes design, implementation and results of created tools.
Algorithms versus architectures for computational chemistry
NASA Technical Reports Server (NTRS)
Partridge, H.; Bauschlicher, C. W., Jr.
1986-01-01
The algorithms employed are computationally intensive and, as a result, increased performance (both algorithmic and architectural) is required to improve accuracy and to treat larger molecular systems. Several benchmark quantum chemistry codes are examined on a variety of architectures. While these codes are only a small portion of a typical quantum chemistry library, they illustrate many of the computationally intensive kernels and data manipulation requirements of some applications. Furthermore, understanding the performance of the existing algorithm on present and proposed supercomputers serves as a guide for future programs and algorithm development. The algorithms investigated are: (1) a sparse symmetric matrix vector product; (2) a four index integral transformation; and (3) the calculation of diatomic two electron Slater integrals. The vectorization strategies are examined for these algorithms for both the Cyber 205 and Cray XMP. In addition, multiprocessor implementations of the algorithms are looked at on the Cray XMP and on the MIT static data flow machine proposed by DENNIS.
Search properties of some sequential decoding algorithms.
NASA Technical Reports Server (NTRS)
Geist, J. M.
1973-01-01
Sequential decoding procedures are studied in the context of selecting a path through a tree. Several algorithms are considered, and their properties are compared. It is shown that the stack algorithm introduced by Zigangirov (1966) and by Jelinek (1969) is essentially equivalent to the Fano algorithm with regard to the set of nodes examined and the path selected, although the description, implementation, and action of the two algorithms are quite different. A modified Fano algorithm is introduced, in which the quantizing parameter is eliminated. It can be inferred from limited simulation results that, at least in some applications, the new algorithm is computationally inferior to the old. However, it is of some theoretical interest since the conventional Fano algorithm may be considered to be a quantized version of it.
Fast algorithms for transport models
Manteuffel, T.A.
1992-12-01
The objective of this project is the development of numerical solution techniques for deterministic models of the transport of neutral and charged particles and the demonstration of their effectiveness in both a production environment and on advanced architecture computers. The primary focus is on various versions of the linear Boltzman equation. These equations are fundamental in many important applications. This project is an attempt to integrate the development of numerical algorithms with the process of developing production software. A major thrust of this reject will be the implementation of these algorithms on advanced architecture machines that reside at the Advanced Computing Laboratory (ACL) at Los Alamos National Laboratories (LANL).
Fontana, W.
1990-12-13
In this paper complex adaptive systems are defined by a self- referential loop in which objects encode functions that act back on these objects. A model for this loop is presented. It uses a simple recursive formal language, derived from the lambda-calculus, to provide a semantics that maps character strings into functions that manipulate symbols on strings. The interaction between two functions, or algorithms, is defined naturally within the language through function composition, and results in the production of a new function. An iterated map acting on sets of functions and a corresponding graph representation are defined. Their properties are useful to discuss the behavior of a fixed size ensemble of randomly interacting functions. This function gas'', or Turning gas'', is studied under various conditions, and evolves cooperative interaction patterns of considerable intricacy. These patterns adapt under the influence of perturbations consisting in the addition of new random functions to the system. Different organizations emerge depending on the availability of self-replicators.
A parallel algorithm for global routing
NASA Technical Reports Server (NTRS)
Brouwer, Randall J.; Banerjee, Prithviraj
1990-01-01
A Parallel Hierarchical algorithm for Global Routing (PHIGURE) is presented. The router is based on the work of Burstein and Pelavin, but has many extensions for general global routing and parallel execution. Main features of the algorithm include structured hierarchical decomposition into separate independent tasks which are suitable for parallel execution and adaptive simplex solution for adding feedthroughs and adjusting channel heights for row-based layout. Alternative decomposition methods and the various levels of parallelism available in the algorithm are examined closely. The algorithm is described and results are presented for a shared-memory multiprocessor implementation.