EV Charging Algorithm Implementation with User Price Preference
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Bin; Hu, Boyang; Qiu, Charlie
2015-02-17
in this paper, we propose and implement a smart Electric Vehicle (EV) charging algorithm to control the EV charging infrastructures according to users’ price preferences. EVSE (Electric Vehicle Supply Equipment), equipped with bidirectional communication devices and smart meters, can be remotely monitored by the proposed charging algorithm applied to EV control center and mobile app. On the server side, ARIMA model is utilized to fit historical charging load data and perform day-ahead prediction. A pricing strategy with energy bidding policy is proposed and implemented to generate a charging price list to be broadcasted to EV users through mobile app. Onmore » the user side, EV drivers can submit their price preferences and daily travel schedules to negotiate with Control Center to consume the expected energy and minimize charging cost simultaneously. The proposed algorithm is tested and validated through the experimental implementations in UCLA parking lots.« less
Scenario Decomposition for 0-1 Stochastic Programs: Improvements and Asynchronous Implementation
Ryan, Kevin; Rajan, Deepak; Ahmed, Shabbir
2016-05-01
We recently proposed scenario decomposition algorithm for stochastic 0-1 programs finds an optimal solution by evaluating and removing individual solutions that are discovered by solving scenario subproblems. In our work, we develop an asynchronous, distributed implementation of the algorithm which has computational advantages over existing synchronous implementations of the algorithm. Improvements to both the synchronous and asynchronous algorithm are proposed. We also test the results on well known stochastic 0-1 programs from the SIPLIB test library and is able to solve one previously unsolved instance from the test set.
NASA Astrophysics Data System (ADS)
Ham, Woonchul; Song, Chulgyu
2017-05-01
In this paper, we propose a new three-dimensional stereo image reconstruction algorithm for a photoacoustic medical imaging system. We also introduce and discuss a new theoretical algorithm by using the physical concept of Radon transform. The main key concept of proposed theoretical algorithm is to evaluate the existence possibility of the acoustic source within a searching region by using the geometric distance between each sensor element of acoustic detector and the corresponding searching region denoted by grid. We derive the mathematical equation for the magnitude of the existence possibility which can be used for implementing a new proposed algorithm. We handle and derive mathematical equations of proposed algorithm for the one-dimensional sensing array case as well as two dimensional sensing array case too. A mathematical k-wave simulation data are used for comparing the image quality of the proposed algorithm with that of general conventional algorithm in which the FFT should be necessarily used. From the k-wave Matlab simulation results, we can prove the effectiveness of the proposed reconstruction algorithm.
Ho, ThienLuan; Oh, Seung-Rohk
2017-01-01
Approximate string matching with k-differences has a number of practical applications, ranging from pattern recognition to computational biology. This paper proposes an efficient memory-access algorithm for parallel approximate string matching with k-differences on Graphics Processing Units (GPUs). In the proposed algorithm, all threads in the same GPUs warp share data using warp-shuffle operation instead of accessing the shared memory. Moreover, we implement the proposed algorithm by exploiting the memory structure of GPUs to optimize its performance. Experiment results for real DNA packages revealed that the performance of the proposed algorithm and its implementation archived up to 122.64 and 1.53 times compared to that of sequential algorithm on CPU and previous parallel approximate string matching algorithm on GPUs, respectively. PMID:29016700
A cellular automata based FPGA realization of a new metaheuristic bat-inspired algorithm
NASA Astrophysics Data System (ADS)
Progias, Pavlos; Amanatiadis, Angelos A.; Spataro, William; Trunfio, Giuseppe A.; Sirakoulis, Georgios Ch.
2016-10-01
Optimization algorithms are often inspired by processes occuring in nature, such as animal behavioral patterns. The main concern with implementing such algorithms in software is the large amounts of processing power they require. In contrast to software code, that can only perform calculations in a serial manner, an implementation in hardware, exploiting the inherent parallelism of single-purpose processors, can prove to be much more efficient both in speed and energy consumption. Furthermore, the use of Cellular Automata (CA) in such an implementation would be efficient both as a model for natural processes, as well as a computational paradigm implemented well on hardware. In this paper, we propose a VHDL implementation of a metaheuristic algorithm inspired by the echolocation behavior of bats. More specifically, the CA model is inspired by the metaheuristic algorithm proposed earlier in the literature, which could be considered at least as efficient than other existing optimization algorithms. The function of the FPGA implementation of our algorithm is explained in full detail and results of our simulations are also demonstrated.
Regier, Michael D; Moodie, Erica E M
2016-05-01
We propose an extension of the EM algorithm that exploits the common assumption of unique parameterization, corrects for biases due to missing data and measurement error, converges for the specified model when standard implementation of the EM algorithm has a low probability of convergence, and reduces a potentially complex algorithm into a sequence of smaller, simpler, self-contained EM algorithms. We use the theory surrounding the EM algorithm to derive the theoretical results of our proposal, showing that an optimal solution over the parameter space is obtained. A simulation study is used to explore the finite sample properties of the proposed extension when there is missing data and measurement error. We observe that partitioning the EM algorithm into simpler steps may provide better bias reduction in the estimation of model parameters. The ability to breakdown a complicated problem in to a series of simpler, more accessible problems will permit a broader implementation of the EM algorithm, permit the use of software packages that now implement and/or automate the EM algorithm, and make the EM algorithm more accessible to a wider and more general audience.
Yang, Changju; Kim, Hyongsuk; Adhikari, Shyam Prasad; Chua, Leon O.
2016-01-01
A hybrid learning method of a software-based backpropagation learning and a hardware-based RWC learning is proposed for the development of circuit-based neural networks. The backpropagation is known as one of the most efficient learning algorithms. A weak point is that its hardware implementation is extremely difficult. The RWC algorithm, which is very easy to implement with respect to its hardware circuits, takes too many iterations for learning. The proposed learning algorithm is a hybrid one of these two. The main learning is performed with a software version of the BP algorithm, firstly, and then, learned weights are transplanted on a hardware version of a neural circuit. At the time of the weight transplantation, a significant amount of output error would occur due to the characteristic difference between the software and the hardware. In the proposed method, such error is reduced via a complementary learning of the RWC algorithm, which is implemented in a simple hardware. The usefulness of the proposed hybrid learning system is verified via simulations upon several classical learning problems. PMID:28025566
Pant, Jeevan K; Krishnan, Sridhar
2014-04-01
A new algorithm for the reconstruction of electrocardiogram (ECG) signals and a dictionary learning algorithm for the enhancement of its reconstruction performance for a class of signals are proposed. The signal reconstruction algorithm is based on minimizing the lp pseudo-norm of the second-order difference, called as the lp(2d) pseudo-norm, of the signal. The optimization involved is carried out using a sequential conjugate-gradient algorithm. The dictionary learning algorithm uses an iterative procedure wherein a signal reconstruction and a dictionary update steps are repeated until a convergence criterion is satisfied. The signal reconstruction step is implemented by using the proposed signal reconstruction algorithm and the dictionary update step is implemented by using the linear least-squares method. Extensive simulation results demonstrate that the proposed algorithm yields improved reconstruction performance for temporally correlated ECG signals relative to the state-of-the-art lp(1d)-regularized least-squares and Bayesian learning based algorithms. Also for a known class of signals, the reconstruction performance of the proposed algorithm can be improved by applying it in conjunction with a dictionary obtained using the proposed dictionary learning algorithm.
An improved non-uniformity correction algorithm and its hardware implementation on FPGA
NASA Astrophysics Data System (ADS)
Rong, Shenghui; Zhou, Huixin; Wen, Zhigang; Qin, Hanlin; Qian, Kun; Cheng, Kuanhong
2017-09-01
The Non-uniformity of Infrared Focal Plane Arrays (IRFPA) severely degrades the infrared image quality. An effective non-uniformity correction (NUC) algorithm is necessary for an IRFPA imaging and application system. However traditional scene-based NUC algorithm suffers the image blurring and artificial ghosting. In addition, few effective hardware platforms have been proposed to implement corresponding NUC algorithms. Thus, this paper proposed an improved neural-network based NUC algorithm by the guided image filter and the projection-based motion detection algorithm. First, the guided image filter is utilized to achieve the accurate desired image to decrease the artificial ghosting. Then a projection-based moving detection algorithm is utilized to determine whether the correction coefficients should be updated or not. In this way the problem of image blurring can be overcome. At last, an FPGA-based hardware design is introduced to realize the proposed NUC algorithm. A real and a simulated infrared image sequences are utilized to verify the performance of the proposed algorithm. Experimental results indicated that the proposed NUC algorithm can effectively eliminate the fix pattern noise with less image blurring and artificial ghosting. The proposed hardware design takes less logic elements in FPGA and spends less clock cycles to process one frame of image.
NASA Astrophysics Data System (ADS)
Wu, Yuanfeng; Gao, Lianru; Zhang, Bing; Zhao, Haina; Li, Jun
2014-01-01
We present a parallel implementation of the optimized maximum noise fraction (G-OMNF) transform algorithm for feature extraction of hyperspectral images on commodity graphics processing units (GPUs). The proposed approach explored the algorithm data-level concurrency and optimized the computing flow. We first defined a three-dimensional grid, in which each thread calculates a sub-block data to easily facilitate the spatial and spectral neighborhood data searches in noise estimation, which is one of the most important steps involved in OMNF. Then, we optimized the processing flow and computed the noise covariance matrix before computing the image covariance matrix to reduce the original hyperspectral image data transmission. These optimization strategies can greatly improve the computing efficiency and can be applied to other feature extraction algorithms. The proposed parallel feature extraction algorithm was implemented on an Nvidia Tesla GPU using the compute unified device architecture and basic linear algebra subroutines library. Through the experiments on several real hyperspectral images, our GPU parallel implementation provides a significant speedup of the algorithm compared with the CPU implementation, especially for highly data parallelizable and arithmetically intensive algorithm parts, such as noise estimation. In order to further evaluate the effectiveness of G-OMNF, we used two different applications: spectral unmixing and classification for evaluation. Considering the sensor scanning rate and the data acquisition time, the proposed parallel implementation met the on-board real-time feature extraction.
de Paula, Lauro C. M.; Soares, Anderson S.; de Lima, Telma W.; Delbem, Alexandre C. B.; Coelho, Clarimar J.; Filho, Arlindo R. G.
2014-01-01
Several variable selection algorithms in multivariate calibration can be accelerated using Graphics Processing Units (GPU). Among these algorithms, the Firefly Algorithm (FA) is a recent proposed metaheuristic that may be used for variable selection. This paper presents a GPU-based FA (FA-MLR) with multiobjective formulation for variable selection in multivariate calibration problems and compares it with some traditional sequential algorithms in the literature. The advantage of the proposed implementation is demonstrated in an example involving a relatively large number of variables. The results showed that the FA-MLR, in comparison with the traditional algorithms is a more suitable choice and a relevant contribution for the variable selection problem. Additionally, the results also demonstrated that the FA-MLR performed in a GPU can be five times faster than its sequential implementation. PMID:25493625
de Paula, Lauro C M; Soares, Anderson S; de Lima, Telma W; Delbem, Alexandre C B; Coelho, Clarimar J; Filho, Arlindo R G
2014-01-01
Several variable selection algorithms in multivariate calibration can be accelerated using Graphics Processing Units (GPU). Among these algorithms, the Firefly Algorithm (FA) is a recent proposed metaheuristic that may be used for variable selection. This paper presents a GPU-based FA (FA-MLR) with multiobjective formulation for variable selection in multivariate calibration problems and compares it with some traditional sequential algorithms in the literature. The advantage of the proposed implementation is demonstrated in an example involving a relatively large number of variables. The results showed that the FA-MLR, in comparison with the traditional algorithms is a more suitable choice and a relevant contribution for the variable selection problem. Additionally, the results also demonstrated that the FA-MLR performed in a GPU can be five times faster than its sequential implementation.
Recursive Implementations of the Consider Filter
NASA Technical Reports Server (NTRS)
Zanetti, Renato; DSouza, Chris
2012-01-01
One method to account for parameters errors in the Kalman filter is to consider their effect in the so-called Schmidt-Kalman filter. This work addresses issues that arise when implementing a consider Kalman filter as a real-time, recursive algorithm. A favorite implementation of the Kalman filter as an onboard navigation subsystem is the UDU formulation. A new way to implement a UDU consider filter is proposed. The non-optimality of the recursive consider filter is also analyzed, and a modified algorithm is proposed to overcome this limitation.
Eroglu, Duygu Yilmaz; Ozmutlu, H Cenk
2014-01-01
We developed mixed integer programming (MIP) models and hybrid genetic-local search algorithms for the scheduling problem of unrelated parallel machines with job sequence and machine-dependent setup times and with job splitting property. The first contribution of this paper is to introduce novel algorithms which make splitting and scheduling simultaneously with variable number of subjobs. We proposed simple chromosome structure which is constituted by random key numbers in hybrid genetic-local search algorithm (GAspLA). Random key numbers are used frequently in genetic algorithms, but it creates additional difficulty when hybrid factors in local search are implemented. We developed algorithms that satisfy the adaptation of results of local search into the genetic algorithms with minimum relocation operation of genes' random key numbers. This is the second contribution of the paper. The third contribution of this paper is three developed new MIP models which are making splitting and scheduling simultaneously. The fourth contribution of this paper is implementation of the GAspLAMIP. This implementation let us verify the optimality of GAspLA for the studied combinations. The proposed methods are tested on a set of problems taken from the literature and the results validate the effectiveness of the proposed algorithms.
NASA Astrophysics Data System (ADS)
Genovese, Mariangela; Napoli, Ettore
2013-05-01
The identification of moving objects is a fundamental step in computer vision processing chains. The development of low cost and lightweight smart cameras steadily increases the request of efficient and high performance circuits able to process high definition video in real time. The paper proposes two processor cores aimed to perform the real time background identification on High Definition (HD, 1920 1080 pixel) video streams. The implemented algorithm is the OpenCV version of the Gaussian Mixture Model (GMM), an high performance probabilistic algorithm for the segmentation of the background that is however computationally intensive and impossible to implement on general purpose CPU with the constraint of real time processing. In the proposed paper, the equations of the OpenCV GMM algorithm are optimized in such a way that a lightweight and low power implementation of the algorithm is obtained. The reported performances are also the result of the use of state of the art truncated binary multipliers and ROM compression techniques for the implementation of the non-linear functions. The first circuit has commercial FPGA devices as a target and provides speed and logic resource occupation that overcome previously proposed implementations. The second circuit is oriented to an ASIC (UMC-90nm) standard cell implementation. Both implementations are able to process more than 60 frames per second in 1080p format, a frame rate compatible with HD television.
Xiao, Kai; Chen, Danny Z; Hu, X Sharon; Zhou, Bo
2012-12-01
The three-dimensional digital differential analyzer (3D-DDA) algorithm is a widely used ray traversal method, which is also at the core of many convolution∕superposition (C∕S) dose calculation approaches. However, porting existing C∕S dose calculation methods onto graphics processing unit (GPU) has brought challenges to retaining the efficiency of this algorithm. In particular, straightforward implementation of the original 3D-DDA algorithm inflicts a lot of branch divergence which conflicts with the GPU programming model and leads to suboptimal performance. In this paper, an efficient GPU implementation of the 3D-DDA algorithm is proposed, which effectively reduces such branch divergence and improves performance of the C∕S dose calculation programs running on GPU. The main idea of the proposed method is to convert a number of conditional statements in the original 3D-DDA algorithm into a set of simple operations (e.g., arithmetic, comparison, and logic) which are better supported by the GPU architecture. To verify and demonstrate the performance improvement, this ray traversal method was integrated into a GPU-based collapsed cone convolution∕superposition (CCCS) dose calculation program. The proposed method has been tested using a water phantom and various clinical cases on an NVIDIA GTX570 GPU. The CCCS dose calculation program based on the efficient 3D-DDA ray traversal implementation runs 1.42 ∼ 2.67× faster than the one based on the original 3D-DDA implementation, without losing any accuracy. The results show that the proposed method can effectively reduce branch divergence in the original 3D-DDA ray traversal algorithm and improve the performance of the CCCS program running on GPU. Considering the wide utilization of the 3D-DDA algorithm, various applications can benefit from this implementation method.
Jamming protection of spread spectrum RFID system
NASA Astrophysics Data System (ADS)
Mazurek, Gustaw
2006-10-01
This paper presents a new transform-domain processing algorithm for rejection of narrowband interferences in RFID/DS-CDMA systems. The performance of the proposed algorithm has been verified via computer simulations. Implementation issues have been discussed. The algorithm can be implemented in the FPGA or DSP technology.
Kawakami, Tomoya; Fujita, Naotaka; Yoshihisa, Tomoki; Tsukamoto, Masahiko
2014-01-01
In recent years, sensors become popular and Home Energy Management System (HEMS) takes an important role in saving energy without decrease in QoL (Quality of Life). Currently, many rule-based HEMSs have been proposed and almost all of them assume "IF-THEN" rules. The Rete algorithm is a typical pattern matching algorithm for IF-THEN rules. Currently, we have proposed a rule-based Home Energy Management System (HEMS) using the Rete algorithm. In the proposed system, rules for managing energy are processed by smart taps in network, and the loads for processing rules and collecting data are distributed to smart taps. In addition, the number of processes and collecting data are reduced by processing rules based on the Rete algorithm. In this paper, we evaluated the proposed system by simulation. In the simulation environment, rules are processed by a smart tap that relates to the action part of each rule. In addition, we implemented the proposed system as HEMS using smart taps.
Implementation of a parallel protein structure alignment service on cloud.
Hung, Che-Lun; Lin, Yaw-Ling
2013-01-01
Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform.
Implementation of a Parallel Protein Structure Alignment Service on Cloud
Hung, Che-Lun; Lin, Yaw-Ling
2013-01-01
Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform. PMID:23671842
Cache-Oblivious parallel SIMD Viterbi decoding for sequence search in HMMER.
Ferreira, Miguel; Roma, Nuno; Russo, Luis M S
2014-05-30
HMMER is a commonly used bioinformatics tool based on Hidden Markov Models (HMMs) to analyze and process biological sequences. One of its main homology engines is based on the Viterbi decoding algorithm, which was already highly parallelized and optimized using Farrar's striped processing pattern with Intel SSE2 instruction set extension. A new SIMD vectorization of the Viterbi decoding algorithm is proposed, based on an SSE2 inter-task parallelization approach similar to the DNA alignment algorithm proposed by Rognes. Besides this alternative vectorization scheme, the proposed implementation also introduces a new partitioning of the Markov model that allows a significantly more efficient exploitation of the cache locality. Such optimization, together with an improved loading of the emission scores, allows the achievement of a constant processing throughput, regardless of the innermost-cache size and of the dimension of the considered model. The proposed optimized vectorization of the Viterbi decoding algorithm was extensively evaluated and compared with the HMMER3 decoder to process DNA and protein datasets, proving to be a rather competitive alternative implementation. Being always faster than the already highly optimized ViterbiFilter implementation of HMMER3, the proposed Cache-Oblivious Parallel SIMD Viterbi (COPS) implementation provides a constant throughput and offers a processing speedup as high as two times faster, depending on the model's size.
NASA Astrophysics Data System (ADS)
Zhang, Zhenhai; Li, Kejie; Wu, Xiaobing; Zhang, Shujiang
2008-03-01
The unwrapped and correcting algorithm based on Coordinate Rotation Digital Computer (CORDIC) and bilinear interpolation algorithm was presented in this paper, with the purpose of processing dynamic panoramic annular image. An original annular panoramic image captured by panoramic annular lens (PAL) can be unwrapped and corrected to conventional rectangular image without distortion, which is much more coincident with people's vision. The algorithm for panoramic image processing is modeled by VHDL and implemented in FPGA. The experimental results show that the proposed panoramic image algorithm for unwrapped and distortion correction has the lower computation complexity and the architecture for dynamic panoramic image processing has lower hardware cost and power consumption. And the proposed algorithm is valid.
VLSI implementation of RSA encryption system using ancient Indian Vedic mathematics
NASA Astrophysics Data System (ADS)
Thapliyal, Himanshu; Srinivas, M. B.
2005-06-01
This paper proposes the hardware implementation of RSA encryption/decryption algorithm using the algorithms of Ancient Indian Vedic Mathematics that have been modified to improve performance. The recently proposed hierarchical overlay multiplier architecture is used in the RSA circuitry for multiplication operation. The most significant aspect of the paper is the development of a division architecture based on Straight Division algorithm of Ancient Indian Vedic Mathematics and embedding it in RSA encryption/decryption circuitry for improved efficiency. The coding is done in Verilog HDL and the FPGA synthesis is done using Xilinx Spartan library. The results show that RSA circuitry implemented using Vedic division and multiplication is efficient in terms of area/speed compared to its implementation using conventional multiplication and division architectures.
Ozmutlu, H. Cenk
2014-01-01
We developed mixed integer programming (MIP) models and hybrid genetic-local search algorithms for the scheduling problem of unrelated parallel machines with job sequence and machine-dependent setup times and with job splitting property. The first contribution of this paper is to introduce novel algorithms which make splitting and scheduling simultaneously with variable number of subjobs. We proposed simple chromosome structure which is constituted by random key numbers in hybrid genetic-local search algorithm (GAspLA). Random key numbers are used frequently in genetic algorithms, but it creates additional difficulty when hybrid factors in local search are implemented. We developed algorithms that satisfy the adaptation of results of local search into the genetic algorithms with minimum relocation operation of genes' random key numbers. This is the second contribution of the paper. The third contribution of this paper is three developed new MIP models which are making splitting and scheduling simultaneously. The fourth contribution of this paper is implementation of the GAspLAMIP. This implementation let us verify the optimality of GAspLA for the studied combinations. The proposed methods are tested on a set of problems taken from the literature and the results validate the effectiveness of the proposed algorithms. PMID:24977204
Multiple Lookup Table-Based AES Encryption Algorithm Implementation
NASA Astrophysics Data System (ADS)
Gong, Jin; Liu, Wenyi; Zhang, Huixin
Anew AES (Advanced Encryption Standard) encryption algorithm implementation was proposed in this paper. It is based on five lookup tables, which are generated from S-box(the substitution table in AES). The obvious advantages are reducing the code-size, improving the implementation efficiency, and helping new learners to understand the AES encryption algorithm and GF(28) multiplication which are necessary to correctly implement AES[1]. This method can be applied on processors with word length 32 or above, FPGA and others. And correspondingly we can implement it by VHDL, Verilog, VB and other languages.
Valiant load-balanced robust routing under hose model for WDM mesh networks
NASA Astrophysics Data System (ADS)
Zhang, Xiaoning; Li, Lemin; Wang, Sheng
2006-09-01
In this paper, we propose Valiant Load-Balanced robust routing scheme for WDM mesh networks under the model of polyhedral uncertainty (i.e., hose model), and the proposed routing scheme is implemented with traffic grooming approach. Our Objective is to maximize the hose model throughput. A mathematic formulation of Valiant Load-Balanced robust routing is presented and three fast heuristic algorithms are also proposed. When implementing Valiant Load-Balanced robust routing scheme to WDM mesh networks, a novel traffic-grooming algorithm called MHF (minimizing hop first) is proposed. We compare the three heuristic algorithms with the VPN tree under the hose model. Finally we demonstrate in the simulation results that MHF with Valiant Load-Balanced robust routing scheme outperforms the traditional traffic-grooming algorithm in terms of the throughput for the uniform/non-uniform traffic matrix under the hose model.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Qin, SB; Cady, ST; Dominguez-Garcia, AD
This paper presents the theory and implementation of a distributed algorithm for controlling differential power processing converters in photovoltaic (PV) applications. This distributed algorithm achieves true maximum power point tracking of series-connected PV submodules by relying only on local voltage measurements and neighbor-to-neighbor communication between the differential power converters. Compared to previous solutions, the proposed algorithm achieves reduced number of perturbations at each step and potentially faster tracking without adding extra hardware; all these features make this algorithm well-suited for long submodule strings. The formulation of the algorithm, discussion of its properties, as well as three case studies are presented.more » The performance of the distributed tracking algorithm has been verified via experiments, which yielded quantifiable improvements over other techniques that have been implemented in practice. Both simulations and hardware experiments have confirmed the effectiveness of the proposed distributed algorithm.« less
Ultrafast adiabatic quantum algorithm for the NP-complete exact cover problem
Wang, Hefeng; Wu, Lian-Ao
2016-01-01
An adiabatic quantum algorithm may lose quantumness such as quantum coherence entirely in its long runtime, and consequently the expected quantum speedup of the algorithm does not show up. Here we present a general ultrafast adiabatic quantum algorithm. We show that by applying a sequence of fast random or regular signals during evolution, the runtime can be reduced substantially, whereas advantages of the adiabatic algorithm remain intact. We also propose a randomized Trotter formula and show that the driving Hamiltonian and the proposed sequence of fast signals can be implemented simultaneously. We illustrate the algorithm by solving the NP-complete 3-bit exact cover problem (EC3), where NP stands for nondeterministic polynomial time, and put forward an approach to implementing the problem with trapped ions. PMID:26923834
Cache-Oblivious parallel SIMD Viterbi decoding for sequence search in HMMER
2014-01-01
Background HMMER is a commonly used bioinformatics tool based on Hidden Markov Models (HMMs) to analyze and process biological sequences. One of its main homology engines is based on the Viterbi decoding algorithm, which was already highly parallelized and optimized using Farrar’s striped processing pattern with Intel SSE2 instruction set extension. Results A new SIMD vectorization of the Viterbi decoding algorithm is proposed, based on an SSE2 inter-task parallelization approach similar to the DNA alignment algorithm proposed by Rognes. Besides this alternative vectorization scheme, the proposed implementation also introduces a new partitioning of the Markov model that allows a significantly more efficient exploitation of the cache locality. Such optimization, together with an improved loading of the emission scores, allows the achievement of a constant processing throughput, regardless of the innermost-cache size and of the dimension of the considered model. Conclusions The proposed optimized vectorization of the Viterbi decoding algorithm was extensively evaluated and compared with the HMMER3 decoder to process DNA and protein datasets, proving to be a rather competitive alternative implementation. Being always faster than the already highly optimized ViterbiFilter implementation of HMMER3, the proposed Cache-Oblivious Parallel SIMD Viterbi (COPS) implementation provides a constant throughput and offers a processing speedup as high as two times faster, depending on the model’s size. PMID:24884826
NASA Technical Reports Server (NTRS)
Liu, Kuojuey Ray
1990-01-01
Least-squares (LS) estimations and spectral decomposition algorithms constitute the heart of modern signal processing and communication problems. Implementations of recursive LS and spectral decomposition algorithms onto parallel processing architectures such as systolic arrays with efficient fault-tolerant schemes are the major concerns of this dissertation. There are four major results in this dissertation. First, we propose the systolic block Householder transformation with application to the recursive least-squares minimization. It is successfully implemented on a systolic array with a two-level pipelined implementation at the vector level as well as at the word level. Second, a real-time algorithm-based concurrent error detection scheme based on the residual method is proposed for the QRD RLS systolic array. The fault diagnosis, order degraded reconfiguration, and performance analysis are also considered. Third, the dynamic range, stability, error detection capability under finite-precision implementation, order degraded performance, and residual estimation under faulty situations for the QRD RLS systolic array are studied in details. Finally, we propose the use of multi-phase systolic algorithms for spectral decomposition based on the QR algorithm. Two systolic architectures, one based on triangular array and another based on rectangular array, are presented for the multiphase operations with fault-tolerant considerations. Eigenvectors and singular vectors can be easily obtained by using the multi-pase operations. Performance issues are also considered.
A novel highly parallel algorithm for linearly unmixing hyperspectral images
NASA Astrophysics Data System (ADS)
Guerra, Raúl; López, Sebastián.; Callico, Gustavo M.; López, Jose F.; Sarmiento, Roberto
2014-10-01
Endmember extraction and abundances calculation represent critical steps within the process of linearly unmixing a given hyperspectral image because of two main reasons. The first one is due to the need of computing a set of accurate endmembers in order to further obtain confident abundance maps. The second one refers to the huge amount of operations involved in these time-consuming processes. This work proposes an algorithm to estimate the endmembers of a hyperspectral image under analysis and its abundances at the same time. The main advantage of this algorithm is its high parallelization degree and the mathematical simplicity of the operations implemented. This algorithm estimates the endmembers as virtual pixels. In particular, the proposed algorithm performs the descent gradient method to iteratively refine the endmembers and the abundances, reducing the mean square error, according with the linear unmixing model. Some mathematical restrictions must be added so the method converges in a unique and realistic solution. According with the algorithm nature, these restrictions can be easily implemented. The results obtained with synthetic images demonstrate the well behavior of the algorithm proposed. Moreover, the results obtained with the well-known Cuprite dataset also corroborate the benefits of our proposal.
Segmentation of financial seals and its implementation on a DSP-based system
NASA Astrophysics Data System (ADS)
He, Jin; Liu, Tiegen; Guo, Jingjing; Zhang, Hao
2009-11-01
Automatic seal imprint identification is an important part of modern financial security. Accurate segmentation is the basis of correct identification. In this paper, a DSP (digital signal processor) based identification system was designed, and an adaptive algorithm was proposed to extract binary seal images from financial instruments. As the kernel of the identification system, a DSP chip of TMS320DM642 was used to implement image processing, controlling and coordinating works of each system module. The proposed algorithm consisted of three stages, including extraction of grayscale seal image, denoising and binarization. A grayscale seal image was extracted by color transform from a financial instrument image. Adaptive morphological operations were used to highlight details of the extracted grayscale seal image and smooth the background. After median filter for noise elimination, the filtered seal image was binarized by Otsu's method. The algorithm was developed based on the DSP development environment CCS and real-time operation system DSP/BIOS. To simplify the implementation of the proposed algorithm, the calibration of white balance and the coarse positioning of the seal imprint were implemented by TMS320DM642 controlling image acquisition. IMGLIB of TMS320DM642 was used for the efficiency improvement. The experiment result showed that financial seal imprints, even with intricate and dense strokes can be correctly segmented by the proposed algorithm. Adhesion and incompleteness distortions in the segmentation results were reduced, even when the original seal imprint had a poor quality.
NASA Astrophysics Data System (ADS)
Alpatov, Boris; Babayan, Pavel; Ershov, Maksim; Strotov, Valery
2016-10-01
This paper describes the implementation of the orientation estimation algorithm in FPGA-based vision system. An approach to estimate an orientation of objects lacking axial symmetry is proposed. Suggested algorithm is intended to estimate orientation of a specific known 3D object based on object 3D model. The proposed orientation estimation algorithm consists of two stages: learning and estimation. Learning stage is devoted to the exploring of studied object. Using 3D model we can gather set of training images by capturing 3D model from viewpoints evenly distributed on a sphere. Sphere points distribution is made by the geosphere principle. Gathered training image set is used for calculating descriptors, which will be used in the estimation stage of the algorithm. The estimation stage is focusing on matching process between an observed image descriptor and the training image descriptors. The experimental research was performed using a set of images of Airbus A380. The proposed orientation estimation algorithm showed good accuracy in all case studies. The real-time performance of the algorithm in FPGA-based vision system was demonstrated.
A sample implementation for parallelizing Divide-and-Conquer algorithms on the GPU.
Mei, Gang; Zhang, Jiayin; Xu, Nengxiong; Zhao, Kunyang
2018-01-01
The strategy of Divide-and-Conquer (D&C) is one of the frequently used programming patterns to design efficient algorithms in computer science, which has been parallelized on shared memory systems and distributed memory systems. Tzeng and Owens specifically developed a generic paradigm for parallelizing D&C algorithms on modern Graphics Processing Units (GPUs). In this paper, by following the generic paradigm proposed by Tzeng and Owens, we provide a new and publicly available GPU implementation of the famous D&C algorithm, QuickHull, to give a sample and guide for parallelizing D&C algorithms on the GPU. The experimental results demonstrate the practicality of our sample GPU implementation. Our research objective in this paper is to present a sample GPU implementation of a classical D&C algorithm to help interested readers to develop their own efficient GPU implementations with fewer efforts.
An Efficient Randomized Algorithm for Real-Time Process Scheduling in PicOS Operating System
NASA Astrophysics Data System (ADS)
Helmy*, Tarek; Fatai, Anifowose; Sallam, El-Sayed
PicOS is an event-driven operating environment designed for use with embedded networked sensors. More specifically, it is designed to support the concurrency in intensive operations required by networked sensors with minimal hardware requirements. Existing process scheduling algorithms of PicOS; a commercial tiny, low-footprint, real-time operating system; have their associated drawbacks. An efficient, alternative algorithm, based on a randomized selection policy, has been proposed, demonstrated, confirmed for efficiency and fairness, on the average, and has been recommended for implementation in PicOS. Simulations were carried out and performance measures such as Average Waiting Time (AWT) and Average Turn-around Time (ATT) were used to assess the efficiency of the proposed randomized version over the existing ones. The results prove that Randomized algorithm is the best and most attractive for implementation in PicOS, since it is most fair and has the least AWT and ATT on average over the other non-preemptive scheduling algorithms implemented in this paper.
Algorithm architecture co-design for ultra low-power image sensor
NASA Astrophysics Data System (ADS)
Laforest, T.; Dupret, A.; Verdant, A.; Lattard, D.; Villard, P.
2012-03-01
In a context of embedded video surveillance, stand alone leftbehind image sensors are used to detect events with high level of confidence, but also with a very low power consumption. Using a steady camera, motion detection algorithms based on background estimation to find regions in movement are simple to implement and computationally efficient. To reduce power consumption, the background is estimated using a down sampled image formed of macropixels. In order to extend the class of moving objects to be detected, we propose an original mixed mode architecture developed thanks to an algorithm architecture co-design methodology. This programmable architecture is composed of a vector of SIMD processors. A basic RISC architecture was optimized in order to implement motion detection algorithms with a dedicated set of 42 instructions. Definition of delta modulation as a calculation primitive has allowed to implement algorithms in a very compact way. Thereby, a 1920x1080@25fps CMOS image sensor performing integrated motion detection is proposed with a power estimation of 1.8 mW.
An improved non-uniformity correction algorithm and its GPU parallel implementation
NASA Astrophysics Data System (ADS)
Cheng, Kuanhong; Zhou, Huixin; Qin, Hanlin; Zhao, Dong; Qian, Kun; Rong, Shenghui
2018-05-01
The performance of SLP-THP based non-uniformity correction algorithm is seriously affected by the result of SLP filter, which always leads to image blurring and ghosting artifacts. To address this problem, an improved SLP-THP based non-uniformity correction method with curvature constraint was proposed. Here we put forward a new way to estimate spatial low frequency component. First, the details and contours of input image were obtained respectively by minimizing local Gaussian curvature and mean curvature of image surface. Then, the guided filter was utilized to combine these two parts together to get the estimate of spatial low frequency component. Finally, we brought this SLP component into SLP-THP method to achieve non-uniformity correction. The performance of proposed algorithm was verified by several real and simulated infrared image sequences. The experimental results indicated that the proposed algorithm can reduce the non-uniformity without detail losing. After that, a GPU based parallel implementation that runs 150 times faster than CPU was presented, which showed the proposed algorithm has great potential for real time application.
A fuzzy clustering algorithm to detect planar and quadric shapes
NASA Technical Reports Server (NTRS)
Krishnapuram, Raghu; Frigui, Hichem; Nasraoui, Olfa
1992-01-01
In this paper, we introduce a new fuzzy clustering algorithm to detect an unknown number of planar and quadric shapes in noisy data. The proposed algorithm is computationally and implementationally simple, and it overcomes many of the drawbacks of the existing algorithms that have been proposed for similar tasks. Since the clustering is performed in the original image space, and since no features need to be computed, this approach is particularly suited for sparse data. The algorithm may also be used in pattern recognition applications.
One cutting plane algorithm using auxiliary functions
NASA Astrophysics Data System (ADS)
Zabotin, I. Ya; Kazaeva, K. E.
2016-11-01
We propose an algorithm for solving a convex programming problem from the class of cutting methods. The algorithm is characterized by the construction of approximations using some auxiliary functions, instead of the objective function. Each auxiliary function bases on the exterior penalty function. In proposed algorithm the admissible set and the epigraph of each auxiliary function are embedded into polyhedral sets. In connection with the above, the iteration points are found by solving linear programming problems. We discuss the implementation of the algorithm and prove its convergence.
Modified artificial bee colony algorithm for reactive power optimization
NASA Astrophysics Data System (ADS)
Sulaiman, Noorazliza; Mohamad-Saleh, Junita; Abro, Abdul Ghani
2015-05-01
Bio-inspired algorithms (BIAs) implemented to solve various optimization problems have shown promising results which are very important in this severely complex real-world. Artificial Bee Colony (ABC) algorithm, a kind of BIAs has demonstrated tremendous results as compared to other optimization algorithms. This paper presents a new modified ABC algorithm referred to as JA-ABC3 with the aim to enhance convergence speed and avoid premature convergence. The proposed algorithm has been simulated on ten commonly used benchmarks functions. Its performance has also been compared with other existing ABC variants. To justify its robust applicability, the proposed algorithm has been tested to solve Reactive Power Optimization problem. The results have shown that the proposed algorithm has superior performance to other existing ABC variants e.g. GABC, BABC1, BABC2, BsfABC dan IABC in terms of convergence speed. Furthermore, the proposed algorithm has also demonstrated excellence performance in solving Reactive Power Optimization problem.
Iterative Importance Sampling Algorithms for Parameter Estimation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grout, Ray W; Morzfeld, Matthias; Day, Marcus S.
In parameter estimation problems one computes a posterior distribution over uncertain parameters defined jointly by a prior distribution, a model, and noisy data. Markov chain Monte Carlo (MCMC) is often used for the numerical solution of such problems. An alternative to MCMC is importance sampling, which can exhibit near perfect scaling with the number of cores on high performance computing systems because samples are drawn independently. However, finding a suitable proposal distribution is a challenging task. Several sampling algorithms have been proposed over the past years that take an iterative approach to constructing a proposal distribution. We investigate the applicabilitymore » of such algorithms by applying them to two realistic and challenging test problems, one in subsurface flow, and one in combustion modeling. More specifically, we implement importance sampling algorithms that iterate over the mean and covariance matrix of Gaussian or multivariate t-proposal distributions. Our implementation leverages massively parallel computers, and we present strategies to initialize the iterations using 'coarse' MCMC runs or Gaussian mixture models.« less
NASA Astrophysics Data System (ADS)
Azimi, Ehsan; Behrad, Alireza; Ghaznavi-Ghoushchi, Mohammad Bagher; Shanbehzadeh, Jamshid
2016-11-01
The projective model is an important mapping function for the calculation of global transformation between two images. However, its hardware implementation is challenging because of a large number of coefficients with different required precisions for fixed point representation. A VLSI hardware architecture is proposed for the calculation of a global projective model between input and reference images and refining false matches using random sample consensus (RANSAC) algorithm. To make the hardware implementation feasible, it is proved that the calculation of the projective model can be divided into four submodels comprising two translations, an affine model and a simpler projective mapping. This approach makes the hardware implementation feasible and considerably reduces the required number of bits for fixed point representation of model coefficients and intermediate variables. The proposed hardware architecture for the calculation of a global projective model using the RANSAC algorithm was implemented using Verilog hardware description language and the functionality of the design was validated through several experiments. The proposed architecture was synthesized by using an application-specific integrated circuit digital design flow utilizing 180-nm CMOS technology as well as a Virtex-6 field programmable gate array. Experimental results confirm the efficiency of the proposed hardware architecture in comparison with software implementation.
A Computationally Efficient Visual Saliency Algorithm Suitable for an Analog CMOS Implementation.
D'Angelo, Robert; Wood, Richard; Lowry, Nathan; Freifeld, Geremy; Huang, Haiyao; Salthouse, Christopher D; Hollosi, Brent; Muresan, Matthew; Uy, Wes; Tran, Nhut; Chery, Armand; Poppe, Dorothy C; Sonkusale, Sameer
2018-06-27
Computer vision algorithms are often limited in their application by the large amount of data that must be processed. Mammalian vision systems mitigate this high bandwidth requirement by prioritizing certain regions of the visual field with neural circuits that select the most salient regions. This work introduces a novel and computationally efficient visual saliency algorithm for performing this neuromorphic attention-based data reduction. The proposed algorithm has the added advantage that it is compatible with an analog CMOS design while still achieving comparable performance to existing state-of-the-art saliency algorithms. This compatibility allows for direct integration with the analog-to-digital conversion circuitry present in CMOS image sensors. This integration leads to power savings in the converter by quantizing only the salient pixels. Further system-level power savings are gained by reducing the amount of data that must be transmitted and processed in the digital domain. The analog CMOS compatible formulation relies on a pulse width (i.e., time mode) encoding of the pixel data that is compatible with pulse-mode imagers and slope based converters often used in imager designs. This letter begins by discussing this time-mode encoding for implementing neuromorphic architectures. Next, the proposed algorithm is derived. Hardware-oriented optimizations and modifications to this algorithm are proposed and discussed. Next, a metric for quantifying saliency accuracy is proposed, and simulation results of this metric are presented. Finally, an analog synthesis approach for a time-mode architecture is outlined, and postsynthesis transistor-level simulations that demonstrate functionality of an implementation in a modern CMOS process are discussed.
NASA Astrophysics Data System (ADS)
Laban, Shaban; El-Desouky, Aly
2013-04-01
The monitoring of real-time systems is a challenging and complicated process. So, there is a continuous need to improve the monitoring process through the use of new intelligent techniques and algorithms for detecting exceptions, anomalous behaviours and generating the necessary alerts during the workflow monitoring of such systems. The interval-based or period-based theorems have been discussed, analysed, and used by many researches in Artificial Intelligence (AI), philosophy, and linguistics. As explained by Allen, there are 13 relations between any two intervals. Also, there have also been many studies of interval-based temporal reasoning and logics over the past decades. Interval-based theorems can be used for monitoring real-time interval-based data processing. However, increasing the number of processed intervals makes the implementation of such theorems a complex and time consuming process as the relationships between such intervals are increasing exponentially. To overcome the previous problem, this paper presents a Rule-based Interval State Machine Algorithm (RISMA) for processing, monitoring, and analysing the behaviour of interval-based data, received from real-time sensors. The proposed intelligent algorithm uses the Interval State Machine (ISM) approach to model any number of interval-based data into well-defined states as well as inferring them. An interval-based state transition model and methodology are presented to identify the relationships between the different states of the proposed algorithm. By using such model, the unlimited number of relationships between similar large numbers of intervals can be reduced to only 18 direct relationships using the proposed well-defined states. For testing the proposed algorithm, necessary inference rules and code have been designed and applied to the continuous data received in near real-time from the stations of International Monitoring System (IMS) by the International Data Centre (IDC) of the Preparatory Commission for the Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO). The CLIPS expert system shell has been used as the main rule engine for implementing the algorithm rules. Python programming language and the module "PyCLIPS" are used for building the necessary code for algorithm implementation. More than 1.7 million intervals constitute the Concise List of Frames (CLF) from 20 different seismic stations have been used for evaluating the proposed algorithm and evaluating stations behaviour and performance. The initial results showed that proposed algorithm can help in better understanding of the operation and performance of those stations. Different important information, such as alerts and some station performance parameters, can be derived from the proposed algorithm. For IMS interval-based data and at any period of time it is possible to analyze station behavior, determine the missing data, generate necessary alerts, and to measure some of station performance attributes. The details of the proposed algorithm, methodology, implementation, experimental results, advantages, and limitations of this research are presented. Finally, future directions and recommendations are discussed.
NASA Astrophysics Data System (ADS)
Nguyen, An Hung; Guillemette, Thomas; Lambert, Andrew J.; Pickering, Mark R.; Garratt, Matthew A.
2017-09-01
Image registration is a fundamental image processing technique. It is used to spatially align two or more images that have been captured at different times, from different sensors, or from different viewpoints. There have been many algorithms proposed for this task. The most common of these being the well-known Lucas-Kanade (LK) and Horn-Schunck approaches. However, the main limitation of these approaches is the computational complexity required to implement the large number of iterations necessary for successful alignment of the images. Previously, a multi-pass image interpolation algorithm (MP-I2A) was developed to considerably reduce the number of iterations required for successful registration compared with the LK algorithm. This paper develops a kernel-warping algorithm (KWA), a modified version of the MP-I2A, which requires fewer iterations to successfully register two images and less memory space for the field-programmable gate array (FPGA) implementation than the MP-I2A. These reductions increase feasibility of the implementation of the proposed algorithm on FPGAs with very limited memory space and other hardware resources. A two-FPGA system rather than single FPGA system is successfully developed to implement the KWA in order to compensate insufficiency of hardware resources supported by one FPGA, and increase parallel processing ability and scalability of the system.
NASA Astrophysics Data System (ADS)
Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.
2017-12-01
This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
Efficient implementation of parallel three-dimensional FFT on clusters of PCs
NASA Astrophysics Data System (ADS)
Takahashi, Daisuke
2003-05-01
In this paper, we propose a high-performance parallel three-dimensional fast Fourier transform (FFT) algorithm on clusters of PCs. The three-dimensional FFT algorithm can be altered into a block three-dimensional FFT algorithm to reduce the number of cache misses. We show that the block three-dimensional FFT algorithm improves performance by utilizing the cache memory effectively. We use the block three-dimensional FFT algorithm to implement the parallel three-dimensional FFT algorithm. We succeeded in obtaining performance of over 1.3 GFLOPS on an 8-node dual Pentium III 1 GHz PC SMP cluster.
NASA Astrophysics Data System (ADS)
Zhou, Yali; Zhang, Qizhi; Yin, Yixin
2015-05-01
In this paper, active control of impulsive noise with symmetric α-stable (SαS) distribution is studied. A general step-size normalized filtered-x Least Mean Square (FxLMS) algorithm is developed based on the analysis of existing algorithms, and the Gaussian distribution function is used to normalize the step size. Compared with existing algorithms, the proposed algorithm needs neither the parameter selection and thresholds estimation nor the process of cost function selection and complex gradient computation. Computer simulations have been carried out to suggest that the proposed algorithm is effective for attenuating SαS impulsive noise, and then the proposed algorithm has been implemented in an experimental ANC system. Experimental results show that the proposed scheme has good performance for SαS impulsive noise attenuation.
NASA Astrophysics Data System (ADS)
Ham, Woonchul; Song, Chulgyu; Lee, Kangsan; Roh, Seungkuk
2016-05-01
In this paper, we propose a new image reconstruction algorithm considering the geometric information of acoustic sources and senor detector and review the two-step reconstruction algorithm which was previously proposed based on the geometrical information of ROI(region of interest) considering the finite size of acoustic sensor element. In a new image reconstruction algorithm, not only mathematical analysis is very simple but also its software implementation is very easy because we don't need to use the FFT. We verify the effectiveness of the proposed reconstruction algorithm by showing the simulation results by using Matlab k-wave toolkit.
NASA Astrophysics Data System (ADS)
Jin, Minglei; Jin, Weiqi; Li, Yiyang; Li, Shuo
2015-08-01
In this paper, we propose a novel scene-based non-uniformity correction algorithm for infrared image processing-temporal high-pass non-uniformity correction algorithm based on grayscale mapping (THP and GM). The main sources of non-uniformity are: (1) detector fabrication inaccuracies; (2) non-linearity and variations in the read-out electronics and (3) optical path effects. The non-uniformity will be reduced by non-uniformity correction (NUC) algorithms. The NUC algorithms are often divided into calibration-based non-uniformity correction (CBNUC) algorithms and scene-based non-uniformity correction (SBNUC) algorithms. As non-uniformity drifts temporally, CBNUC algorithms must be repeated by inserting a uniform radiation source which SBNUC algorithms do not need into the view, so the SBNUC algorithm becomes an essential part of infrared imaging system. The SBNUC algorithms' poor robustness often leads two defects: artifacts and over-correction, meanwhile due to complicated calculation process and large storage consumption, hardware implementation of the SBNUC algorithms is difficult, especially in Field Programmable Gate Array (FPGA) platform. The THP and GM algorithm proposed in this paper can eliminate the non-uniformity without causing defects. The hardware implementation of the algorithm only based on FPGA has two advantages: (1) low resources consumption, and (2) small hardware delay: less than 20 lines, it can be transplanted to a variety of infrared detectors equipped with FPGA image processing module, it can reduce the stripe non-uniformity and the ripple non-uniformity.
NASA Astrophysics Data System (ADS)
Gimazov, R.; Shidlovskiy, S.
2018-05-01
In this paper, we consider the architecture of the algorithm for extreme regulation in the photovoltaic system. An algorithm based on an adaptive neural network with fuzzy inference is proposed. The implementation of such an algorithm not only allows solving a number of problems in existing algorithms for extreme power regulation of photovoltaic systems, but also creates a reserve for the creation of a universal control system for a photovoltaic system.
A 0.13-µm implementation of 5 Gb/s and 3-mW folded parallel architecture for AES algorithm
NASA Astrophysics Data System (ADS)
Rahimunnisa, K.; Karthigaikumar, P.; Kirubavathy, J.; Jayakumar, J.; Kumar, S. Suresh
2014-02-01
A new architecture for encrypting and decrypting the confidential data using Advanced Encryption Standard algorithm is presented in this article. This structure combines the folded structure with parallel architecture to increase the throughput. The whole architecture achieved high throughput with less power. The proposed architecture is implemented in 0.13-µm Complementary metal-oxide-semiconductor (CMOS) technology. The proposed structure is compared with different existing structures, and from the result it is proved that the proposed structure gives higher throughput and less power compared to existing works.
Public-key encryption with chaos.
Kocarev, Ljupco; Sterjev, Marjan; Fekete, Attila; Vattay, Gabor
2004-12-01
We propose public-key encryption algorithms based on chaotic maps, which are generalization of well-known and commercially used algorithms: Rivest-Shamir-Adleman (RSA), ElGamal, and Rabin. For the case of generalized RSA algorithm we discuss in detail its software implementation and properties. We show that our algorithm is as secure as RSA algorithm.
Public-key encryption with chaos
NASA Astrophysics Data System (ADS)
Kocarev, Ljupco; Sterjev, Marjan; Fekete, Attila; Vattay, Gabor
2004-12-01
We propose public-key encryption algorithms based on chaotic maps, which are generalization of well-known and commercially used algorithms: Rivest-Shamir-Adleman (RSA), ElGamal, and Rabin. For the case of generalized RSA algorithm we discuss in detail its software implementation and properties. We show that our algorithm is as secure as RSA algorithm.
Parallel heterogeneous architectures for efficient OMP compressive sensing reconstruction
NASA Astrophysics Data System (ADS)
Kulkarni, Amey; Stanislaus, Jerome L.; Mohsenin, Tinoosh
2014-05-01
Compressive Sensing (CS) is a novel scheme, in which a signal that is sparse in a known transform domain can be reconstructed using fewer samples. The signal reconstruction techniques are computationally intensive and have sluggish performance, which make them impractical for real-time processing applications . The paper presents novel architectures for Orthogonal Matching Pursuit algorithm, one of the popular CS reconstruction algorithms. We show the implementation results of proposed architectures on FPGA, ASIC and on a custom many-core platform. For FPGA and ASIC implementation, a novel thresholding method is used to reduce the processing time for the optimization problem by at least 25%. Whereas, for the custom many-core platform, efficient parallelization techniques are applied, to reconstruct signals with variant signal lengths of N and sparsity of m. The algorithm is divided into three kernels. Each kernel is parallelized to reduce execution time, whereas efficient reuse of the matrix operators allows us to reduce area. Matrix operations are efficiently paralellized by taking advantage of blocked algorithms. For demonstration purpose, all architectures reconstruct a 256-length signal with maximum sparsity of 8 using 64 measurements. Implementation on Xilinx Virtex-5 FPGA, requires 27.14 μs to reconstruct the signal using basic OMP. Whereas, with thresholding method it requires 18 μs. ASIC implementation reconstructs the signal in 13 μs. However, our custom many-core, operating at 1.18 GHz, takes 18.28 μs to complete. Our results show that compared to the previous published work of the same algorithm and matrix size, proposed architectures for FPGA and ASIC implementations perform 1.3x and 1.8x respectively faster. Also, the proposed many-core implementation performs 3000x faster than the CPU and 2000x faster than the GPU.
What does fault tolerant Deep Learning need from MPI?
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amatya, Vinay C.; Vishnu, Abhinav; Siegel, Charles M.
Deep Learning (DL) algorithms have become the {\\em de facto} Machine Learning (ML) algorithm for large scale data analysis. DL algorithms are computationally expensive -- even distributed DL implementations which use MPI require days of training (model learning) time on commonly studied datasets. Long running DL applications become susceptible to faults -- requiring development of a fault tolerant system infrastructure, in addition to fault tolerant DL algorithms. This raises an important question: {\\em What is needed from MPI for designing fault tolerant DL implementations?} In this paper, we address this problem for permanent faults. We motivate the need for amore » fault tolerant MPI specification by an in-depth consideration of recent innovations in DL algorithms and their properties, which drive the need for specific fault tolerance features. We present an in-depth discussion on the suitability of different parallelism types (model, data and hybrid); a need (or lack thereof) for check-pointing of any critical data structures; and most importantly, consideration for several fault tolerance proposals (user-level fault mitigation (ULFM), Reinit) in MPI and their applicability to fault tolerant DL implementations. We leverage a distributed memory implementation of Caffe, currently available under the Machine Learning Toolkit for Extreme Scale (MaTEx). We implement our approaches by extending MaTEx-Caffe for using ULFM-based implementation. Our evaluation using the ImageNet dataset and AlexNet neural network topology demonstrates the effectiveness of the proposed fault tolerant DL implementation using OpenMPI based ULFM.« less
An implementation of the look-ahead Lanczos algorithm for non-Hermitian matrices, part 1
NASA Technical Reports Server (NTRS)
Freund, Roland W.; Gutknecht, Martin H.; Nachtigal, Noel M.
1990-01-01
The nonsymmetric Lanczos method can be used to compute eigenvalues of large sparse non-Hermitian matrices or to solve large sparse non-Hermitian linear systems. However, the original Lanczos algorithm is susceptible to possible breakdowns and potential instabilities. We present an implementation of a look-ahead version of the Lanczos algorithm which overcomes these problems by skipping over those steps in which a breakdown or near-breakdown would occur in the standard process. The proposed algorithm can handle look-ahead steps of any length and is not restricted to steps of length 2, as earlier implementations are. Also, our implementation has the feature that it requires roughly the same number of inner products as the standard Lanczos process without look-ahead.
Specialized Computer Systems for Environment Visualization
NASA Astrophysics Data System (ADS)
Al-Oraiqat, Anas M.; Bashkov, Evgeniy A.; Zori, Sergii A.
2018-06-01
The need for real time image generation of landscapes arises in various fields as part of tasks solved by virtual and augmented reality systems, as well as geographic information systems. Such systems provide opportunities for collecting, storing, analyzing and graphically visualizing geographic data. Algorithmic and hardware software tools for increasing the realism and efficiency of the environment visualization in 3D visualization systems are proposed. This paper discusses a modified path tracing algorithm with a two-level hierarchy of bounding volumes and finding intersections with Axis-Aligned Bounding Box. The proposed algorithm eliminates the branching and hence makes the algorithm more suitable to be implemented on the multi-threaded CPU and GPU. A modified ROAM algorithm is used to solve the qualitative visualization of reliefs' problems and landscapes. The algorithm is implemented on parallel systems—cluster and Compute Unified Device Architecture-networks. Results show that the implementation on MPI clusters is more efficient than Graphics Processing Unit/Graphics Processing Clusters and allows real-time synthesis. The organization and algorithms of the parallel GPU system for the 3D pseudo stereo image/video synthesis are proposed. With realizing possibility analysis on a parallel GPU-architecture of each stage, 3D pseudo stereo synthesis is performed. An experimental prototype of a specialized hardware-software system 3D pseudo stereo imaging and video was developed on the CPU/GPU. The experimental results show that the proposed adaptation of 3D pseudo stereo imaging to the architecture of GPU-systems is efficient. Also it accelerates the computational procedures of 3D pseudo-stereo synthesis for the anaglyph and anamorphic formats of the 3D stereo frame without performing optimization procedures. The acceleration is on average 11 and 54 times for test GPUs.
Community detection in complex networks by using membrane algorithm
NASA Astrophysics Data System (ADS)
Liu, Chuang; Fan, Linan; Liu, Zhou; Dai, Xiang; Xu, Jiamei; Chang, Baoren
Community detection in complex networks is a key problem of network analysis. In this paper, a new membrane algorithm is proposed to solve the community detection in complex networks. The proposed algorithm is based on membrane systems, which consists of objects, reaction rules, and a membrane structure. Each object represents a candidate partition of a complex network, and the quality of objects is evaluated according to network modularity. The reaction rules include evolutionary rules and communication rules. Evolutionary rules are responsible for improving the quality of objects, which employ the differential evolutionary algorithm to evolve objects. Communication rules implement the information exchanged among membranes. Finally, the proposed algorithm is evaluated on synthetic, real-world networks with real partitions known and the large-scaled networks with real partitions unknown. The experimental results indicate the superior performance of the proposed algorithm in comparison with other experimental algorithms.
FPGA-based coprocessor for matrix algorithms implementation
NASA Astrophysics Data System (ADS)
Amira, Abbes; Bensaali, Faycal
2003-03-01
Matrix algorithms are important in many types of applications including image and signal processing. These areas require enormous computing power. A close examination of the algorithms used in these, and related, applications reveals that many of the fundamental actions involve matrix operations such as matrix multiplication which is of O (N3) on a sequential computer and O (N3/p) on a parallel system with p processors complexity. This paper presents an investigation into the design and implementation of different matrix algorithms such as matrix operations, matrix transforms and matrix decompositions using an FPGA based environment. Solutions for the problem of processing large matrices have been proposed. The proposed system architectures are scalable, modular and require less area and time complexity with reduced latency when compared with existing structures.
Multitarget mixture reduction algorithm with incorporated target existence recursions
NASA Astrophysics Data System (ADS)
Ristic, Branko; Arulampalam, Sanjeev
2000-07-01
The paper derives a deferred logic data association algorithm based on the mixture reduction approach originally due to Salmond [SPIE vol.1305, 1990]. The novelty of the proposed algorithm provides the recursive formulae for both data association and target existence (confidence) estimation, thus allowing automatic track initiation and termination. T he track initiation performance of the proposed filter is investigated by computer simulations. It is observed that at moderately high levels of clutter density the proposed filter initiates tracks more reliably than its corresponding PDA filter. An extension of the proposed filter to the multi-target case is also presented. In addition, the paper compares the track maintenance performance of the MR algorithm with an MHT implementation.
NASA Astrophysics Data System (ADS)
Tang, Li; Liu, Jing-Ning; Feng, Dan; Tong, Wei
2008-12-01
Existing security solutions in network storage environment perform poorly because cryptographic operations (encryption and decryption) implemented in software can dramatically reduce system performance. In this paper we propose a cryptographic hardware accelerator on dynamically reconfigurable platform for the security of high performance network storage system. We employ a dynamic reconfigurable platform based on a FPGA to implement a PowerPCbased embedded system, which executes cryptographic algorithms. To reduce the reconfiguration latency, we apply prefetch scheduling. Moreover, the processing elements could be dynamically configured to support different cryptographic algorithms according to the request received by the accelerator. In the experiment, we have implemented AES (Rijndael) and 3DES cryptographic algorithms in the reconfigurable accelerator. Our proposed reconfigurable cryptographic accelerator could dramatically increase the performance comparing with the traditional software-based network storage systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhou, Ning; Huang, Zhenyu; Tuffner, Francis K.
2010-02-28
Small signal stability problems are one of the major threats to grid stability and reliability. Prony analysis has been successfully applied on ringdown data to monitor electromechanical modes of a power system using phasor measurement unit (PMU) data. To facilitate an on-line application of mode estimation, this paper develops a recursive algorithm for implementing Prony analysis and proposed an oscillation detection method to detect ringdown data in real time. By automatically detecting ringdown data, the proposed method helps guarantee that Prony analysis is applied properly and timely on the ringdown data. Thus, the mode estimation results can be performed reliablymore » and timely. The proposed method is tested using Monte Carlo simulations based on a 17-machine model and is shown to be able to properly identify the oscillation data for on-line application of Prony analysis. In addition, the proposed method is applied to field measurement data from WECC to show the performance of the proposed algorithm.« less
Angle Statistics Reconstruction: a robust reconstruction algorithm for Muon Scattering Tomography
NASA Astrophysics Data System (ADS)
Stapleton, M.; Burns, J.; Quillin, S.; Steer, C.
2014-11-01
Muon Scattering Tomography (MST) is a technique for using the scattering of cosmic ray muons to probe the contents of enclosed volumes. As a muon passes through material it undergoes multiple Coulomb scattering, where the amount of scattering is dependent on the density and atomic number of the material as well as the path length. Hence, MST has been proposed as a means of imaging dense materials, for instance to detect special nuclear material in cargo containers. Algorithms are required to generate an accurate reconstruction of the material density inside the volume from the muon scattering information and some have already been proposed, most notably the Point of Closest Approach (PoCA) and Maximum Likelihood/Expectation Maximisation (MLEM) algorithms. However, whilst PoCA-based algorithms are easy to implement, they perform rather poorly in practice. Conversely, MLEM is a complicated algorithm to implement and computationally intensive and there is currently no published, fast and easily-implementable algorithm that performs well in practice. In this paper, we first provide a detailed analysis of the source of inaccuracy in PoCA-based algorithms. We then motivate an alternative method, based on ideas first laid out by Morris et al, presenting and fully specifying an algorithm that performs well against simulations of realistic scenarios. We argue this new algorithm should be adopted by developers of Muon Scattering Tomography as an alternative to PoCA.
Fuzzy logic-based approach to detecting a passive RFID tag in an outpatient clinic.
Min, Daiki; Yih, Yuehwern
2011-06-01
This study is motivated by the observations on the data collected by radio frequency identification (RFID) readers in a pilot study, which was used to investigate the feasibility of implementing an RFID-based monitoring system in an outpatient eye clinic. The raw RFID data collected from RFID readers contain noise and missing reads, which prevent us from determining the tag location. In this paper, fuzzy logic-based algorithms are proposed to interpret the raw RFID data to extract accurate information. The proposed algorithms determine the location of an RFID tag by evaluating its possibility of presence and absence. To evaluate the performance of the proposed algorithms, numerical experiments are conducted using the data observed in the outpatient eye clinic. Experiments results showed that the proposed algorithms outperform existing static smoothing method in terms of minimizing both false positives and false negatives. Furthermore, the proposed algorithms are applied to a set of simulated data to show the robustness of the proposed algorithms at various levels of RFID reader reliability.
On recursive least-squares filtering algorithms and implementations. Ph.D. Thesis
NASA Technical Reports Server (NTRS)
Hsieh, Shih-Fu
1990-01-01
In many real-time signal processing applications, fast and numerically stable algorithms for solving least-squares problems are necessary and important. In particular, under non-stationary conditions, these algorithms must be able to adapt themselves to reflect the changes in the system and take appropriate adjustments to achieve optimum performances. Among existing algorithms, the QR-decomposition (QRD)-based recursive least-squares (RLS) methods have been shown to be useful and effective for adaptive signal processing. In order to increase the speed of processing and achieve high throughput rate, many algorithms are being vectorized and/or pipelined to facilitate high degrees of parallelism. A time-recursive formulation of RLS filtering employing block QRD will be considered first. Several methods, including a new non-continuous windowing scheme based on selectively rejecting contaminated data, were investigated for adaptive processing. Based on systolic triarrays, many other forms of systolic arrays are shown to be capable of implementing different algorithms. Various updating and downdating systolic algorithms and architectures for RLS filtering are examined and compared in details, which include Householder reflector, Gram-Schmidt procedure, and Givens rotation. A unified approach encompassing existing square-root-free algorithms is also proposed. For the sinusoidal spectrum estimation problem, a judicious method of separating the noise from the signal is of great interest. Various truncated QR methods are proposed for this purpose and compared to the truncated SVD method. Computer simulations provided for detailed comparisons show the effectiveness of these methods. This thesis deals with fundamental issues of numerical stability, computational efficiency, adaptivity, and VLSI implementation for the RLS filtering problems. In all, various new and modified algorithms and architectures are proposed and analyzed; the significance of any of the new method depends crucially on specific application.
High-Dimensional Exploratory Item Factor Analysis by a Metropolis-Hastings Robbins-Monro Algorithm
ERIC Educational Resources Information Center
Cai, Li
2010-01-01
A Metropolis-Hastings Robbins-Monro (MH-RM) algorithm for high-dimensional maximum marginal likelihood exploratory item factor analysis is proposed. The sequence of estimates from the MH-RM algorithm converges with probability one to the maximum likelihood solution. Details on the computer implementation of this algorithm are provided. The…
NASA Astrophysics Data System (ADS)
Bouganssa, Issam; Sbihi, Mohamed; Zaim, Mounia
2017-07-01
The 2D Discrete Wavelet Transform (DWT) is a computationally intensive task that is usually implemented on specific architectures in many imaging systems in real time. In this paper, a high throughput edge or contour detection algorithm is proposed based on the discrete wavelet transform. A technique for applying the filters on the three directions (Horizontal, Vertical and Diagonal) of the image is used to present the maximum of the existing contours. The proposed architectures were designed in VHDL and mapped to a Xilinx Sparten6 FPGA. The results of the synthesis show that the proposed architecture has a low area cost and can operate up to 100 MHz, which can perform 2D wavelet analysis for a sequence of images while maintaining the flexibility of the system to support an adaptive algorithm.
Jankovic, Marko; Ogawa, Hidemitsu
2003-08-01
This paper presents one possible implementation of a transformation that performs linear mapping to a lower-dimensional subspace. Principal component subspace will be the one that will be analyzed. Idea implemented in this paper represents generalization of the recently proposed infinity OH neural method for principal component extraction. The calculations in the newly proposed method are performed locally--a feature which is usually considered as desirable from the biological point of view. Comparing to some other wellknown methods, proposed synaptic efficacy learning rule requires less information about the value of the other efficacies to make single efficacy modification. Synaptic efficacies are modified by implementation of Modulated Hebb-type (MH) learning rule. Slightly modified MH algorithm named Modulated Hebb Oja (MHO) algorithm, will be also introduced. Structural similarity of the proposed network with part of the retinal circuit will be presented, too.
Mao, Yong; Zhou, Xiao-Bo; Pi, Dao-Ying; Sun, You-Xian; Wong, Stephen T C
2005-10-01
In microarray-based cancer classification, gene selection is an important issue owing to the large number of variables and small number of samples as well as its non-linearity. It is difficult to get satisfying results by using conventional linear statistical methods. Recursive feature elimination based on support vector machine (SVM RFE) is an effective algorithm for gene selection and cancer classification, which are integrated into a consistent framework. In this paper, we propose a new method to select parameters of the aforementioned algorithm implemented with Gaussian kernel SVMs as better alternatives to the common practice of selecting the apparently best parameters by using a genetic algorithm to search for a couple of optimal parameter. Fast implementation issues for this method are also discussed for pragmatic reasons. The proposed method was tested on two representative hereditary breast cancer and acute leukaemia datasets. The experimental results indicate that the proposed method performs well in selecting genes and achieves high classification accuracies with these genes.
Canbay, Ferhat; Levent, Vecdi Emre; Serbes, Gorkem; Ugurdag, H. Fatih; Goren, Sezer
2016-01-01
The authors aimed to develop an application for producing different architectures to implement dual tree complex wavelet transform (DTCWT) having near shift-invariance property. To obtain a low-cost and portable solution for implementing the DTCWT in multi-channel real-time applications, various embedded-system approaches are realised. For comparison, the DTCWT was implemented in C language on a personal computer and on a PIC microcontroller. However, in the former approach portability and in the latter desired speed performance properties cannot be achieved. Hence, implementation of the DTCWT on a reconfigurable platform such as field programmable gate array, which provides portable, low-cost, low-power, and high-performance computing, is considered as the most feasible solution. At first, they used the system generator DSP design tool of Xilinx for algorithm design. However, the design implemented by using such tools is not optimised in terms of area and power. To overcome all these drawbacks mentioned above, they implemented the DTCWT algorithm by using Verilog Hardware Description Language, which has its own difficulties. To overcome these difficulties, simplify the usage of proposed algorithms and the adaptation procedures, a code generator program that can produce different architectures is proposed. PMID:27733925
Canbay, Ferhat; Levent, Vecdi Emre; Serbes, Gorkem; Ugurdag, H Fatih; Goren, Sezer; Aydin, Nizamettin
2016-09-01
The authors aimed to develop an application for producing different architectures to implement dual tree complex wavelet transform (DTCWT) having near shift-invariance property. To obtain a low-cost and portable solution for implementing the DTCWT in multi-channel real-time applications, various embedded-system approaches are realised. For comparison, the DTCWT was implemented in C language on a personal computer and on a PIC microcontroller. However, in the former approach portability and in the latter desired speed performance properties cannot be achieved. Hence, implementation of the DTCWT on a reconfigurable platform such as field programmable gate array, which provides portable, low-cost, low-power, and high-performance computing, is considered as the most feasible solution. At first, they used the system generator DSP design tool of Xilinx for algorithm design. However, the design implemented by using such tools is not optimised in terms of area and power. To overcome all these drawbacks mentioned above, they implemented the DTCWT algorithm by using Verilog Hardware Description Language, which has its own difficulties. To overcome these difficulties, simplify the usage of proposed algorithms and the adaptation procedures, a code generator program that can produce different architectures is proposed.
Operating Quantum States in Single Magnetic Molecules: Implementation of Grover's Quantum Algorithm.
Godfrin, C; Ferhat, A; Ballou, R; Klyatskaya, S; Ruben, M; Wernsdorfer, W; Balestro, F
2017-11-03
Quantum algorithms use the principles of quantum mechanics, such as, for example, quantum superposition, in order to solve particular problems outperforming standard computation. They are developed for cryptography, searching, optimization, simulation, and solving large systems of linear equations. Here, we implement Grover's quantum algorithm, proposed to find an element in an unsorted list, using a single nuclear 3/2 spin carried by a Tb ion sitting in a single molecular magnet transistor. The coherent manipulation of this multilevel quantum system (qudit) is achieved by means of electric fields only. Grover's search algorithm is implemented by constructing a quantum database via a multilevel Hadamard gate. The Grover sequence then allows us to select each state. The presented method is of universal character and can be implemented in any multilevel quantum system with nonequal spaced energy levels, opening the way to novel quantum search algorithms.
Operating Quantum States in Single Magnetic Molecules: Implementation of Grover's Quantum Algorithm
NASA Astrophysics Data System (ADS)
Godfrin, C.; Ferhat, A.; Ballou, R.; Klyatskaya, S.; Ruben, M.; Wernsdorfer, W.; Balestro, F.
2017-11-01
Quantum algorithms use the principles of quantum mechanics, such as, for example, quantum superposition, in order to solve particular problems outperforming standard computation. They are developed for cryptography, searching, optimization, simulation, and solving large systems of linear equations. Here, we implement Grover's quantum algorithm, proposed to find an element in an unsorted list, using a single nuclear 3 /2 spin carried by a Tb ion sitting in a single molecular magnet transistor. The coherent manipulation of this multilevel quantum system (qudit) is achieved by means of electric fields only. Grover's search algorithm is implemented by constructing a quantum database via a multilevel Hadamard gate. The Grover sequence then allows us to select each state. The presented method is of universal character and can be implemented in any multilevel quantum system with nonequal spaced energy levels, opening the way to novel quantum search algorithms.
Hardware realization of an SVM algorithm implemented in FPGAs
NASA Astrophysics Data System (ADS)
Wiśniewski, Remigiusz; Bazydło, Grzegorz; Szcześniak, Paweł
2017-08-01
The paper proposes a technique of hardware realization of a space vector modulation (SVM) of state function switching in matrix converter (MC), oriented on the implementation in a single field programmable gate array (FPGA). In MC the SVM method is based on the instantaneous space-vector representation of input currents and output voltages. The traditional computation algorithms usually involve digital signal processors (DSPs) which consumes the large number of power transistors (18 transistors and 18 independent PWM outputs) and "non-standard positions of control pulses" during the switching sequence. Recently, hardware implementations become popular since computed operations may be executed much faster and efficient due to nature of the digital devices (especially concurrency). In the paper, we propose a hardware algorithm of SVM computation. In opposite to the existing techniques, the presented solution applies COordinate Rotation DIgital Computer (CORDIC) method to solve the trigonometric operations. Furthermore, adequate arithmetic modules (that is, sub-devices) used for intermediate calculations, such as code converters or proper sectors selectors (for output voltages and input current) are presented in detail. The proposed technique has been implemented as a design described with the use of Verilog hardware description language. The preliminary results of logic implementation oriented on the Xilinx FPGA (particularly, low-cost device from Artix-7 family from Xilinx was used) are also presented.
Discrete size optimization of steel trusses using a refined big bang-big crunch algorithm
NASA Astrophysics Data System (ADS)
Hasançebi, O.; Kazemzadeh Azad, S.
2014-01-01
This article presents a methodology that provides a method for design optimization of steel truss structures based on a refined big bang-big crunch (BB-BC) algorithm. It is shown that a standard formulation of the BB-BC algorithm occasionally falls short of producing acceptable solutions to problems from discrete size optimum design of steel trusses. A reformulation of the algorithm is proposed and implemented for design optimization of various discrete truss structures according to American Institute of Steel Construction Allowable Stress Design (AISC-ASD) specifications. Furthermore, the performance of the proposed BB-BC algorithm is compared to its standard version as well as other well-known metaheuristic techniques. The numerical results confirm the efficiency of the proposed algorithm in practical design optimization of truss structures.
Ren, Shanshan; Bertels, Koen; Al-Ars, Zaid
2018-01-01
GATK HaplotypeCaller (HC) is a popular variant caller, which is widely used to identify variants in complex genomes. However, due to its high variants detection accuracy, it suffers from long execution time. In GATK HC, the pair-HMMs forward algorithm accounts for a large percentage of the total execution time. This article proposes to accelerate the pair-HMMs forward algorithm on graphics processing units (GPUs) to improve the performance of GATK HC. This article presents several GPU-based implementations of the pair-HMMs forward algorithm. It also analyzes the performance bottlenecks of the implementations on an NVIDIA Tesla K40 card with various data sets. Based on these results and the characteristics of GATK HC, we are able to identify the GPU-based implementations with the highest performance for the various analyzed data sets. Experimental results show that the GPU-based implementations of the pair-HMMs forward algorithm achieve a speedup of up to 5.47× over existing GPU-based implementations.
Sinogram-based adaptive iterative reconstruction for sparse view x-ray computed tomography
NASA Astrophysics Data System (ADS)
Trinca, D.; Zhong, Y.; Wang, Y.-Z.; Mamyrbayev, T.; Libin, E.
2016-10-01
With the availability of more powerful computing processors, iterative reconstruction algorithms have recently been successfully implemented as an approach to achieving significant dose reduction in X-ray CT. In this paper, we propose an adaptive iterative reconstruction algorithm for X-ray CT, that is shown to provide results comparable to those obtained by proprietary algorithms, both in terms of reconstruction accuracy and execution time. The proposed algorithm is thus provided for free to the scientific community, for regular use, and for possible further optimization.
NASA Astrophysics Data System (ADS)
Ren, Feixiang; Huang, Jinsheng; Terauchi, Mutsuhiro; Jiang, Ruyi; Klette, Reinhard
A robust and efficient lane detection system is an essential component of Lane Departure Warning Systems, which are commonly used in many vision-based Driver Assistance Systems (DAS) in intelligent transportation. Various computation platforms have been proposed in the past few years for the implementation of driver assistance systems (e.g., PC, laptop, integrated chips, PlayStation, and so on). In this paper, we propose a new platform for the implementation of lane detection, which is based on a mobile phone (the iPhone). Due to physical limitations of the iPhone w.r.t. memory and computing power, a simple and efficient lane detection algorithm using a Hough transform is developed and implemented on the iPhone, as existing algorithms developed based on the PC platform are not suitable for mobile phone devices (currently). Experiments of the lane detection algorithm are made both on PC and on iPhone.
Sequential and parallel image restoration: neural network implementations.
Figueiredo, M T; Leitao, J N
1994-01-01
Sequential and parallel image restoration algorithms and their implementations on neural networks are proposed. For images degraded by linear blur and contaminated by additive white Gaussian noise, maximum a posteriori (MAP) estimation and regularization theory lead to the same high dimension convex optimization problem. The commonly adopted strategy (in using neural networks for image restoration) is to map the objective function of the optimization problem into the energy of a predefined network, taking advantage of its energy minimization properties. Departing from this approach, we propose neural implementations of iterative minimization algorithms which are first proved to converge. The developed schemes are based on modified Hopfield (1985) networks of graded elements, with both sequential and parallel updating schedules. An algorithm supported on a fully standard Hopfield network (binary elements and zero autoconnections) is also considered. Robustness with respect to finite numerical precision is studied, and examples with real images are presented.
A Demons algorithm for image registration with locally adaptive regularization.
Cahill, Nathan D; Noble, J Alison; Hawkes, David J
2009-01-01
Thirion's Demons is a popular algorithm for nonrigid image registration because of its linear computational complexity and ease of implementation. It approximately solves the diffusion registration problem by successively estimating force vectors that drive the deformation toward alignment and smoothing the force vectors by Gaussian convolution. In this article, we show how the Demons algorithm can be generalized to allow image-driven locally adaptive regularization in a manner that preserves both the linear complexity and ease of implementation of the original Demons algorithm. We show that the proposed algorithm exhibits lower target registration error and requires less computational effort than the original Demons algorithm on the registration of serial chest CT scans of patients with lung nodules.
Use of the Hotelling observer to optimize image reconstruction in digital breast tomosynthesis
Sánchez, Adrian A.; Sidky, Emil Y.; Pan, Xiaochuan
2015-01-01
Abstract. We propose an implementation of the Hotelling observer that can be applied to the optimization of linear image reconstruction algorithms in digital breast tomosynthesis. The method is based on considering information within a specific region of interest, and it is applied to the optimization of algorithms for detectability of microcalcifications. Several linear algorithms are considered: simple back-projection, filtered back-projection, back-projection filtration, and Λ-tomography. The optimized algorithms are then evaluated through the reconstruction of phantom data. The method appears robust across algorithms and parameters and leads to the generation of algorithm implementations which subjectively appear optimized for the task of interest. PMID:26702408
An enhanced velocity-based algorithm for safe implementations of gain-scheduled controllers
NASA Astrophysics Data System (ADS)
Lhachemi, H.; Saussié, D.; Zhu, G.
2017-09-01
This paper presents an enhanced velocity-based algorithm to implement gain-scheduled controllers for nonlinear and parameter-dependent systems. A new scheme including pre- and post-filtering is proposed with the assumption that the time-derivative of the controller inputs is not available for feedback control. It is shown that the proposed control structure can preserve the input-output properties of the linearised closed-loop system in the neighbourhood of each equilibrium point, avoiding the emergence of the so-called hidden coupling terms. Moreover, it is guaranteed that this implementation will not introduce unobservable or uncontrollable unstable modes, and hence the internal stability will not be affected. A case study dealing with the design of a pitch-axis missile autopilot is carried out and the numerical simulation results confirm the validity of the proposed approach.
DNA algorithms of implementing biomolecular databases on a biological computer.
Chang, Weng-Long; Vasilakos, Athanasios V
2015-01-01
In this paper, DNA algorithms are proposed to perform eight operations of relational algebra (calculus), which include Cartesian product, union, set difference, selection, projection, intersection, join, and division, on biomolecular relational databases.
NASA Astrophysics Data System (ADS)
Liu, Zhi; Zhou, Baotong; Zhang, Changnian
2017-03-01
Vehicle-mounted panoramic system is important safety assistant equipment for driving. However, traditional systems only render fixed top-down perspective view of limited view field, which may have potential safety hazard. In this paper, a texture mapping algorithm for 3D vehicle-mounted panoramic system is introduced, and an implementation of the algorithm utilizing OpenGL ES library based on Android smart platform is presented. Initial experiment results show that the proposed algorithm can render a good 3D panorama, and has the ability to change view point freely.
Implementing a GPU-based numerical algorithm for modelling dynamics of a high-speed train
NASA Astrophysics Data System (ADS)
Sytov, E. S.; Bratus, A. S.; Yurchenko, D.
2018-04-01
This paper discusses the initiative of implementing a GPU-based numerical algorithm for studying various phenomena associated with dynamics of a high-speed railway transport. The proposed numerical algorithm for calculating a critical speed of the bogie is based on the first Lyapunov number. Numerical algorithm is validated by analytical results, derived for a simple model. A dynamic model of a carriage connected to a new dual-wheelset flexible bogie is studied for linear and dry friction damping. Numerical results obtained by CPU, MPU and GPU approaches are compared and appropriateness of these methods is discussed.
An implementation of a data-transmission pipelining algorithm on Imote2 platforms
NASA Astrophysics Data System (ADS)
Li, Xu; Dorvash, Siavash; Cheng, Liang; Pakzad, Shamim
2011-04-01
Over the past several years, wireless network systems and sensing technologies have been developed significantly. This has resulted in the broad application of wireless sensor networks (WSNs) in many engineering fields and in particular structural health monitoring (SHM). The movement of traditional SHM toward the new generation of SHM, which utilizes WSNs, relies on the advantages of this new approach such as relatively low costs, ease of implementation and the capability of onboard data processing and management. In the particular case of long span bridge monitoring, a WSN should be capable of transmitting commands and measurement data over long network geometry in a reliable manner. While using single-hop data transmission in such geometry requires a long radio range and consequently a high level of power supply, multi-hop communication may offer an effective and reliable way for data transmissions across the network. Using a multi-hop communication protocol, the network relays data from a remote node to the base station via intermediary nodes. We have proposed a data-transmission pipelining algorithm to enable an effective use of the available bandwidth and minimize the energy consumption and the delay performance by the multi-hop communication protocol. This paper focuses on the implementation aspect of the pipelining algorithm on Imote2 platforms for SHM applications, describes its interaction with underlying routing protocols, and presents the solutions to various implementation issues of the proposed pipelining algorithm. Finally, the performance of the algorithm is evaluated based on the results of an experimental implementation.
A Decentralized Eigenvalue Computation Method for Spectrum Sensing Based on Average Consensus
NASA Astrophysics Data System (ADS)
Mohammadi, Jafar; Limmer, Steffen; Stańczak, Sławomir
2016-07-01
This paper considers eigenvalue estimation for the decentralized inference problem for spectrum sensing. We propose a decentralized eigenvalue computation algorithm based on the power method, which is referred to as generalized power method GPM; it is capable of estimating the eigenvalues of a given covariance matrix under certain conditions. Furthermore, we have developed a decentralized implementation of GPM by splitting the iterative operations into local and global computation tasks. The global tasks require data exchange to be performed among the nodes. For this task, we apply an average consensus algorithm to efficiently perform the global computations. As a special case, we consider a structured graph that is a tree with clusters of nodes at its leaves. For an accelerated distributed implementation, we propose to use computation over multiple access channel (CoMAC) as a building block of the algorithm. Numerical simulations are provided to illustrate the performance of the two algorithms.
Adaptive Load-Balancing Algorithms using Symmetric Broadcast Networks
NASA Technical Reports Server (NTRS)
Das, Sajal K.; Harvey, Daniel J.; Biswas, Rupak; Biegel, Bryan A. (Technical Monitor)
2002-01-01
In a distributed computing environment, it is important to ensure that the processor workloads are adequately balanced, Among numerous load-balancing algorithms, a unique approach due to Das and Prasad defines a symmetric broadcast network (SBN) that provides a robust communication pattern among the processors in a topology-independent manner. In this paper, we propose and analyze three efficient SBN-based dynamic load-balancing algorithms, and implement them on an SGI Origin2000. A thorough experimental study with Poisson distributed synthetic loads demonstrates that our algorithms are effective in balancing system load. By optimizing completion time and idle time, the proposed algorithms are shown to compare favorably with several existing approaches.
Embedded System Implementation of Sound Localization in Proximal Region
NASA Astrophysics Data System (ADS)
Iwanaga, Nobuyuki; Matsumura, Tomoya; Yoshida, Akihiro; Kobayashi, Wataru; Onoye, Takao
A sound localization method in the proximal region is proposed, which is based on a low-cost 3D sound localization algorithm with the use of head-related transfer functions (HRTFs). The auditory parallax model is applied to the current algorithm so that more accurate HRTFs can be used for sound localization in the proximal region. In addition, head-shadowing effects based on rigid-sphere model are reproduced in the proximal region by means of a second-order IIR filter. A subjective listening test demonstrates the effectiveness of the proposed method. Embedded system implementation of the proposed method is also described claiming that the proposed method improves sound effects in the proximal region only with 5.1% increase of memory capacity and 8.3% of computational costs.
Human resource recommendation algorithm based on ensemble learning and Spark
NASA Astrophysics Data System (ADS)
Cong, Zihan; Zhang, Xingming; Wang, Haoxiang; Xu, Hongjie
2017-08-01
Aiming at the problem of “information overload” in the human resources industry, this paper proposes a human resource recommendation algorithm based on Ensemble Learning. The algorithm considers the characteristics and behaviours of both job seeker and job features in the real business circumstance. Firstly, the algorithm uses two ensemble learning methods-Bagging and Boosting. The outputs from both learning methods are then merged to form user interest model. Based on user interest model, job recommendation can be extracted for users. The algorithm is implemented as a parallelized recommendation system on Spark. A set of experiments have been done and analysed. The proposed algorithm achieves significant improvement in accuracy, recall rate and coverage, compared with recommendation algorithms such as UserCF and ItemCF.
A real time QRS detection using delay-coordinate mapping for the microcontroller implementation.
Lee, Jeong-Whan; Kim, Kyeong-Seop; Lee, Bongsoo; Lee, Byungchae; Lee, Myoung-Ho
2002-01-01
In this article, we propose a new algorithm using the characteristics of reconstructed phase portraits by delay-coordinate mapping utilizing lag rotundity for a real-time detection of QRS complexes in ECG signals. In reconstructing phase portrait the mapping parameters, time delay, and mapping dimension play important roles in shaping of portraits drawn in a new dimensional space. Experimentally, the optimal mapping time delay for detection of QRS complexes turned out to be 20 ms. To explore the meaning of this time delay and the proper mapping dimension, we applied a fill factor, mutual information, and autocorrelation function algorithm that were generally used to analyze the chaotic characteristics of sampled signals. From these results, we could find the fact that the performance of our proposed algorithms relied mainly on the geometrical property such as an area of the reconstructed phase portrait. For the real application, we applied our algorithm for designing a small cardiac event recorder. This system was to record patients' ECG and R-R intervals for 1 h to investigate HRV characteristics of the patients who had vasovagal syncope symptom and for the evaluation, we implemented our algorithm in C language and applied to MIT/BIH arrhythmia database of 48 subjects. Our proposed algorithm achieved a 99.58% detection rate of QRS complexes.
Implementation details of the coupled QMR algorithm
NASA Technical Reports Server (NTRS)
Freund, Roland W.; Nachtigal, Noel M.
1992-01-01
The original quasi-minimal residual method (QMR) relies on the three-term look-ahead Lanczos process, to generate basis vectors for the underlying Krylov subspaces. However, empirical observations indicate that, in finite precision arithmetic, three-term vector recurrences are less robust than mathematically equivalent coupled two-term recurrences. Therefore, we recently proposed a new implementation of the QMR method based on a coupled two-term look-ahead Lanczos procedure. In this paper, we describe implementation details of this coupled QMR algorithm, and we present results of numerical experiments.
Structural model constructing for optical handwritten character recognition
NASA Astrophysics Data System (ADS)
Khaustov, P. A.; Spitsyn, V. G.; Maksimova, E. I.
2017-02-01
The article is devoted to the development of the algorithms for optical handwritten character recognition based on the structural models constructing. The main advantage of these algorithms is the low requirement regarding the number of reference images. The one-pass approach to a thinning of the binary character representation has been proposed. This approach is based on the joint use of Zhang-Suen and Wu-Tsai algorithms. The effectiveness of the proposed approach is confirmed by the results of the experiments. The article includes the detailed description of the structural model constructing algorithm’s steps. The proposed algorithm has been implemented in character processing application and has been approved on MNIST handwriting characters database. Algorithms that could be used in case of limited reference images number were used for the comparison.
Mai, Xiaofeng; Liu, Jie; Wu, Xiong; Zhang, Qun; Guo, Changjian; Yang, Yanfu; Li, Zhaohui
2017-02-06
A Stokes-space modulation format classification (MFC) technique is proposed for coherent optical receivers by using a non-iterative clustering algorithm. In the clustering algorithm, two simple parameters are calculated to help find the density peaks of the data points in Stokes space and no iteration is required. Correct MFC can be realized in numerical simulations among PM-QPSK, PM-8QAM, PM-16QAM, PM-32QAM and PM-64QAM signals within practical optical signal-to-noise ratio (OSNR) ranges. The performance of the proposed MFC algorithm is also compared with those of other schemes based on clustering algorithms. The simulation results show that good classification performance can be achieved using the proposed MFC scheme with moderate time complexity. Proof-of-concept experiments are finally implemented to demonstrate MFC among PM-QPSK/16QAM/64QAM signals, which confirm the feasibility of our proposed MFC scheme.
Optimization Techniques for Analysis of Biological and Social Networks
2012-03-28
analyzing a new metaheuristic technique, variable objective search. 3. Experimentation and application: Implement the proposed algorithms , test and fine...alternative mathematical programming formulations, their theoretical analysis, the development of exact algorithms , and heuristics. Originally, clusters...systematic fashion under a unifying theoretical and algorithmic framework. Optimization, Complex Networks, Social Network Analysis, Computational
Massanes, Francesc; Cadennes, Marie; Brankov, Jovan G.
2012-01-01
In this paper we describe and evaluate a fast implementation of a classical block matching motion estimation algorithm for multiple Graphical Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block matching algorithm (BMA) uses summed absolute difference (SAD) error criterion and full grid search (FS) for finding optimal block displacement. In this evaluation we compared the execution time of a GPU and CPU implementation for images of various sizes, using integer and non-integer search grids. The results show that use of a GPU card can shorten computation time by a factor of 200 times for integer and 1000 times for a non-integer search grid. The additional speedup for non-integer search grid comes from the fact that GPU has built-in hardware for image interpolation. Further, when using multiple GPU cards, the presented evaluation shows the importance of the data splitting method across multiple cards, but an almost linear speedup with a number of cards is achievable. In addition we compared execution time of the proposed FS GPU implementation with two existing, highly optimized non-full grid search CPU based motion estimations methods, namely implementation of the Pyramidal Lucas Kanade Optical flow algorithm in OpenCV and Simplified Unsymmetrical multi-Hexagon search in H.264/AVC standard. In these comparisons, FS GPU implementation still showed modest improvement even though the computational complexity of FS GPU implementation is substantially higher than non-FS CPU implementation. We also demonstrated that for an image sequence of 720×480 pixels in resolution, commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames-per-second using two NVIDIA C1060 Tesla GPU cards. PMID:22347787
Massanes, Francesc; Cadennes, Marie; Brankov, Jovan G
2011-07-01
In this paper we describe and evaluate a fast implementation of a classical block matching motion estimation algorithm for multiple Graphical Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block matching algorithm (BMA) uses summed absolute difference (SAD) error criterion and full grid search (FS) for finding optimal block displacement. In this evaluation we compared the execution time of a GPU and CPU implementation for images of various sizes, using integer and non-integer search grids.The results show that use of a GPU card can shorten computation time by a factor of 200 times for integer and 1000 times for a non-integer search grid. The additional speedup for non-integer search grid comes from the fact that GPU has built-in hardware for image interpolation. Further, when using multiple GPU cards, the presented evaluation shows the importance of the data splitting method across multiple cards, but an almost linear speedup with a number of cards is achievable.In addition we compared execution time of the proposed FS GPU implementation with two existing, highly optimized non-full grid search CPU based motion estimations methods, namely implementation of the Pyramidal Lucas Kanade Optical flow algorithm in OpenCV and Simplified Unsymmetrical multi-Hexagon search in H.264/AVC standard. In these comparisons, FS GPU implementation still showed modest improvement even though the computational complexity of FS GPU implementation is substantially higher than non-FS CPU implementation.We also demonstrated that for an image sequence of 720×480 pixels in resolution, commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames-per-second using two NVIDIA C1060 Tesla GPU cards.
NASA Astrophysics Data System (ADS)
Qiu, Mo; Yu, Simin; Wen, Yuqiong; Lü, Jinhu; He, Jianbin; Lin, Zhuosheng
In this paper, a novel design methodology and its FPGA hardware implementation for a universal chaotic signal generator is proposed via the Verilog HDL fixed-point algorithm and state machine control. According to continuous-time or discrete-time chaotic equations, a Verilog HDL fixed-point algorithm and its corresponding digital system are first designed. In the FPGA hardware platform, each operation step of Verilog HDL fixed-point algorithm is then controlled by a state machine. The generality of this method is that, for any given chaotic equation, it can be decomposed into four basic operation procedures, i.e. nonlinear function calculation, iterative sequence operation, iterative values right shifting and ceiling, and chaotic iterative sequences output, each of which corresponds to only a state via state machine control. Compared with the Verilog HDL floating-point algorithm, the Verilog HDL fixed-point algorithm can save the FPGA hardware resources and improve the operation efficiency. FPGA-based hardware experimental results validate the feasibility and reliability of the proposed approach.
RS-Forest: A Rapid Density Estimator for Streaming Anomaly Detection.
Wu, Ke; Zhang, Kun; Fan, Wei; Edwards, Andrea; Yu, Philip S
Anomaly detection in streaming data is of high interest in numerous application domains. In this paper, we propose a novel one-class semi-supervised algorithm to detect anomalies in streaming data. Underlying the algorithm is a fast and accurate density estimator implemented by multiple fully randomized space trees (RS-Trees), named RS-Forest. The piecewise constant density estimate of each RS-tree is defined on the tree node into which an instance falls. Each incoming instance in a data stream is scored by the density estimates averaged over all trees in the forest. Two strategies, statistical attribute range estimation of high probability guarantee and dual node profiles for rapid model update, are seamlessly integrated into RS-Forest to systematically address the ever-evolving nature of data streams. We derive the theoretical upper bound for the proposed algorithm and analyze its asymptotic properties via bias-variance decomposition. Empirical comparisons to the state-of-the-art methods on multiple benchmark datasets demonstrate that the proposed method features high detection rate, fast response, and insensitivity to most of the parameter settings. Algorithm implementations and datasets are available upon request.
RS-Forest: A Rapid Density Estimator for Streaming Anomaly Detection
Wu, Ke; Zhang, Kun; Fan, Wei; Edwards, Andrea; Yu, Philip S.
2015-01-01
Anomaly detection in streaming data is of high interest in numerous application domains. In this paper, we propose a novel one-class semi-supervised algorithm to detect anomalies in streaming data. Underlying the algorithm is a fast and accurate density estimator implemented by multiple fully randomized space trees (RS-Trees), named RS-Forest. The piecewise constant density estimate of each RS-tree is defined on the tree node into which an instance falls. Each incoming instance in a data stream is scored by the density estimates averaged over all trees in the forest. Two strategies, statistical attribute range estimation of high probability guarantee and dual node profiles for rapid model update, are seamlessly integrated into RS-Forest to systematically address the ever-evolving nature of data streams. We derive the theoretical upper bound for the proposed algorithm and analyze its asymptotic properties via bias-variance decomposition. Empirical comparisons to the state-of-the-art methods on multiple benchmark datasets demonstrate that the proposed method features high detection rate, fast response, and insensitivity to most of the parameter settings. Algorithm implementations and datasets are available upon request. PMID:25685112
Automated Detection of Craters in Martian Satellite Imagery Using Convolutional Neural Networks
NASA Astrophysics Data System (ADS)
Norman, C. J.; Paxman, J.; Benedix, G. K.; Tan, T.; Bland, P. A.; Towner, M.
2018-04-01
Crater counting is used in determining surface age of planets. We propose improvements to martian Crater Detection Algorithms by implementing an end-to-end detection approach with the possibility of scaling the algorithm planet-wide.
Fast underdetermined BSS architecture design methodology for real time applications.
Mopuri, Suresh; Reddy, P Sreenivasa; Acharyya, Amit; Naik, Ganesh R
2015-01-01
In this paper, we propose a high speed architecture design methodology for the Under-determined Blind Source Separation (UBSS) algorithm using our recently proposed high speed Discrete Hilbert Transform (DHT) targeting real time applications. In UBSS algorithm, unlike the typical BSS, the number of sensors are less than the number of the sources, which is of more interest in the real time applications. The DHT architecture has been implemented based on sub matrix multiplication method to compute M point DHT, which uses N point architecture recursively and where M is an integer multiples of N. The DHT architecture and state of the art architecture are coded in VHDL for 16 bit word length and ASIC implementation is carried out using UMC 90 - nm technology @V DD = 1V and @ 1MHZ clock frequency. The proposed architecture implementation and experimental comparison results show that the DHT design is two times faster than state of the art architecture.
One-way quantum computing in superconducting circuits
NASA Astrophysics Data System (ADS)
Albarrán-Arriagada, F.; Alvarado Barrios, G.; Sanz, M.; Romero, G.; Lamata, L.; Retamal, J. C.; Solano, E.
2018-03-01
We propose a method for the implementation of one-way quantum computing in superconducting circuits. Measurement-based quantum computing is a universal quantum computation paradigm in which an initial cluster state provides the quantum resource, while the iteration of sequential measurements and local rotations encodes the quantum algorithm. Up to now, technical constraints have limited a scalable approach to this quantum computing alternative. The initial cluster state can be generated with available controlled-phase gates, while the quantum algorithm makes use of high-fidelity readout and coherent feedforward. With current technology, we estimate that quantum algorithms with above 20 qubits may be implemented in the path toward quantum supremacy. Moreover, we propose an alternative initial state with properties of maximal persistence and maximal connectedness, reducing the required resources of one-way quantum computing protocols.
Optimized data fusion for K-means Laplacian clustering
Yu, Shi; Liu, Xinhai; Tranchevent, Léon-Charles; Glänzel, Wolfgang; Suykens, Johan A. K.; De Moor, Bart; Moreau, Yves
2011-01-01
Motivation: We propose a novel algorithm to combine multiple kernels and Laplacians for clustering analysis. The new algorithm is formulated on a Rayleigh quotient objective function and is solved as a bi-level alternating minimization procedure. Using the proposed algorithm, the coefficients of kernels and Laplacians can be optimized automatically. Results: Three variants of the algorithm are proposed. The performance is systematically validated on two real-life data fusion applications. The proposed Optimized Kernel Laplacian Clustering (OKLC) algorithms perform significantly better than other methods. Moreover, the coefficients of kernels and Laplacians optimized by OKLC show some correlation with the rank of performance of individual data source. Though in our evaluation the K values are predefined, in practical studies, the optimal cluster number can be consistently estimated from the eigenspectrum of the combined kernel Laplacian matrix. Availability: The MATLAB code of algorithms implemented in this paper is downloadable from http://homes.esat.kuleuven.be/~sistawww/bioi/syu/oklc.html. Contact: shiyu@uchicago.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20980271
Effective Iterated Greedy Algorithm for Flow-Shop Scheduling Problems with Time lags
NASA Astrophysics Data System (ADS)
ZHAO, Ning; YE, Song; LI, Kaidian; CHEN, Siyu
2017-05-01
Flow shop scheduling problem with time lags is a practical scheduling problem and attracts many studies. Permutation problem(PFSP with time lags) is concentrated but non-permutation problem(non-PFSP with time lags) seems to be neglected. With the aim to minimize the makespan and satisfy time lag constraints, efficient algorithms corresponding to PFSP and non-PFSP problems are proposed, which consist of iterated greedy algorithm for permutation(IGTLP) and iterated greedy algorithm for non-permutation (IGTLNP). The proposed algorithms are verified using well-known simple and complex instances of permutation and non-permutation problems with various time lag ranges. The permutation results indicate that the proposed IGTLP can reach near optimal solution within nearly 11% computational time of traditional GA approach. The non-permutation results indicate that the proposed IG can reach nearly same solution within less than 1% computational time compared with traditional GA approach. The proposed research combines PFSP and non-PFSP together with minimal and maximal time lag consideration, which provides an interesting viewpoint for industrial implementation.
General purpose graphic processing unit implementation of adaptive pulse compression algorithms
NASA Astrophysics Data System (ADS)
Cai, Jingxiao; Zhang, Yan
2017-07-01
This study introduces a practical approach to implement real-time signal processing algorithms for general surveillance radar based on NVIDIA graphical processing units (GPUs). The pulse compression algorithms are implemented using compute unified device architecture (CUDA) libraries such as CUDA basic linear algebra subroutines and CUDA fast Fourier transform library, which are adopted from open source libraries and optimized for the NVIDIA GPUs. For more advanced, adaptive processing algorithms such as adaptive pulse compression, customized kernel optimization is needed and investigated. A statistical optimization approach is developed for this purpose without needing much knowledge of the physical configurations of the kernels. It was found that the kernel optimization approach can significantly improve the performance. Benchmark performance is compared with the CPU performance in terms of processing accelerations. The proposed implementation framework can be used in various radar systems including ground-based phased array radar, airborne sense and avoid radar, and aerospace surveillance radar.
An implementation of the QMR method based on coupled two-term recurrences
NASA Technical Reports Server (NTRS)
Freund, Roland W.; Nachtigal, Noeel M.
1992-01-01
The authors have proposed a new Krylov subspace iteration, the quasi-minimal residual algorithm (QMR), for solving non-Hermitian linear systems. In the original implementation of the QMR method, the Lanczos process with look-ahead is used to generate basis vectors for the underlying Krylov subspaces. In the Lanczos algorithm, these basis vectors are computed by means of three-term recurrences. It has been observed that, in finite precision arithmetic, vector iterations based on three-term recursions are usually less robust than mathematically equivalent coupled two-term vector recurrences. This paper presents a look-ahead algorithm that constructs the Lanczos basis vectors by means of coupled two-term recursions. Implementation details are given, and the look-ahead strategy is described. A new implementation of the QMR method, based on this coupled two-term algorithm, is described. A simplified version of the QMR algorithm without look-ahead is also presented, and the special case of QMR for complex symmetric linear systems is considered. Results of numerical experiments comparing the original and the new implementations of the QMR method are reported.
Chang, S; Wong, K W; Zhang, W; Zhang, Y
1999-08-10
An algorithm for optimizing a bipolar interconnection weight matrix with the Hopfield network is proposed. The effectiveness of this algorithm is demonstrated by computer simulation and optical implementation. In the optical implementation of the neural network the interconnection weights are biased to yield a nonnegative weight matrix. Moreover, a threshold subchannel is added so that the system can realize, in real time, the bipolar weighted summation in a single channel. Preliminary experimental results obtained from the applications in associative memories and multitarget classification with rotation invariance are shown.
NASA Astrophysics Data System (ADS)
Chang, Shengjiang; Wong, Kwok-Wo; Zhang, Wenwei; Zhang, Yanxin
1999-08-01
An algorithm for optimizing a bipolar interconnection weight matrix with the Hopfield network is proposed. The effectiveness of this algorithm is demonstrated by computer simulation and optical implementation. In the optical implementation of the neural network the interconnection weights are biased to yield a nonnegative weight matrix. Moreover, a threshold subchannel is added so that the system can realize, in real time, the bipolar weighted summation in a single channel. Preliminary experimental results obtained from the applications in associative memories and multitarget classification with rotation invariance are shown.
NASA Astrophysics Data System (ADS)
Villar, Xabier; Piso, Daniel; Bruguera, Javier D.
2014-02-01
This paper presents an FPGA implementation of an algorithm, previously published, for the the reconstruction of cosmic rays' trajectories and the determination of the time of arrival and velocity of the particles. The accuracy and precision issues of the algorithm have been analyzed to propose a suitable implementation. Thus, a 32-bit fixed-point format has been used for the representation of the data values. Moreover, the dependencies among the different operations have been taken into account to obtain a highly parallel and efficient hardware implementation. The final hardware architecture requires 18 cycles to process every particle, and has been exhaustively simulated to validate all the design decisions. The architecture has been mapped over different commercial FPGAs, with a frequency of operation ranging from 300 MHz to 1.3 GHz, depending on the FPGA being used. Consequently, the number of particle trajectories processed per second is between 16 million and 72 million. The high number of particle trajectories calculated per second shows that the proposed FPGA implementation might be used also in high rate environments such as those found in particle and nuclear physics experiments.
A triangle voting algorithm based on double feature constraints for star sensors
NASA Astrophysics Data System (ADS)
Fan, Qiaoyun; Zhong, Xuyang
2018-02-01
A novel autonomous star identification algorithm is presented in this study. In the proposed algorithm, each sensor star constructs multi-triangle with its bright neighbor stars and obtains its candidates by triangle voting process, in which the triangle is considered as the basic voting element. In order to accelerate the speed of this algorithm and reduce the required memory for star database, feature extraction is carried out to reduce the dimension of triangles and each triangle is described by its base and height. During the identification period, the voting scheme based on double feature constraints is proposed to implement triangle voting. This scheme guarantees that only the catalog star satisfying two features can vote for the sensor star, which improves the robustness towards false stars. The simulation and real star image test demonstrate that compared with the other two algorithms, the proposed algorithm is more robust towards position noise, magnitude noise and false stars.
Random Walk Quantum Clustering Algorithm Based on Space
NASA Astrophysics Data System (ADS)
Xiao, Shufen; Dong, Yumin; Ma, Hongyang
2018-01-01
In the random quantum walk, which is a quantum simulation of the classical walk, data points interacted when selecting the appropriate walk strategy by taking advantage of quantum-entanglement features; thus, the results obtained when the quantum walk is used are different from those when the classical walk is adopted. A new quantum walk clustering algorithm based on space is proposed by applying the quantum walk to clustering analysis. In this algorithm, data points are viewed as walking participants, and similar data points are clustered using the walk function in the pay-off matrix according to a certain rule. The walk process is simplified by implementing a space-combining rule. The proposed algorithm is validated by a simulation test and is proved superior to existing clustering algorithms, namely, Kmeans, PCA + Kmeans, and LDA-Km. The effects of some of the parameters in the proposed algorithm on its performance are also analyzed and discussed. Specific suggestions are provided.
A novel minimum cost maximum power algorithm for future smart home energy management.
Singaravelan, A; Kowsalya, M
2017-11-01
With the latest development of smart grid technology, the energy management system can be efficiently implemented at consumer premises. In this paper, an energy management system with wireless communication and smart meter are designed for scheduling the electric home appliances efficiently with an aim of reducing the cost and peak demand. For an efficient scheduling scheme, the appliances are classified into two types: uninterruptible and interruptible appliances. The problem formulation was constructed based on the practical constraints that make the proposed algorithm cope up with the real-time situation. The formulated problem was identified as Mixed Integer Linear Programming (MILP) problem, so this problem was solved by a step-wise approach. This paper proposes a novel Minimum Cost Maximum Power (MCMP) algorithm to solve the formulated problem. The proposed algorithm was simulated with input data available in the existing method. For validating the proposed MCMP algorithm, results were compared with the existing method. The compared results prove that the proposed algorithm efficiently reduces the consumer electricity consumption cost and peak demand to optimum level with 100% task completion without sacrificing the consumer comfort.
$n$ -Dimensional Discrete Cat Map Generation Using Laplace Expansions.
Wu, Yue; Hua, Zhongyun; Zhou, Yicong
2016-11-01
Different from existing methods that use matrix multiplications and have high computation complexity, this paper proposes an efficient generation method of n -dimensional ( [Formula: see text]) Cat maps using Laplace expansions. New parameters are also introduced to control the spatial configurations of the [Formula: see text] Cat matrix. Thus, the proposed method provides an efficient way to mix dynamics of all dimensions at one time. To investigate its implementations and applications, we further introduce a fast implementation algorithm of the proposed method with time complexity O(n 4 ) and a pseudorandom number generator using the Cat map generated by the proposed method. The experimental results show that, compared with existing generation methods, the proposed method has a larger parameter space and simpler algorithm complexity, generates [Formula: see text] Cat matrices with a lower inner correlation, and thus yields more random and unpredictable outputs of [Formula: see text] Cat maps.
Quantum computation with classical light: Implementation of the Deutsch-Jozsa algorithm
NASA Astrophysics Data System (ADS)
Perez-Garcia, Benjamin; McLaren, Melanie; Goyal, Sandeep K.; Hernandez-Aranda, Raul I.; Forbes, Andrew; Konrad, Thomas
2016-05-01
We propose an optical implementation of the Deutsch-Jozsa Algorithm using classical light in a binary decision-tree scheme. Our approach uses a ring cavity and linear optical devices in order to efficiently query the oracle functional values. In addition, we take advantage of the intrinsic Fourier transforming properties of a lens to read out whether the function given by the oracle is balanced or constant.
SLMRACE: a noise-free RACE implementation with reduced computational time
NASA Astrophysics Data System (ADS)
Chauvin, Juliet; Provenzi, Edoardo
2017-05-01
We present a faster and noise-free implementation of the RACE algorithm. RACE has mixed characteristics between the famous Retinex model of Land and McCann and the automatic color equalization (ACE) color-correction algorithm. The original random spray-based RACE implementation suffers from two main problems: its computational time and the presence of noise. Here, we will show that it is possible to adapt two techniques recently proposed by Banić et al. to the RACE framework in order to drastically decrease the computational time and noise generation. The implementation will be called smart-light-memory-RACE (SLMRACE).
QPSO-Based Adaptive DNA Computing Algorithm
Karakose, Mehmet; Cigdem, Ugur
2013-01-01
DNA (deoxyribonucleic acid) computing that is a new computation model based on DNA molecules for information storage has been increasingly used for optimization and data analysis in recent years. However, DNA computing algorithm has some limitations in terms of convergence speed, adaptability, and effectiveness. In this paper, a new approach for improvement of DNA computing is proposed. This new approach aims to perform DNA computing algorithm with adaptive parameters towards the desired goal using quantum-behaved particle swarm optimization (QPSO). Some contributions provided by the proposed QPSO based on adaptive DNA computing algorithm are as follows: (1) parameters of population size, crossover rate, maximum number of operations, enzyme and virus mutation rate, and fitness function of DNA computing algorithm are simultaneously tuned for adaptive process, (2) adaptive algorithm is performed using QPSO algorithm for goal-driven progress, faster operation, and flexibility in data, and (3) numerical realization of DNA computing algorithm with proposed approach is implemented in system identification. Two experiments with different systems were carried out to evaluate the performance of the proposed approach with comparative results. Experimental results obtained with Matlab and FPGA demonstrate ability to provide effective optimization, considerable convergence speed, and high accuracy according to DNA computing algorithm. PMID:23935409
A robust embedded vision system feasible white balance algorithm
NASA Astrophysics Data System (ADS)
Wang, Yuan; Yu, Feihong
2018-01-01
White balance is a very important part of the color image processing pipeline. In order to meet the need of efficiency and accuracy in embedded machine vision processing system, an efficient and robust white balance algorithm combining several classical ones is proposed. The proposed algorithm mainly has three parts. Firstly, in order to guarantee higher efficiency, an initial parameter calculated from the statistics of R, G and B components from raw data is used to initialize the following iterative method. After that, the bilinear interpolation algorithm is utilized to implement demosaicing procedure. Finally, an adaptive step adjustable scheme is introduced to ensure the controllability and robustness of the algorithm. In order to verify the proposed algorithm's performance on embedded vision system, a smart camera based on IMX6 DualLite, IMX291 and XC6130 is designed. Extensive experiments on a large amount of images under different color temperatures and exposure conditions illustrate that the proposed white balance algorithm avoids color deviation problem effectively, achieves a good balance between efficiency and quality, and is suitable for embedded machine vision processing system.
DNA Cryptography and Deep Learning using Genetic Algorithm with NW algorithm for Key Generation.
Kalsi, Shruti; Kaur, Harleen; Chang, Victor
2017-12-05
Cryptography is not only a science of applying complex mathematics and logic to design strong methods to hide data called as encryption, but also to retrieve the original data back, called decryption. The purpose of cryptography is to transmit a message between a sender and receiver such that an eavesdropper is unable to comprehend it. To accomplish this, not only we need a strong algorithm, but a strong key and a strong concept for encryption and decryption process. We have introduced a concept of DNA Deep Learning Cryptography which is defined as a technique of concealing data in terms of DNA sequence and deep learning. In the cryptographic technique, each alphabet of a letter is converted into a different combination of the four bases, namely; Adenine (A), Cytosine (C), Guanine (G) and Thymine (T), which make up the human deoxyribonucleic acid (DNA). Actual implementations with the DNA don't exceed laboratory level and are expensive. To bring DNA computing on a digital level, easy and effective algorithms are proposed in this paper. In proposed work we have introduced firstly, a method and its implementation for key generation based on the theory of natural selection using Genetic Algorithm with Needleman-Wunsch (NW) algorithm and Secondly, a method for implementation of encryption and decryption based on DNA computing using biological operations Transcription, Translation, DNA Sequencing and Deep Learning.
NASA Astrophysics Data System (ADS)
Moradi, Saed; Moallem, Payman; Sabahi, Mohamad Farzan
2018-03-01
False alarm rate and detection rate are still two contradictory metrics for infrared small target detection in an infrared search and track system (IRST), despite the development of new detection algorithms. In certain circumstances, not detecting true targets is more tolerable than detecting false items as true targets. Hence, considering background clutter and detector noise as the sources of the false alarm in an IRST system, in this paper, a false alarm aware methodology is presented to reduce false alarm rate while the detection rate remains undegraded. To this end, advantages and disadvantages of each detection algorithm are investigated and the sources of the false alarms are determined. Two target detection algorithms having independent false alarm sources are chosen in a way that the disadvantages of the one algorithm can be compensated by the advantages of the other one. In this work, multi-scale average absolute gray difference (AAGD) and Laplacian of point spread function (LoPSF) are utilized as the cornerstones of the desired algorithm of the proposed methodology. After presenting a conceptual model for the desired algorithm, it is implemented through the most straightforward mechanism. The desired algorithm effectively suppresses background clutter and eliminates detector noise. Also, since the input images are processed through just four different scales, the desired algorithm has good capability for real-time implementation. Simulation results in term of signal to clutter ratio and background suppression factor on real and simulated images prove the effectiveness and the performance of the proposed methodology. Since the desired algorithm was developed based on independent false alarm sources, our proposed methodology is expandable to any pair of detection algorithms which have different false alarm sources.
ERIC Educational Resources Information Center
Fuwa, Minori; Kayama, Mizue; Kunimune, Hisayoshi; Hashimoto, Masami; Asano, David K.
2015-01-01
We have explored educational methods for algorithmic thinking for novices and implemented a block programming editor and a simple learning management system. In this paper, we propose a program/algorithm complexity metric specified for novice learners. This metric is based on the variable usage in arithmetic and relational formulas in learner's…
FPGA implementation of low complexity LDPC iterative decoder
NASA Astrophysics Data System (ADS)
Verma, Shivani; Sharma, Sanjay
2016-07-01
Low-density parity-check (LDPC) codes, proposed by Gallager, emerged as a class of codes which can yield very good performance on the additive white Gaussian noise channel as well as on the binary symmetric channel. LDPC codes have gained lots of importance due to their capacity achieving property and excellent performance in the noisy channel. Belief propagation (BP) algorithm and its approximations, most notably min-sum, are popular iterative decoding algorithms used for LDPC and turbo codes. The trade-off between the hardware complexity and the decoding throughput is a critical factor in the implementation of the practical decoder. This article presents introduction to LDPC codes and its various decoding algorithms followed by realisation of LDPC decoder by using simplified message passing algorithm and partially parallel decoder architecture. Simplified message passing algorithm has been proposed for trade-off between low decoding complexity and decoder performance. It greatly reduces the routing and check node complexity of the decoder. Partially parallel decoder architecture possesses high speed and reduced complexity. The improved design of the decoder possesses a maximum symbol throughput of 92.95 Mbps and a maximum of 18 decoding iterations. The article presents implementation of 9216 bits, rate-1/2, (3, 6) LDPC decoder on Xilinx XC3D3400A device from Spartan-3A DSP family.
Customizing FP-growth algorithm to parallel mining with Charm++ library
NASA Astrophysics Data System (ADS)
Puścian, Marek
2017-08-01
This paper presents a frequent item mining algorithm that was customized to handle growing data repositories. The proposed solution applies Master Slave scheme to frequent pattern growth technique. Efficient utilization of available computation units is achieved by dynamic reallocation of tasks. Conditional frequent trees are assigned to parallel workers basing on their workload. Proposed enhancements have been successfully implemented using Charm++ library. This paper discusses results of the performance of parallelized FP-growth algorithm against different datasets. The approach has been illustrated with many experiments and measurements performed using multiprocessor and multithreaded computer.
Luo, Qiang; Yan, Zhuangzhi; Gu, Dongxing; Cao, Lei
This paper proposed an image interpolation algorithm based on bilinear interpolation and a color correction algorithm based on polynomial regression on FPGA, which focused on the limited number of imaging pixels and color distortion of the ultra-thin electronic endoscope. Simulation experiment results showed that the proposed algorithm realized the real-time display of 1280 x 720@60Hz HD video, and using the X-rite color checker as standard colors, the average color difference was reduced about 30% comparing with that before color correction.
MATSurv: multisensor air traffic surveillance system
NASA Astrophysics Data System (ADS)
Yeddanapudi, Murali; Bar-Shalom, Yaakov; Pattipati, Krishna R.; Gassner, Richard R.
1995-09-01
This paper deals with the design and implementation of MATSurv 1--an experimental Multisensor Air Traffic Surveillance system. The proposed system consists of a Kalman filter based state estimator used in conjunction with a 2D sliding window assignment algorithm. Real data from two FAA radars is used to evaluate the performance of this algorithm. The results indicate that the proposed algorithm provides a superior classification of the measurements into tracks (i.e., the most likely aircraft trajectories) when compared to the aircraft trajectories obtained using the measurement IDs (squawk or IFF code).
NASA Astrophysics Data System (ADS)
Gilbert, B. K.; Robb, R. A.; Chu, A.; Kenue, S. K.; Lent, A. H.; Swartzlander, E. E., Jr.
1981-02-01
Rapid advances during the past ten years of several forms of computer-assisted tomography (CT) have resulted in the development of numerous algorithms to convert raw projection data into cross-sectional images. These reconstruction algorithms are either 'iterative,' in which a large matrix algebraic equation is solved by successive approximation techniques; or 'closed form'. Continuing evolution of the closed form algorithms has allowed the newest versions to produce excellent reconstructed images in most applications. This paper will review several computer software and special-purpose digital hardware implementations of closed form algorithms, either proposed during the past several years by a number of workers or actually implemented in commercial or research CT scanners. The discussion will also cover a number of recently investigated algorithmic modifications which reduce the amount of computation required to execute the reconstruction process, as well as several new special-purpose digital hardware implementations under development in laboratories at the Mayo Clinic.
An efficient algorithm for function optimization: modified stem cells algorithm
NASA Astrophysics Data System (ADS)
Taherdangkoo, Mohammad; Paziresh, Mahsa; Yazdi, Mehran; Bagheri, Mohammad Hadi
2013-03-01
In this paper, we propose an optimization algorithm based on the intelligent behavior of stem cell swarms in reproduction and self-organization. Optimization algorithms, such as the Genetic Algorithm (GA), Particle Swarm Optimization (PSO) algorithm, Ant Colony Optimization (ACO) algorithm and Artificial Bee Colony (ABC) algorithm, can give solutions to linear and non-linear problems near to the optimum for many applications; however, in some case, they can suffer from becoming trapped in local optima. The Stem Cells Algorithm (SCA) is an optimization algorithm inspired by the natural behavior of stem cells in evolving themselves into new and improved cells. The SCA avoids the local optima problem successfully. In this paper, we have made small changes in the implementation of this algorithm to obtain improved performance over previous versions. Using a series of benchmark functions, we assess the performance of the proposed algorithm and compare it with that of the other aforementioned optimization algorithms. The obtained results prove the superiority of the Modified Stem Cells Algorithm (MSCA).
Design and implementation of a hybrid MPI-CUDA model for the Smith-Waterman algorithm.
Khaled, Heba; Faheem, Hossam El Deen Mostafa; El Gohary, Rania
2015-01-01
This paper provides a novel hybrid model for solving the multiple pair-wise sequence alignment problem combining message passing interface and CUDA, the parallel computing platform and programming model invented by NVIDIA. The proposed model targets homogeneous cluster nodes equipped with similar Graphical Processing Unit (GPU) cards. The model consists of the Master Node Dispatcher (MND) and the Worker GPU Nodes (WGN). The MND distributes the workload among the cluster working nodes and then aggregates the results. The WGN performs the multiple pair-wise sequence alignments using the Smith-Waterman algorithm. We also propose a modified implementation to the Smith-Waterman algorithm based on computing the alignment matrices row-wise. The experimental results demonstrate a considerable reduction in the running time by increasing the number of the working GPU nodes. The proposed model achieved a performance of about 12 Giga cell updates per second when we tested against the SWISS-PROT protein knowledge base running on four nodes.
NASA Astrophysics Data System (ADS)
Romano, Paul Kollath
Monte Carlo particle transport methods are being considered as a viable option for high-fidelity simulation of nuclear reactors. While Monte Carlo methods offer several potential advantages over deterministic methods, there are a number of algorithmic shortcomings that would prevent their immediate adoption for full-core analyses. In this thesis, algorithms are proposed both to ameliorate the degradation in parallel efficiency typically observed for large numbers of processors and to offer a means of decomposing large tally data that will be needed for reactor analysis. A nearest-neighbor fission bank algorithm was proposed and subsequently implemented in the OpenMC Monte Carlo code. A theoretical analysis of the communication pattern shows that the expected cost is O( N ) whereas traditional fission bank algorithms are O(N) at best. The algorithm was tested on two supercomputers, the Intrepid Blue Gene/P and the Titan Cray XK7, and demonstrated nearly linear parallel scaling up to 163,840 processor cores on a full-core benchmark problem. An algorithm for reducing network communication arising from tally reduction was analyzed and implemented in OpenMC. The proposed algorithm groups only particle histories on a single processor into batches for tally purposes---in doing so it prevents all network communication for tallies until the very end of the simulation. The algorithm was tested, again on a full-core benchmark, and shown to reduce network communication substantially. A model was developed to predict the impact of load imbalances on the performance of domain decomposed simulations. The analysis demonstrated that load imbalances in domain decomposed simulations arise from two distinct phenomena: non-uniform particle densities and non-uniform spatial leakage. The dominant performance penalty for domain decomposition was shown to come from these physical effects rather than insufficient network bandwidth or high latency. The model predictions were verified with measured data from simulations in OpenMC on a full-core benchmark problem. Finally, a novel algorithm for decomposing large tally data was proposed, analyzed, and implemented/tested in OpenMC. The algorithm relies on disjoint sets of compute processes and tally servers. The analysis showed that for a range of parameters relevant to LWR analysis, the tally server algorithm should perform with minimal overhead. Tests were performed on Intrepid and Titan and demonstrated that the algorithm did indeed perform well over a wide range of parameters. (Copies available exclusively from MIT Libraries, libraries.mit.edu/docs - docs mit.edu)
Fast parallel approach for 2-D DHT-based real-valued discrete Gabor transform.
Tao, Liang; Kwan, Hon Keung
2009-12-01
Two-dimensional fast Gabor transform algorithms are useful for real-time applications due to the high computational complexity of the traditional 2-D complex-valued discrete Gabor transform (CDGT). This paper presents two block time-recursive algorithms for 2-D DHT-based real-valued discrete Gabor transform (RDGT) and its inverse transform and develops a fast parallel approach for the implementation of the two algorithms. The computational complexity of the proposed parallel approach is analyzed and compared with that of the existing 2-D CDGT algorithms. The results indicate that the proposed parallel approach is attractive for real time image processing.
Feng, Yanqiu; Song, Yanli; Wang, Cong; Xin, Xuegang; Feng, Qianjin; Chen, Wufan
2013-10-01
To develop and test a new algorithm for fast direct Fourier transform (DrFT) reconstruction of MR data on non-Cartesian trajectories composed of lines with equally spaced points. The DrFT, which is normally used as a reference in evaluating the accuracy of other reconstruction methods, can reconstruct images directly from non-Cartesian MR data without interpolation. However, DrFT reconstruction involves substantially intensive computation, which makes the DrFT impractical for clinical routine applications. In this article, the Chirp transform algorithm was introduced to accelerate the DrFT reconstruction of radial and Periodically Rotated Overlapping ParallEL Lines with Enhanced Reconstruction (PROPELLER) MRI data located on the trajectories that are composed of lines with equally spaced points. The performance of the proposed Chirp transform algorithm-DrFT algorithm was evaluated by using simulation and in vivo MRI data. After implementing the algorithm on a graphics processing unit, the proposed Chirp transform algorithm-DrFT algorithm achieved an acceleration of approximately one order of magnitude, and the speed-up factor was further increased to approximately three orders of magnitude compared with the traditional single-thread DrFT reconstruction. Implementation the Chirp transform algorithm-DrFT algorithm on the graphics processing unit can efficiently calculate the DrFT reconstruction of the radial and PROPELLER MRI data. Copyright © 2012 Wiley Periodicals, Inc.
Cao, Jianfang; Chen, Lichao; Wang, Min; Tian, Yun
2018-01-01
The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach system speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates the obvious superiority of our method. The proposed algorithm in this study demonstrates both better edge detection performance and improved time performance.
Adaptive Neuron Apoptosis for Accelerating Deep Learning on Large Scale Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Siegel, Charles M.; Daily, Jeffrey A.; Vishnu, Abhinav
Machine Learning and Data Mining (MLDM) algorithms are becoming ubiquitous in {\\em model learning} from the large volume of data generated using simulations, experiments and handheld devices. Deep Learning algorithms -- a class of MLDM algorithms -- are applied for automatic feature extraction, and learning non-linear models for unsupervised and supervised algorithms. Naturally, several libraries which support large scale Deep Learning -- such as TensorFlow and Caffe -- have become popular. In this paper, we present novel techniques to accelerate the convergence of Deep Learning algorithms by conducting low overhead removal of redundant neurons -- {\\em apoptosis} of neurons --more » which do not contribute to model learning, during the training phase itself. We provide in-depth theoretical underpinnings of our heuristics (bounding accuracy loss and handling apoptosis of several neuron types), and present the methods to conduct adaptive neuron apoptosis. We implement our proposed heuristics with the recently introduced TensorFlow and using its recently proposed extension with MPI. Our performance evaluation on two difference clusters -- one connected with Intel Haswell multi-core systems, and other with nVIDIA GPUs -- using InfiniBand, indicates the efficacy of the proposed heuristics and implementations. Specifically, we are able to improve the training time for several datasets by 2-3x, while reducing the number of parameters by 30x (4-5x on average) on datasets such as ImageNet classification. For the Higgs Boson dataset, our implementation improves the accuracy (measured by Area Under Curve (AUC)) for classification from 0.88/1 to 0.94/1, while reducing the number of parameters by 3x in comparison to existing literature, while achieving a 2.44x speedup in comparison to the default (no apoptosis) algorithm.« less
Qi, Xin; Xing, Fuyong; Foran, David J.; Yang, Lin
2013-01-01
Summary Background Automated analysis of imaged histopathology specimens could potentially provide support for improved reliability in detection and classification in a range of investigative and clinical cancer applications. Automated segmentation of cells in the digitized tissue microarray (TMA) is often the prerequisite for quantitative analysis. However overlapping cells usually bring significant challenges for traditional segmentation algorithms. Objectives In this paper, we propose a novel, automatic algorithm to separate overlapping cells in stained histology specimens acquired using bright-field RGB imaging. Methods It starts by systematically identifying salient regions of interest throughout the image based upon their underlying visual content. The segmentation algorithm subsequently performs a quick, voting based seed detection. Finally, the contour of each cell is obtained using a repulsive level set deformable model using the seeds generated in the previous step. We compared the experimental results with the most current literature, and the pixel wise accuracy between human experts' annotation and those generated using the automatic segmentation algorithm. Results The method is tested with 100 image patches which contain more than 1000 overlapping cells. The overall precision and recall of the developed algorithm is 90% and 78%, respectively. We also implement the algorithm on GPU. The parallel implementation is 22 times faster than its C/C++ sequential implementation. Conclusion The proposed overlapping cell segmentation algorithm can accurately detect the center of each overlapping cell and effectively separate each of the overlapping cells. GPU is proven to be an efficient parallel platform for overlapping cell segmentation. PMID:22526139
Accelerating k-NN Algorithm with Hybrid MPI and OpenSHMEM
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Jian; Hamidouche, Khaled; Zheng, Jie
2015-08-05
Machine Learning algorithms are benefiting from the continuous improvement of programming models, including MPI, MapReduce and PGAS. k-Nearest Neighbors (k-NN) algorithm is a widely used machine learning algorithm, applied to supervised learning tasks such as classification. Several parallel implementations of k-NN have been proposed in the literature and practice. However, on high-performance computing systems with high-speed interconnects, it is important to further accelerate existing designs of the k-NN algorithm through taking advantage of scalable programming models. To improve the performance of k-NN on large-scale environment with InfiniBand network, this paper proposes several alternative hybrid MPI+OpenSHMEM designs and performs a systemicmore » evaluation and analysis on typical workloads. The hybrid designs leverage the one-sided memory access to better overlap communication with computation than the existing pure MPI design, and propose better schemes for efficient buffer management. The implementation based on k-NN program from MaTEx with MVAPICH2-X (Unified MPI+PGAS Communication Runtime over InfiniBand) shows up to 9.0% time reduction for training KDD Cup 2010 workload over 512 cores, and 27.6% time reduction for small workload with balanced communication and computation. Experiments of running with varied number of cores show that our design can maintain good scalability.« less
A Low Cost VLSI Architecture for Spike Sorting Based on Feature Extraction with Peak Search.
Chang, Yuan-Jyun; Hwang, Wen-Jyi; Chen, Chih-Chang
2016-12-07
The goal of this paper is to present a novel VLSI architecture for spike sorting with high classification accuracy, low area costs and low power consumption. A novel feature extraction algorithm with low computational complexities is proposed for the design of the architecture. In the feature extraction algorithm, a spike is separated into two portions based on its peak value. The area of each portion is then used as a feature. The algorithm is simple to implement and less susceptible to noise interference. Based on the algorithm, a novel architecture capable of identifying peak values and computing spike areas concurrently is proposed. To further accelerate the computation, a spike can be divided into a number of segments for the local feature computation. The local features are subsequently merged with the global ones by a simple hardware circuit. The architecture can also be easily operated in conjunction with the circuits for commonly-used spike detection algorithms, such as the Non-linear Energy Operator (NEO). The architecture has been implemented by an Application-Specific Integrated Circuit (ASIC) with 90-nm technology. Comparisons to the existing works show that the proposed architecture is well suited for real-time multi-channel spike detection and feature extraction requiring low hardware area costs, low power consumption and high classification accuracy.
Evolvable Neuronal Paths: A Novel Basis for Information and Search in the Brain
Fernando, Chrisantha; Vasas, Vera; Szathmáry, Eörs; Husbands, Phil
2011-01-01
We propose a previously unrecognized kind of informational entity in the brain that is capable of acting as the basis for unlimited hereditary variation in neuronal networks. This unit is a path of activity through a network of neurons, analogous to a path taken through a hidden Markov model. To prove in principle the capabilities of this new kind of informational substrate, we show how a population of paths can be used as the hereditary material for a neuronally implemented genetic algorithm, (the swiss-army knife of black-box optimization techniques) which we have proposed elsewhere could operate at somatic timescales in the brain. We compare this to the same genetic algorithm that uses a standard ‘genetic’ informational substrate, i.e. non-overlapping discrete genotypes, on a range of optimization problems. A path evolution algorithm (PEA) is defined as any algorithm that implements natural selection of paths in a network substrate. A PEA is a previously unrecognized type of natural selection that is well suited for implementation by biological neuronal networks with structural plasticity. The important similarities and differences between a standard genetic algorithm and a PEA are considered. Whilst most experiments are conducted on an abstract network model, at the conclusion of the paper a slightly more realistic neuronal implementation of a PEA is outlined based on Izhikevich spiking neurons. Finally, experimental predictions are made for the identification of such informational paths in the brain. PMID:21887266
Jiang, Ailian; Zheng, Lihong
2018-03-29
Low cost, high reliability and easy maintenance are key criteria in the design of routing protocols for wireless sensor networks (WSNs). This paper investigates the existing ant colony optimization (ACO)-based WSN routing algorithms and the minimum hop count WSN routing algorithms by reviewing their strengths and weaknesses. We also consider the critical factors of WSNs, such as energy constraint of sensor nodes, network load balancing and dynamic network topology. Then we propose a hybrid routing algorithm that integrates ACO and a minimum hop count scheme. The proposed algorithm is able to find the optimal routing path with minimal total energy consumption and balanced energy consumption on each node. The algorithm has unique superiority in terms of searching for the optimal path, balancing the network load and the network topology maintenance. The WSN model and the proposed algorithm have been implemented using C++. Extensive simulation experimental results have shown that our algorithm outperforms several other WSN routing algorithms on such aspects that include the rate of convergence, the success rate in searching for global optimal solution, and the network lifetime.
2018-01-01
Low cost, high reliability and easy maintenance are key criteria in the design of routing protocols for wireless sensor networks (WSNs). This paper investigates the existing ant colony optimization (ACO)-based WSN routing algorithms and the minimum hop count WSN routing algorithms by reviewing their strengths and weaknesses. We also consider the critical factors of WSNs, such as energy constraint of sensor nodes, network load balancing and dynamic network topology. Then we propose a hybrid routing algorithm that integrates ACO and a minimum hop count scheme. The proposed algorithm is able to find the optimal routing path with minimal total energy consumption and balanced energy consumption on each node. The algorithm has unique superiority in terms of searching for the optimal path, balancing the network load and the network topology maintenance. The WSN model and the proposed algorithm have been implemented using C++. Extensive simulation experimental results have shown that our algorithm outperforms several other WSN routing algorithms on such aspects that include the rate of convergence, the success rate in searching for global optimal solution, and the network lifetime. PMID:29596336
Goal-oriented sensitivity analysis for lattice kinetic Monte Carlo simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arampatzis, Georgios, E-mail: garab@math.uoc.gr; Department of Mathematics and Statistics, University of Massachusetts, Amherst, Massachusetts 01003; Katsoulakis, Markos A., E-mail: markos@math.umass.edu
2014-03-28
In this paper we propose a new class of coupling methods for the sensitivity analysis of high dimensional stochastic systems and in particular for lattice Kinetic Monte Carlo (KMC). Sensitivity analysis for stochastic systems is typically based on approximating continuous derivatives with respect to model parameters by the mean value of samples from a finite difference scheme. Instead of using independent samples the proposed algorithm reduces the variance of the estimator by developing a strongly correlated-“coupled”- stochastic process for both the perturbed and unperturbed stochastic processes, defined in a common state space. The novelty of our construction is that themore » new coupled process depends on the targeted observables, e.g., coverage, Hamiltonian, spatial correlations, surface roughness, etc., hence we refer to the proposed method as goal-oriented sensitivity analysis. In particular, the rates of the coupled Continuous Time Markov Chain are obtained as solutions to a goal-oriented optimization problem, depending on the observable of interest, by considering the minimization functional of the corresponding variance. We show that this functional can be used as a diagnostic tool for the design and evaluation of different classes of couplings. Furthermore, the resulting KMC sensitivity algorithm has an easy implementation that is based on the Bortz–Kalos–Lebowitz algorithm's philosophy, where events are divided in classes depending on level sets of the observable of interest. Finally, we demonstrate in several examples including adsorption, desorption, and diffusion Kinetic Monte Carlo that for the same confidence interval and observable, the proposed goal-oriented algorithm can be two orders of magnitude faster than existing coupling algorithms for spatial KMC such as the Common Random Number approach. We also provide a complete implementation of the proposed sensitivity analysis algorithms, including various spatial KMC examples, in a supplementary MATLAB source code.« less
Gao, Ying; Wkram, Chris Hadri; Duan, Jiajie; Chou, Jarong
2015-01-01
In order to prolong the network lifetime, energy-efficient protocols adapted to the features of wireless sensor networks should be used. This paper explores in depth the nature of heterogeneous wireless sensor networks, and finally proposes an algorithm to address the problem of finding an effective pathway for heterogeneous clustering energy. The proposed algorithm implements cluster head selection according to the degree of energy attenuation during the network’s running and the degree of candidate nodes’ effective coverage on the whole network, so as to obtain an even energy consumption over the whole network for the situation with high degree of coverage. Simulation results show that the proposed clustering protocol has better adaptability to heterogeneous environments than existing clustering algorithms in prolonging the network lifetime. PMID:26690440
A novel orthoimage mosaic method using a weighted A∗ algorithm - Implementation and evaluation
NASA Astrophysics Data System (ADS)
Zheng, Maoteng; Xiong, Xiaodong; Zhu, Junfeng
2018-04-01
The implementation and evaluation of a weighted A∗ algorithm for orthoimage mosaic with UAV (Unmanned Aircraft Vehicle) imagery is proposed. The initial seam-line network is firstly generated by standard Voronoi Diagram algorithm; an edge diagram is generated based on DSM (Digital Surface Model) data; the vertices (conjunction nodes of seam-lines) of the initial network are relocated if they are on high objects (buildings, trees and other artificial structures); and the initial seam-lines are refined using the weighted A∗ algorithm based on the edge diagram and the relocated vertices. Our method was tested with three real UAV datasets. Two quantitative terms are introduced to evaluate the results of the proposed method. Preliminary results show that the method is suitable for regular and irregular aligned UAV images for most terrain types (flat or mountainous areas), and is better than the state-of-the-art method in both quality and efficiency based on the test datasets.
NASA Technical Reports Server (NTRS)
Collins, Jeffery D.; Volakis, John L.; Jin, Jian-Ming
1990-01-01
A new technique is presented for computing the scattering by 2-D structures of arbitrary composition. The proposed solution approach combines the usual finite element method with the boundary-integral equation to formulate a discrete system. This is subsequently solved via the conjugate gradient (CG) algorithm. A particular characteristic of the method is the use of rectangular boundaries to enclose the scatterer. Several of the resulting boundary integrals are therefore convolutions and may be evaluated via the fast Fourier transform (FFT) in the implementation of the CG algorithm. The solution approach offers the principal advantage of having O(N) memory demand and employs a 1-D FFT versus a 2-D FFT as required with a traditional implementation of the CGFFT algorithm. The speed of the proposed solution method is compared with that of the traditional CGFFT algorithm, and results for rectangular bodies are given and shown to be in excellent agreement with the moment method.
NASA Technical Reports Server (NTRS)
Collins, Jeffery D.; Volakis, John L.
1989-01-01
A new technique is presented for computing the scattering by 2-D structures of arbitrary composition. The proposed solution approach combines the usual finite element method with the boundary integral equation to formulate a discrete system. This is subsequently solved via the conjugate gradient (CG) algorithm. A particular characteristic of the method is the use of rectangular boundaries to enclose the scatterer. Several of the resulting boundary integrals are therefore convolutions and may be evaluated via the fast Fourier transform (FFT) in the implementation of the CG algorithm. The solution approach offers the principle advantage of having O(N) memory demand and employs a 1-D FFT versus a 2-D FFT as required with a traditional implementation of the CGFFT algorithm. The speed of the proposed solution method is compared with that of the traditional CGFFT algorithm, and results for rectangular bodies are given and shown to be in excellent agreement with the moment method.
An implementation of the look-ahead Lanczos algorithm for non-Hermitian matrices
NASA Technical Reports Server (NTRS)
Freund, Roland W.; Gutknecht, Martin H.; Nachtigal, Noel M.
1991-01-01
The nonsymmetric Lanczos method can be used to compute eigenvalues of large sparse non-Hermitian matrices or to solve large sparse non-Hermitian linear systems. However, the original Lanczos algorithm is susceptible to possible breakdowns and potential instabilities. An implementation is presented of a look-ahead version of the Lanczos algorithm that, except for the very special situation of an incurable breakdown, overcomes these problems by skipping over those steps in which a breakdown or near-breakdown would occur in the standard process. The proposed algorithm can handle look-ahead steps of any length and requires the same number of matrix-vector products and inner products as the standard Lanczos process without look-ahead.
Particle swarm optimization based space debris surveillance network scheduling
NASA Astrophysics Data System (ADS)
Jiang, Hai; Liu, Jing; Cheng, Hao-Wen; Zhang, Yao
2017-02-01
The increasing number of space debris has created an orbital debris environment that poses increasing impact risks to existing space systems and human space flights. For the safety of in-orbit spacecrafts, we should optimally schedule surveillance tasks for the existing facilities to allocate resources in a manner that most significantly improves the ability to predict and detect events involving affected spacecrafts. This paper analyzes two criteria that mainly affect the performance of a scheduling scheme and introduces an artificial intelligence algorithm into the scheduling of tasks of the space debris surveillance network. A new scheduling algorithm based on the particle swarm optimization algorithm is proposed, which can be implemented in two different ways: individual optimization and joint optimization. Numerical experiments with multiple facilities and objects are conducted based on the proposed algorithm, and simulation results have demonstrated the effectiveness of the proposed algorithm.
Computationally efficient algorithm for high sampling-frequency operation of active noise control
NASA Astrophysics Data System (ADS)
Rout, Nirmal Kumar; Das, Debi Prasad; Panda, Ganapati
2015-05-01
In high sampling-frequency operation of active noise control (ANC) system the length of the secondary path estimate and the ANC filter are very long. This increases the computational complexity of the conventional filtered-x least mean square (FXLMS) algorithm. To reduce the computational complexity of long order ANC system using FXLMS algorithm, frequency domain block ANC algorithms have been proposed in past. These full block frequency domain ANC algorithms are associated with some disadvantages such as large block delay, quantization error due to computation of large size transforms and implementation difficulties in existing low-end DSP hardware. To overcome these shortcomings, the partitioned block ANC algorithm is newly proposed where the long length filters in ANC are divided into a number of equal partitions and suitably assembled to perform the FXLMS algorithm in the frequency domain. The complexity of this proposed frequency domain partitioned block FXLMS (FPBFXLMS) algorithm is quite reduced compared to the conventional FXLMS algorithm. It is further reduced by merging one fast Fourier transform (FFT)-inverse fast Fourier transform (IFFT) combination to derive the reduced structure FPBFXLMS (RFPBFXLMS) algorithm. Computational complexity analysis for different orders of filter and partition size are presented. Systematic computer simulations are carried out for both the proposed partitioned block ANC algorithms to show its accuracy compared to the time domain FXLMS algorithm.
An embedded implementation based on adaptive filter bank for brain-computer interface systems.
Belwafi, Kais; Romain, Olivier; Gannouni, Sofien; Ghaffari, Fakhreddine; Djemal, Ridha; Ouni, Bouraoui
2018-07-15
Brain-computer interface (BCI) is a new communication pathway for users with neurological deficiencies. The implementation of a BCI system requires complex electroencephalography (EEG) signal processing including filtering, feature extraction and classification algorithms. Most of current BCI systems are implemented on personal computers. Therefore, there is a great interest in implementing BCI on embedded platforms to meet system specifications in terms of time response, cost effectiveness, power consumption, and accuracy. This article presents an embedded-BCI (EBCI) system based on a Stratix-IV field programmable gate array. The proposed system relays on the weighted overlap-add (WOLA) algorithm to perform dynamic filtering of EEG-signals by analyzing the event-related desynchronization/synchronization (ERD/ERS). The EEG-signals are classified, using the linear discriminant analysis algorithm, based on their spatial features. The proposed system performs fast classification within a time delay of 0.430 s/trial, achieving an average accuracy of 76.80% according to an offline approach and 80.25% using our own recording. The estimated power consumption of the prototype is approximately 0.7 W. Results show that the proposed EBCI system reduces the overall classification error rate for the three datasets of the BCI-competition by 5% compared to other similar implementations. Moreover, experiment shows that the proposed system maintains a high accuracy rate with a short processing time, a low power consumption, and a low cost. Performing dynamic filtering of EEG-signals using WOLA increases the recognition rate of ERD/ERS patterns of motor imagery brain activity. This approach allows to develop a complete prototype of a EBCI system that achieves excellent accuracy rates. Copyright © 2018 Elsevier B.V. All rights reserved.
Symmetric nonnegative matrix factorization: algorithms and applications to probabilistic clustering.
He, Zhaoshui; Xie, Shengli; Zdunek, Rafal; Zhou, Guoxu; Cichocki, Andrzej
2011-12-01
Nonnegative matrix factorization (NMF) is an unsupervised learning method useful in various applications including image processing and semantic analysis of documents. This paper focuses on symmetric NMF (SNMF), which is a special case of NMF decomposition. Three parallel multiplicative update algorithms using level 3 basic linear algebra subprograms directly are developed for this problem. First, by minimizing the Euclidean distance, a multiplicative update algorithm is proposed, and its convergence under mild conditions is proved. Based on it, we further propose another two fast parallel methods: α-SNMF and β -SNMF algorithms. All of them are easy to implement. These algorithms are applied to probabilistic clustering. We demonstrate their effectiveness for facial image clustering, document categorization, and pattern clustering in gene expression.
Research and implementation of finger-vein recognition algorithm
NASA Astrophysics Data System (ADS)
Pang, Zengyao; Yang, Jie; Chen, Yilei; Liu, Yin
2017-06-01
In finger vein image preprocessing, finger angle correction and ROI extraction are important parts of the system. In this paper, we propose an angle correction algorithm based on the centroid of the vein image, and extract the ROI region according to the bidirectional gray projection method. Inspired by the fact that features in those vein areas have similar appearance as valleys, a novel method was proposed to extract center and width of palm vein based on multi-directional gradients, which is easy-computing, quick and stable. On this basis, an encoding method was designed to determine the gray value distribution of texture image. This algorithm could effectively overcome the edge of the texture extraction error. Finally, the system was equipped with higher robustness and recognition accuracy by utilizing fuzzy threshold determination and global gray value matching algorithm. Experimental results on pairs of matched palm images show that, the proposed method has a EER with 3.21% extracts features at the speed of 27ms per image. It can be concluded that the proposed algorithm has obvious advantages in grain extraction efficiency, matching accuracy and algorithm efficiency.
Multi-jagged: A scalable parallel spatial partitioning algorithm
Deveci, Mehmet; Rajamanickam, Sivasankaran; Devine, Karen D.; ...
2015-03-18
Geometric partitioning is fast and effective for load-balancing dynamic applications, particularly those requiring geometric locality of data (particle methods, crash simulations). We present, to our knowledge, the first parallel implementation of a multidimensional-jagged geometric partitioner. In contrast to the traditional recursive coordinate bisection algorithm (RCB), which recursively bisects subdomains perpendicular to their longest dimension until the desired number of parts is obtained, our algorithm does recursive multi-section with a given number of parts in each dimension. By computing multiple cut lines concurrently and intelligently deciding when to migrate data while computing the partition, we minimize data movement compared to efficientmore » implementations of recursive bisection. We demonstrate the algorithm's scalability and quality relative to the RCB implementation in Zoltan on both real and synthetic datasets. Our experiments show that the proposed algorithm performs and scales better than RCB in terms of run-time without degrading the load balance. Lastly, our implementation partitions 24 billion points into 65,536 parts within a few seconds and exhibits near perfect weak scaling up to 6K cores.« less
Loukriz, Abdelhamid; Haddadi, Mourad; Messalti, Sabir
2016-05-01
Improvement of the efficiency of photovoltaic system based on new maximum power point tracking (MPPT) algorithms is the most promising solution due to its low cost and its easy implementation without equipment updating. Many MPPT methods with fixed step size have been developed. However, when atmospheric conditions change rapidly , the performance of conventional algorithms is reduced. In this paper, a new variable step size Incremental Conductance IC MPPT algorithm has been proposed. Modeling and simulation of different operational conditions of conventional Incremental Conductance IC and proposed methods are presented. The proposed method was developed and tested successfully on a photovoltaic system based on Flyback converter and control circuit using dsPIC30F4011. Both, simulation and experimental design are provided in several aspects. A comparative study between the proposed variable step size and fixed step size IC MPPT method under similar operating conditions is presented. The obtained results demonstrate the efficiency of the proposed MPPT algorithm in terms of speed in MPP tracking and accuracy. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
A new improved artificial bee colony algorithm for ship hull form optimization
NASA Astrophysics Data System (ADS)
Huang, Fuxin; Wang, Lijue; Yang, Chi
2016-04-01
The artificial bee colony (ABC) algorithm is a relatively new swarm intelligence-based optimization algorithm. Its simplicity of implementation, relatively few parameter settings and promising optimization capability make it widely used in different fields. However, it has problems of slow convergence due to its solution search equation. Here, a new solution search equation based on a combination of the elite solution pool and the block perturbation scheme is proposed to improve the performance of the algorithm. In addition, two different solution search equations are used by employed bees and onlooker bees to balance the exploration and exploitation of the algorithm. The developed algorithm is validated by a set of well-known numerical benchmark functions. It is then applied to optimize two ship hull forms with minimum resistance. The tested results show that the proposed new improved ABC algorithm can outperform the ABC algorithm in most of the tested problems.
Effective algorithm for routing integral structures with twolayer switching
NASA Astrophysics Data System (ADS)
Nazarov, A. V.; Shakhnov, V. A.; Vlasov, A. I.; Novikov, A. N.
2018-05-01
The paper presents an algorithm for routing switching objects such as large-scale integrated circuits (LSICs) with two layers of metallization, embossed printed circuit boards, microboards with pairs of wiring layers on each side, and other similar constructs. The algorithm allows eliminating the effect of mutual blocking of routes in the classical wave algorithm by implementing a special circuit of digital wave motion in two layers of metallization, allowing direct intersections of all circuit conductors in a combined layer. However, information about the belonging of the topology elements to the circuits is sufficient for layering and minimizing the number of contact holes. In addition, the paper presents a specific example which shows that, in contrast to the known routing algorithms using a wave model, just one byte of memory per discrete of the work field is sufficient to implement the proposed algorithm.
Extension of the firefly algorithm and preference rules for solving MINLP problems
NASA Astrophysics Data System (ADS)
Costa, M. Fernanda P.; Francisco, Rogério B.; Rocha, Ana Maria A. C.; Fernandes, Edite M. G. P.
2017-07-01
An extension of the firefly algorithm (FA) for solving mixed-integer nonlinear programming (MINLP) problems is presented. Although penalty functions are nowadays frequently used to handle integrality conditions and inequality and equality constraints, this paper proposes the implementation within the FA of a simple rounded-based heuristic and four preference rules to find and converge to MINLP feasible solutions. Preliminary numerical experiments are carried out to validate the proposed methodology.
Memory Management of Multimedia Services in Smart Homes
NASA Astrophysics Data System (ADS)
Kamel, Ibrahim; Muhaureq, Sanaa A.
Nowadays there is a wide spectrum of applications that run in smart home environments. Consequently, home gateway, which is a central component in the smart home, must manage many applications despite limited memory resources. OSGi is a middleware standard for home gateways. OSGi models services as dependent components. Moreover, these applications might differ in their importance. Services collaborate and complement each other to achieve the required results. This paper addresses the following problem: given a home gateway that hosts several applications with different priorities and arbitrary dependencies among them. When the gateway runs out of memory, which application or service will be stopped or kicked out of memory to start a new service. Note that stopping a given service means that all the services that depend on it will be stopped too. Because of the service dependencies, traditional memory management techniques, in the operating system literatures might not be efficient. Our goal is to stop the least important and the least number of services. The paper presents a novel algorithm for home gateway memory management. The proposed algorithm takes into consideration the priority of the application and dependencies between different services, in addition to the amount of memory occupied by each service. We implement the proposed algorithm and performed many experiments to evaluate its performance and execution time. The proposed algorithm is implemented as a part of the OSGi framework (Open Service Gateway initiative). We used best fit and worst fit as yardstick to show the effectiveness of the proposed algorithm.
NASA Astrophysics Data System (ADS)
Chen, Xiao; Li, Yaan; Yu, Jing; Li, Yuxing
2018-01-01
For fast and more effective implementation of tracking multiple targets in a cluttered environment, we propose a multiple targets tracking (MTT) algorithm called maximum entropy fuzzy c-means clustering joint probabilistic data association that combines fuzzy c-means clustering and the joint probabilistic data association (PDA) algorithm. The algorithm uses the membership value to express the probability of the target originating from measurement. The membership value is obtained through fuzzy c-means clustering objective function optimized by the maximum entropy principle. When considering the effect of the public measurement, we use a correction factor to adjust the association probability matrix to estimate the state of the target. As this algorithm avoids confirmation matrix splitting, it can solve the high computational load problem of the joint PDA algorithm. The results of simulations and analysis conducted for tracking neighbor parallel targets and cross targets in a different density cluttered environment show that the proposed algorithm can realize MTT quickly and efficiently in a cluttered environment. Further, the performance of the proposed algorithm remains constant with increasing process noise variance. The proposed algorithm has the advantages of efficiency and low computational load, which can ensure optimum performance when tracking multiple targets in a dense cluttered environment.
Virtual Network Embedding via Monte Carlo Tree Search.
Haeri, Soroush; Trajkovic, Ljiljana
2018-02-01
Network virtualization helps overcome shortcomings of the current Internet architecture. The virtualized network architecture enables coexistence of multiple virtual networks (VNs) on an existing physical infrastructure. VN embedding (VNE) problem, which deals with the embedding of VN components onto a physical network, is known to be -hard. In this paper, we propose two VNE algorithms: MaVEn-M and MaVEn-S. MaVEn-M employs the multicommodity flow algorithm for virtual link mapping while MaVEn-S uses the shortest-path algorithm. They formalize the virtual node mapping problem by using the Markov decision process (MDP) framework and devise action policies (node mappings) for the proposed MDP using the Monte Carlo tree search algorithm. Service providers may adjust the execution time of the MaVEn algorithms based on the traffic load of VN requests. The objective of the algorithms is to maximize the profit of infrastructure providers. We develop a discrete event VNE simulator to implement and evaluate performance of MaVEn-M, MaVEn-S, and several recently proposed VNE algorithms. We introduce profitability as a new performance metric that captures both acceptance and revenue to cost ratios. Simulation results show that the proposed algorithms find more profitable solutions than the existing algorithms. Given additional computation time, they further improve embedding solutions.
NASA Technical Reports Server (NTRS)
Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.
1989-01-01
Computer vision systems employ a sequence of vision algorithms in which the output of an algorithm is the input of the next algorithm in the sequence. Algorithms that constitute such systems exhibit vastly different computational characteristics, and therefore, require different data decomposition techniques and efficient load balancing techniques for parallel implementation. However, since the input data for a task is produced as the output data of the previous task, this information can be exploited to perform knowledge based data decomposition and load balancing. Presented here are algorithms for a motion estimation system. The motion estimation is based on the point correspondence between the involved images which are a sequence of stereo image pairs. Researchers propose algorithms to obtain point correspondences by matching feature points among stereo image pairs at any two consecutive time instants. Furthermore, the proposed algorithms employ non-iterative procedures, which results in saving considerable amounts of computation time. The system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from consecutive time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters.
Multi-scale graph-cut algorithm for efficient water-fat separation.
Berglund, Johan; Skorpil, Mikael
2017-09-01
To improve the accuracy and robustness to noise in water-fat separation by unifying the multiscale and graph cut based approaches to B 0 -correction. A previously proposed water-fat separation algorithm that corrects for B 0 field inhomogeneity in 3D by a single quadratic pseudo-Boolean optimization (QPBO) graph cut was incorporated into a multi-scale framework, where field map solutions are propagated from coarse to fine scales for voxels that are not resolved by the graph cut. The accuracy of the single-scale and multi-scale QPBO algorithms was evaluated against benchmark reference datasets. The robustness to noise was evaluated by adding noise to the input data prior to water-fat separation. Both algorithms achieved the highest accuracy when compared with seven previously published methods, while computation times were acceptable for implementation in clinical routine. The multi-scale algorithm was more robust to noise than the single-scale algorithm, while causing only a small increase (+10%) of the reconstruction time. The proposed 3D multi-scale QPBO algorithm offers accurate water-fat separation, robustness to noise, and fast reconstruction. The software implementation is freely available to the research community. Magn Reson Med 78:941-949, 2017. © 2016 International Society for Magnetic Resonance in Medicine. © 2016 International Society for Magnetic Resonance in Medicine.
Apriori Versions Based on MapReduce for Mining Frequent Patterns on Big Data.
Luna, Jose Maria; Padillo, Francisco; Pechenizkiy, Mykola; Ventura, Sebastian
2017-09-27
Pattern mining is one of the most important tasks to extract meaningful and useful information from raw data. This task aims to extract item-sets that represent any type of homogeneity and regularity in data. Although many efficient algorithms have been developed in this regard, the growing interest in data has caused the performance of existing pattern mining techniques to be dropped. The goal of this paper is to propose new efficient pattern mining algorithms to work in big data. To this aim, a series of algorithms based on the MapReduce framework and the Hadoop open-source implementation have been proposed. The proposed algorithms can be divided into three main groups. First, two algorithms [Apriori MapReduce (AprioriMR) and iterative AprioriMR] with no pruning strategy are proposed, which extract any existing item-set in data. Second, two algorithms (space pruning AprioriMR and top AprioriMR) that prune the search space by means of the well-known anti-monotone property are proposed. Finally, a last algorithm (maximal AprioriMR) is also proposed for mining condensed representations of frequent patterns. To test the performance of the proposed algorithms, a varied collection of big data datasets have been considered, comprising up to 3 · 10#x00B9;⁸ transactions and more than 5 million of distinct single-items. The experimental stage includes comparisons against highly efficient and well-known pattern mining algorithms. Results reveal the interest of applying MapReduce versions when complex problems are considered, and also the unsuitability of this paradigm when dealing with small data.
Enhancement of event related potentials by iterative restoration algorithms
NASA Astrophysics Data System (ADS)
Pomalaza-Raez, Carlos A.; McGillem, Clare D.
1986-12-01
An iterative procedure for the restoration of event related potentials (ERP) is proposed and implemented. The method makes use of assumed or measured statistical information about latency variations in the individual ERP components. The signal model used for the restoration algorithm consists of a time-varying linear distortion and a positivity/negativity constraint. Additional preprocessing in the form of low-pass filtering is needed in order to mitigate the effects of additive noise. Numerical results obtained with real data show clearly the presence of enhanced and regenerated components in the restored ERP's. The procedure is easy to implement which makes it convenient when compared to other proposed techniques for the restoration of ERP signals.
Parallel optoelectronic trinary signed-digit division
NASA Astrophysics Data System (ADS)
Alam, Mohammad S.
1999-03-01
The trinary signed-digit (TSD) number system has been found to be very useful for parallel addition and subtraction of any arbitrary length operands in constant time. Using the TSD addition and multiplication modules as the basic building blocks, we develop an efficient algorithm for performing parallel TSD division in constant time. The proposed division technique uses one TSD subtraction and two TSD multiplication steps. An optoelectronic correlator based architecture is suggested for implementation of the proposed TSD division algorithm, which fully exploits the parallelism and high processing speed of optics. An efficient spatial encoding scheme is used to ensure better utilization of space bandwidth product of the spatial light modulators used in the optoelectronic implementation.
Hou, Yi-You
2017-09-01
This article addresses an evolutionary programming (EP) algorithm technique-based and proportional-integral-derivative (PID) control methods are established to guarantee synchronization of the master and slave Rikitake chaotic systems. For PID synchronous control, the evolutionary programming (EP) algorithm is used to find the optimal PID controller parameters k p , k i , k d by integrated absolute error (IAE) method for the convergence conditions. In order to verify the system performance, the basic electronic components containing operational amplifiers (OPAs), resistors, and capacitors are used to implement the proposed chaotic Rikitake systems. Finally, the experimental results validate the proposed Rikitake chaotic synchronization approach. Copyright © 2017. Published by Elsevier Ltd.
Time-delayed chameleon: Analysis, synchronization and FPGA implementation
NASA Astrophysics Data System (ADS)
Rajagopal, Karthikeyan; Jafari, Sajad; Laarem, Guessas
2017-12-01
In this paper we report a time-delayed chameleon-like chaotic system which can belong to different families of chaotic attractors depending on the choices of parameters. Such a characteristic of self-excited and hidden chaotic flows in a simple 3D system with time delay has not been reported earlier. Dynamic analysis of the proposed time-delayed systems are analysed in time-delay space and parameter space. A novel adaptive modified functional projective lag synchronization algorithm is derived for synchronizing identical time-delayed chameleon systems with uncertain parameters. The proposed time-delayed systems and the synchronization algorithm with controllers and parameter estimates are then implemented in FPGA using hardware-software co-simulation and the results are presented.
Model for the design of distributed data bases
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ram, S.
This research focuses on developing a model to solve the File Allocation Problem (FAP). The model integrates two major design issues, namely Concurrently Control and Data Distribution. The central node locking mechanism is incorporated in developing a nonlinear integer programming model. Two solution algorithms are proposed, one of which was implemented in FORTRAN.V. The allocation of data bases and programs are examined using this heuristic. Several decision rules were also formulated based on the results of the heuristic. A second more comprehensive heuristic was proposed, based on the knapsack problem. The development and implementation of this algorithm has been leftmore » as a topic for future research.« less
Stream-based Hebbian eigenfilter for real-time neuronal spike discrimination
2012-01-01
Background Principal component analysis (PCA) has been widely employed for automatic neuronal spike sorting. Calculating principal components (PCs) is computationally expensive, and requires complex numerical operations and large memory resources. Substantial hardware resources are therefore needed for hardware implementations of PCA. General Hebbian algorithm (GHA) has been proposed for calculating PCs of neuronal spikes in our previous work, which eliminates the needs of computationally expensive covariance analysis and eigenvalue decomposition in conventional PCA algorithms. However, large memory resources are still inherently required for storing a large volume of aligned spikes for training PCs. The large size memory will consume large hardware resources and contribute significant power dissipation, which make GHA difficult to be implemented in portable or implantable multi-channel recording micro-systems. Method In this paper, we present a new algorithm for PCA-based spike sorting based on GHA, namely stream-based Hebbian eigenfilter, which eliminates the inherent memory requirements of GHA while keeping the accuracy of spike sorting by utilizing the pseudo-stationarity of neuronal spikes. Because of the reduction of large hardware storage requirements, the proposed algorithm can lead to ultra-low hardware resources and power consumption of hardware implementations, which is critical for the future multi-channel micro-systems. Both clinical and synthetic neural recording data sets were employed for evaluating the accuracy of the stream-based Hebbian eigenfilter. The performance of spike sorting using stream-based eigenfilter and the computational complexity of the eigenfilter were rigorously evaluated and compared with conventional PCA algorithms. Field programmable logic arrays (FPGAs) were employed to implement the proposed algorithm, evaluate the hardware implementations and demonstrate the reduction in both power consumption and hardware memories achieved by the streaming computing Results and discussion Results demonstrate that the stream-based eigenfilter can achieve the same accuracy and is 10 times more computationally efficient when compared with conventional PCA algorithms. Hardware evaluations show that 90.3% logic resources, 95.1% power consumption and 86.8% computing latency can be reduced by the stream-based eigenfilter when compared with PCA hardware. By utilizing the streaming method, 92% memory resources and 67% power consumption can be saved when compared with the direct implementation of GHA. Conclusion Stream-based Hebbian eigenfilter presents a novel approach to enable real-time spike sorting with reduced computational complexity and hardware costs. This new design can be further utilized for multi-channel neuro-physiological experiments or chronic implants. PMID:22490725
Yurtkuran, Alkın; Emel, Erdal
2016-01-01
The artificial bee colony (ABC) algorithm is a popular swarm based technique, which is inspired from the intelligent foraging behavior of honeybee swarms. This paper proposes a new variant of ABC algorithm, namely, enhanced ABC with solution acceptance rule and probabilistic multisearch (ABC-SA) to address global optimization problems. A new solution acceptance rule is proposed where, instead of greedy selection between old solution and new candidate solution, worse candidate solutions have a probability to be accepted. Additionally, the acceptance probability of worse candidates is nonlinearly decreased throughout the search process adaptively. Moreover, in order to improve the performance of the ABC and balance the intensification and diversification, a probabilistic multisearch strategy is presented. Three different search equations with distinctive characters are employed using predetermined search probabilities. By implementing a new solution acceptance rule and a probabilistic multisearch approach, the intensification and diversification performance of the ABC algorithm is improved. The proposed algorithm has been tested on well-known benchmark functions of varying dimensions by comparing against novel ABC variants, as well as several recent state-of-the-art algorithms. Computational results show that the proposed ABC-SA outperforms other ABC variants and is superior to state-of-the-art algorithms proposed in the literature.
Lifetime Prediction of IGBT in a STATCOM Using Modified-Graphical Rainflow Counting Algorithm
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gopi Reddy, Lakshmi Reddy; Tolbert, Leon M; Ozpineci, Burak
Rainflow algorithms are one of the best counting methods used in fatigue and failure analysis [17]. There have been many approaches to the rainflow algorithm, some proposing modifications. Graphical Rainflow Method (GRM) was proposed recently with a claim of faster execution times [10]. However, the steps of the graphical method of rainflow algorithm, when implemented, do not generate the same output as the four-point or ASTM standard algorithm. A modified graphical method is presented and discussed in this paper to overcome the shortcomings of graphical rainflow algorithm. A fast rainflow algorithm based on four-point algorithm but considering point comparison thanmore » range comparison is also presented. A comparison between the performances of the common rainflow algorithms [6-10], including the proposed methods, in terms of execution time, memory used, and efficiency, complexity, and load sequences is presented. Finally, the rainflow algorithm is applied to temperature data of an IGBT in assessing the lifetime of a STATCOM operating for power factor correction of the load. From 5-minute data load profiles available, the lifetime is estimated to be at 3.4 years.« less
A semi-active suspension control algorithm for vehicle comprehensive vertical dynamics performance
NASA Astrophysics Data System (ADS)
Nie, Shida; Zhuang, Ye; Liu, Weiping; Chen, Fan
2017-08-01
Comprehensive performance of the vehicle, including ride qualities and road-holding, is essentially of great value in practice. Many up-to-date semi-active control algorithms improve vehicle dynamics performance effectively. However, it is hard to improve comprehensive performance for the conflict between ride qualities and road-holding around the second-order resonance. Hence, a new control algorithm is proposed to achieve a good trade-off between ride qualities and road-holding. In this paper, the properties of the invariant points are analysed, which gives an insight into the performance conflicting around the second-order resonance. Based on it, a new control algorithm is proposed. The algorithm employs a novel frequency selector to balance suspension ride and handling performance by adopting a medium damping around the second-order resonance. The results of this study show that the proposed control algorithm could improve the performance of ride qualities and suspension working space up to 18.3% and 8.2%, respectively, with little loss of road-holding compared to the passive suspension. Consequently, the comprehensive performance can be improved by 6.6%. Hence, the proposed algorithm is of great potential to be implemented in practice.
A Robust Sound Source Localization Approach for Microphone Array with Model Errors
NASA Astrophysics Data System (ADS)
Xiao, Hua; Shao, Huai-Zong; Peng, Qi-Cong
In this paper, a robust sound source localization approach is proposed. The approach retains good performance even when model errors exist. Compared with previous work in this field, the contributions of this paper are as follows. First, an improved broad-band and near-field array model is proposed. It takes array gain, phase perturbations into account and is based on the actual positions of the elements. It can be used in arbitrary planar geometry arrays. Second, a subspace model errors estimation algorithm and a Weighted 2-Dimension Multiple Signal Classification (W2D-MUSIC) algorithm are proposed. The subspace model errors estimation algorithm estimates unknown parameters of the array model, i. e., gain, phase perturbations, and positions of the elements, with high accuracy. The performance of this algorithm is improved with the increasing of SNR or number of snapshots. The W2D-MUSIC algorithm based on the improved array model is implemented to locate sound sources. These two algorithms compose the robust sound source approach. The more accurate steering vectors can be provided for further processing such as adaptive beamforming algorithm. Numerical examples confirm effectiveness of this proposed approach.
Zhang, Shang; Dong, Yuhan; Fu, Hongyan; Huang, Shao-Lun; Zhang, Lin
2018-02-22
The miniaturization of spectrometer can broaden the application area of spectrometry, which has huge academic and industrial value. Among various miniaturization approaches, filter-based miniaturization is a promising implementation by utilizing broadband filters with distinct transmission functions. Mathematically, filter-based spectral reconstruction can be modeled as solving a system of linear equations. In this paper, we propose an algorithm of spectral reconstruction based on sparse optimization and dictionary learning. To verify the feasibility of the reconstruction algorithm, we design and implement a simple prototype of a filter-based miniature spectrometer. The experimental results demonstrate that sparse optimization is well applicable to spectral reconstruction whether the spectra are directly sparse or not. As for the non-directly sparse spectra, their sparsity can be enhanced by dictionary learning. In conclusion, the proposed approach has a bright application prospect in fabricating a practical miniature spectrometer.
Zhang, Shang; Fu, Hongyan; Huang, Shao-Lun; Zhang, Lin
2018-01-01
The miniaturization of spectrometer can broaden the application area of spectrometry, which has huge academic and industrial value. Among various miniaturization approaches, filter-based miniaturization is a promising implementation by utilizing broadband filters with distinct transmission functions. Mathematically, filter-based spectral reconstruction can be modeled as solving a system of linear equations. In this paper, we propose an algorithm of spectral reconstruction based on sparse optimization and dictionary learning. To verify the feasibility of the reconstruction algorithm, we design and implement a simple prototype of a filter-based miniature spectrometer. The experimental results demonstrate that sparse optimization is well applicable to spectral reconstruction whether the spectra are directly sparse or not. As for the non-directly sparse spectra, their sparsity can be enhanced by dictionary learning. In conclusion, the proposed approach has a bright application prospect in fabricating a practical miniature spectrometer. PMID:29470406
NASA Astrophysics Data System (ADS)
Feng, Jian-xin; Tang, Jia-fu; Wang, Guang-xing
2007-04-01
On the basis of the analysis of clustering algorithm that had been proposed for MANET, a novel clustering strategy was proposed in this paper. With the trust defined by statistical hypothesis in probability theory and the cluster head selected by node trust and node mobility, this strategy can realize the function of the malicious nodes detection which was neglected by other clustering algorithms and overcome the deficiency of being incapable of implementing the relative mobility metric of corresponding nodes in the MOBIC algorithm caused by the fact that the receiving power of two consecutive HELLO packet cannot be measured. It's an effective solution to cluster MANET securely.
Orżanowski, Tomasz
2016-01-01
This paper presents an infrared focal plane array (IRFPA) response nonuniformity correction (NUC) algorithm which is easy to implement by hardware. The proposed NUC algorithm is based on the linear correction scheme with the useful method of pixel offset correction coefficients update. The new approach to IRFPA response nonuniformity correction consists in the use of pixel response change determined at the actual operating conditions in relation to the reference ones by means of shutter to compensate a pixel offset temporal drift. Moreover, it permits to remove any optics shading effect in the output image as well. To show efficiency of the proposed NUC algorithm some test results for microbolometer IRFPA are presented.
Zhang, Huaguang; Song, Ruizhuo; Wei, Qinglai; Zhang, Tieyan
2011-12-01
In this paper, a novel heuristic dynamic programming (HDP) iteration algorithm is proposed to solve the optimal tracking control problem for a class of nonlinear discrete-time systems with time delays. The novel algorithm contains state updating, control policy iteration, and performance index iteration. To get the optimal states, the states are also updated. Furthermore, the "backward iteration" is applied to state updating. Two neural networks are used to approximate the performance index function and compute the optimal control policy for facilitating the implementation of HDP iteration algorithm. At last, we present two examples to demonstrate the effectiveness of the proposed HDP iteration algorithm.
A bootstrap based Neyman-Pearson test for identifying variable importance.
Ditzler, Gregory; Polikar, Robi; Rosen, Gail
2015-04-01
Selection of most informative features that leads to a small loss on future data are arguably one of the most important steps in classification, data analysis and model selection. Several feature selection (FS) algorithms are available; however, due to noise present in any data set, FS algorithms are typically accompanied by an appropriate cross-validation scheme. In this brief, we propose a statistical hypothesis test derived from the Neyman-Pearson lemma for determining if a feature is statistically relevant. The proposed approach can be applied as a wrapper to any FS algorithm, regardless of the FS criteria used by that algorithm, to determine whether a feature belongs in the relevant set. Perhaps more importantly, this procedure efficiently determines the number of relevant features given an initial starting point. We provide freely available software implementations of the proposed methodology.
Demonstration of essentiality of entanglement in a Deutsch-like quantum algorithm
NASA Astrophysics Data System (ADS)
Huang, He-Liang; Goswami, Ashutosh K.; Bao, Wan-Su; Panigrahi, Prasanta K.
2018-06-01
Quantum algorithms can be used to efficiently solve certain classically intractable problems by exploiting quantum parallelism. However, the effectiveness of quantum entanglement in quantum computing remains a question of debate. This study presents a new quantum algorithm that shows entanglement could provide advantages over both classical algorithms and quantum algo- rithms without entanglement. Experiments are implemented to demonstrate the proposed algorithm using superconducting qubits. Results show the viability of the algorithm and suggest that entanglement is essential in obtaining quantum speedup for certain problems in quantum computing. The study provides reliable and clear guidance for developing useful quantum algorithms.
Applications and accuracy of the parallel diagonal dominant algorithm
NASA Technical Reports Server (NTRS)
Sun, Xian-He
1993-01-01
The Parallel Diagonal Dominant (PDD) algorithm is a highly efficient, ideally scalable tridiagonal solver. In this paper, a detailed study of the PDD algorithm is given. First the PDD algorithm is introduced. Then the algorithm is extended to solve periodic tridiagonal systems. A variant, the reduced PDD algorithm, is also proposed. Accuracy analysis is provided for a class of tridiagonal systems, the symmetric, and anti-symmetric Toeplitz tridiagonal systems. Implementation results show that the analysis gives a good bound on the relative error, and the algorithm is a good candidate for the emerging massively parallel machines.
ERGC: an efficient referential genome compression algorithm
Saha, Subrata; Rajasekaran, Sanguthevar
2015-01-01
Motivation: Genome sequencing has become faster and more affordable. Consequently, the number of available complete genomic sequences is increasing rapidly. As a result, the cost to store, process, analyze and transmit the data is becoming a bottleneck for research and future medical applications. So, the need for devising efficient data compression and data reduction techniques for biological sequencing data is growing by the day. Although there exists a number of standard data compression algorithms, they are not efficient in compressing biological data. These generic algorithms do not exploit some inherent properties of the sequencing data while compressing. To exploit statistical and information-theoretic properties of genomic sequences, we need specialized compression algorithms. Five different next-generation sequencing data compression problems have been identified and studied in the literature. We propose a novel algorithm for one of these problems known as reference-based genome compression. Results: We have done extensive experiments using five real sequencing datasets. The results on real genomes show that our proposed algorithm is indeed competitive and performs better than the best known algorithms for this problem. It achieves compression ratios that are better than those of the currently best performing algorithms. The time to compress and decompress the whole genome is also very promising. Availability and implementation: The implementations are freely available for non-commercial purposes. They can be downloaded from http://engr.uconn.edu/∼rajasek/ERGC.zip. Contact: rajasek@engr.uconn.edu PMID:26139636
Minimization of Delay Costs in the Realization of Production Orders in Two-Machine System
NASA Astrophysics Data System (ADS)
Dylewski, Robert; Jardzioch, Andrzej; Dworak, Oliver
2018-03-01
The article presents a new algorithm that enables the allocation of the optimal scheduling of the production orders in the two-machine system based on the minimum cost of order delays. The formulated algorithm uses the method of branch and bounds and it is a particular generalisation of the algorithm enabling for the determination of the sequence of the production orders with the minimal sum of the delays. In order to illustrate the proposed algorithm in the best way, the article contains examples accompanied by the graphical trees of solutions. The research analysing the utility of the said algorithm was conducted. The achieved results proved the usefulness of the proposed algorithm when applied to scheduling of orders. The formulated algorithm was implemented in the Matlab programme. In addition, the studies for different sets of production orders were conducted.
Research on retailer data clustering algorithm based on Spark
NASA Astrophysics Data System (ADS)
Huang, Qiuman; Zhou, Feng
2017-03-01
Big data analysis is a hot topic in the IT field now. Spark is a high-reliability and high-performance distributed parallel computing framework for big data sets. K-means algorithm is one of the classical partition methods in clustering algorithm. In this paper, we study the k-means clustering algorithm on Spark. Firstly, the principle of the algorithm is analyzed, and then the clustering analysis is carried out on the supermarket customers through the experiment to find out the different shopping patterns. At the same time, this paper proposes the parallelization of k-means algorithm and the distributed computing framework of Spark, and gives the concrete design scheme and implementation scheme. This paper uses the two-year sales data of a supermarket to validate the proposed clustering algorithm and achieve the goal of subdividing customers, and then analyze the clustering results to help enterprises to take different marketing strategies for different customer groups to improve sales performance.
Real-time image dehazing using local adaptive neighborhoods and dark-channel-prior
NASA Astrophysics Data System (ADS)
Valderrama, Jesus A.; Díaz-Ramírez, Víctor H.; Kober, Vitaly; Hernandez, Enrique
2015-09-01
A real-time algorithm for single image dehazing is presented. The algorithm is based on calculation of local neighborhoods of a hazed image inside a moving window. The local neighborhoods are constructed by computing rank-order statistics. Next the dark-channel-prior approach is applied to the local neighborhoods to estimate the transmission function of the scene. By using the suggested approach there is no need for applying a refining algorithm to the estimated transmission such as the soft matting algorithm. To achieve high-rate signal processing the proposed algorithm is implemented exploiting massive parallelism on a graphics processing unit (GPU). Computer simulation results are carried out to test the performance of the proposed algorithm in terms of dehazing efficiency and speed of processing. These tests are performed using several synthetic and real images. The obtained results are analyzed and compared with those obtained with existing dehazing algorithms.
Wang, Min; Tian, Yun
2018-01-01
The Canny operator is widely used to detect edges in images. However, as the size of the image dataset increases, the edge detection performance of the Canny operator decreases and its runtime becomes excessive. To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel programming model that runs on the Hadoop platform. The Otsu algorithm is used to optimize the Canny operator's dual threshold and improve the edge detection performance, while the MapReduce parallel programming model facilitates parallel processing for the Canny operator to solve the processing speed and communication cost problems that occur when the Canny edge detection algorithm is applied to big data. For the experiments, we constructed datasets of different scales from the Pascal VOC2012 image database. The proposed parallel Otsu-Canny edge detection algorithm performs better than other traditional edge detection algorithms. The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images. Overall, our approach system speeds up the system by approximately 3.4 times when processing large-scale datasets, which demonstrates the obvious superiority of our method. The proposed algorithm in this study demonstrates both better edge detection performance and improved time performance. PMID:29861711
2015-01-01
We present and discuss philosophy and methodology of chaotic evolution that is theoretically supported by chaos theory. We introduce four chaotic systems, that is, logistic map, tent map, Gaussian map, and Hénon map, in a well-designed chaotic evolution algorithm framework to implement several chaotic evolution (CE) algorithms. By comparing our previous proposed CE algorithm with logistic map and two canonical differential evolution (DE) algorithms, we analyse and discuss optimization performance of CE algorithm. An investigation on the relationship between optimization capability of CE algorithm and distribution characteristic of chaotic system is conducted and analysed. From evaluation result, we find that distribution of chaotic system is an essential factor to influence optimization performance of CE algorithm. We propose a new interactive EC (IEC) algorithm, interactive chaotic evolution (ICE) that replaces fitness function with a real human in CE algorithm framework. There is a paired comparison-based mechanism behind CE search scheme in nature. A simulation experimental evaluation is conducted with a pseudo-IEC user to evaluate our proposed ICE algorithm. The evaluation result indicates that ICE algorithm can obtain a significant better performance than or the same performance as interactive DE. Some open topics on CE, ICE, fusion of these optimization techniques, algorithmic notation, and others are presented and discussed. PMID:25879067
Pei, Yan
2015-01-01
We present and discuss philosophy and methodology of chaotic evolution that is theoretically supported by chaos theory. We introduce four chaotic systems, that is, logistic map, tent map, Gaussian map, and Hénon map, in a well-designed chaotic evolution algorithm framework to implement several chaotic evolution (CE) algorithms. By comparing our previous proposed CE algorithm with logistic map and two canonical differential evolution (DE) algorithms, we analyse and discuss optimization performance of CE algorithm. An investigation on the relationship between optimization capability of CE algorithm and distribution characteristic of chaotic system is conducted and analysed. From evaluation result, we find that distribution of chaotic system is an essential factor to influence optimization performance of CE algorithm. We propose a new interactive EC (IEC) algorithm, interactive chaotic evolution (ICE) that replaces fitness function with a real human in CE algorithm framework. There is a paired comparison-based mechanism behind CE search scheme in nature. A simulation experimental evaluation is conducted with a pseudo-IEC user to evaluate our proposed ICE algorithm. The evaluation result indicates that ICE algorithm can obtain a significant better performance than or the same performance as interactive DE. Some open topics on CE, ICE, fusion of these optimization techniques, algorithmic notation, and others are presented and discussed.
Searching for efficient Markov chain Monte Carlo proposal kernels
Yang, Ziheng; Rodríguez, Carlos E.
2013-01-01
Markov chain Monte Carlo (MCMC) or the Metropolis–Hastings algorithm is a simulation algorithm that has made modern Bayesian statistical inference possible. Nevertheless, the efficiency of different Metropolis–Hastings proposal kernels has rarely been studied except for the Gaussian proposal. Here we propose a unique class of Bactrian kernels, which avoid proposing values that are very close to the current value, and compare their efficiency with a number of proposals for simulating different target distributions, with efficiency measured by the asymptotic variance of a parameter estimate. The uniform kernel is found to be more efficient than the Gaussian kernel, whereas the Bactrian kernel is even better. When optimal scales are used for both, the Bactrian kernel is at least 50% more efficient than the Gaussian. Implementation in a Bayesian program for molecular clock dating confirms the general applicability of our results to generic MCMC algorithms. Our results refute a previous claim that all proposals had nearly identical performance and will prompt further research into efficient MCMC proposals. PMID:24218600
Luo, Junhai; Fu, Liang
2017-06-09
With the development of communication technology, the demand for location-based services is growing rapidly. This paper presents an algorithm for indoor localization based on Received Signal Strength (RSS), which is collected from Access Points (APs). The proposed localization algorithm contains the offline information acquisition phase and online positioning phase. Firstly, the AP selection algorithm is reviewed and improved based on the stability of signals to remove useless AP; secondly, Kernel Principal Component Analysis (KPCA) is analyzed and used to remove the data redundancy and maintain useful characteristics for nonlinear feature extraction; thirdly, the Affinity Propagation Clustering (APC) algorithm utilizes RSS values to classify data samples and narrow the positioning range. In the online positioning phase, the classified data will be matched with the testing data to determine the position area, and the Maximum Likelihood (ML) estimate will be employed for precise positioning. Eventually, the proposed algorithm is implemented in a real-world environment for performance evaluation. Experimental results demonstrate that the proposed algorithm improves the accuracy and computational complexity.
NASA Astrophysics Data System (ADS)
Busanelli, Stefano; Martalò, Marco; Ferrari, Gianluigi; Spigoni, Giovanni; Iotti, Nicola
In this paper, we analyze the performance of vertical handover (VHO) algorithms for seamless mobility between WiFi and UMTS networks. We focus on a no-coupling scenario, characterized by the lack of any form of cooperation between the involved players (users and network operators). In this context, we first propose a low-complexity Received Signal Strength Indicator (RSSI)-based algorithm, and then an improved hybrid RSSI/goodput version. We present experimental results based on the implementation of a real testbed with commercial WiFi (Guglielmo) and UMTS (Telecom Italia) deployed networks. Despite the relatively long handover times experienced in our testbed, the proposed RSSI-based VHO algorithm guarantees an effective goodput increase at the MTs. Moreover, this algorithm mitigates the ping-pong phenomenon.
A multi-group firefly algorithm for numerical optimization
NASA Astrophysics Data System (ADS)
Tong, Nan; Fu, Qiang; Zhong, Caiming; Wang, Pengjun
2017-08-01
To solve the problem of premature convergence of firefly algorithm (FA), this paper analyzes the evolution mechanism of the algorithm, and proposes an improved Firefly algorithm based on modified evolution model and multi-group learning mechanism (IMGFA). A Firefly colony is divided into several subgroups with different model parameters. Within each subgroup, the optimal firefly is responsible for leading the others fireflies to implement the early global evolution, and establish the information mutual system among the fireflies. And then, each firefly achieves local search by following the brighter firefly in its neighbors. At the same time, learning mechanism among the best fireflies in various subgroups to exchange information can help the population to obtain global optimization goals more effectively. Experimental results verify the effectiveness of the proposed algorithm.
NASA Astrophysics Data System (ADS)
Thiebaut, C.; Perraud, L.; Delvit, J. M.; Latry, C.
2016-07-01
We present an on-board satellite implementation of a gradient-based (optical flows) algorithm for the shifts estimation between images of a Shack-Hartmann wave-front sensor on extended landscapes. The proposed algorithm has low complexity in comparison with classical correlation methods which is a big advantage for being used on-board a satellite at high instrument data rate and in real-time. The electronic board used for this implementation is designed for space applications and is composed of radiation-hardened software and hardware. Processing times of both shift estimations and pre-processing steps are compatible of on-board real-time computation.
Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication
Azad, Ariful; Ballard, Grey; Buluc, Aydin; ...
2016-11-08
Sparse matrix-matrix multiplication (or SpGEMM) is a key primitive for many high-performance graph algorithms as well as for some linear solvers, such as algebraic multigrid. The scaling of existing parallel implementations of SpGEMM is heavily bound by communication. Even though 3D (or 2.5D) algorithms have been proposed and theoretically analyzed in the flat MPI model on Erdös-Rényi matrices, those algorithms had not been implemented in practice and their complexities had not been analyzed for the general case. In this work, we present the first implementation of the 3D SpGEMM formulation that exploits multiple (intranode and internode) levels of parallelism, achievingmore » significant speedups over the state-of-the-art publicly available codes at all levels of concurrencies. We extensively evaluate our implementation and identify bottlenecks that should be subject to further research.« less
Further optimization of SeDDaRA blind image deconvolution algorithm and its DSP implementation
NASA Astrophysics Data System (ADS)
Wen, Bo; Zhang, Qiheng; Zhang, Jianlin
2011-11-01
Efficient algorithm for blind image deconvolution and its high-speed implementation is of great value in practice. Further optimization of SeDDaRA is developed, from algorithm structure to numerical calculation methods. The main optimization covers that, the structure's modularization for good implementation feasibility, reducing the data computation and dependency of 2D-FFT/IFFT, and acceleration of power operation by segmented look-up table. Then the Fast SeDDaRA is proposed and specialized for low complexity. As the final implementation, a hardware system of image restoration is conducted by using the multi-DSP parallel processing. Experimental results show that, the processing time and memory demand of Fast SeDDaRA decreases 50% at least; the data throughput of image restoration system is over 7.8Msps. The optimization is proved efficient and feasible, and the Fast SeDDaRA is able to support the real-time application.
Resource efficient data compression algorithms for demanding, WSN based biomedical applications.
Antonopoulos, Christos P; Voros, Nikolaos S
2016-02-01
During the last few years, medical research areas of critical importance such as Epilepsy monitoring and study, increasingly utilize wireless sensor network technologies in order to achieve better understanding and significant breakthroughs. However, the limited memory and communication bandwidth offered by WSN platforms comprise a significant shortcoming to such demanding application scenarios. Although, data compression can mitigate such deficiencies there is a lack of objective and comprehensive evaluation of relative approaches and even more on specialized approaches targeting specific demanding applications. The research work presented in this paper focuses on implementing and offering an in-depth experimental study regarding prominent, already existing as well as novel proposed compression algorithms. All algorithms have been implemented in a common Matlab framework. A major contribution of this paper, that differentiates it from similar research efforts, is the employment of real world Electroencephalography (EEG) and Electrocardiography (ECG) datasets comprising the two most demanding Epilepsy modalities. Emphasis is put on WSN applications, thus the respective metrics focus on compression rate and execution latency for the selected datasets. The evaluation results reveal significant performance and behavioral characteristics of the algorithms related to their complexity and the relative negative effect on compression latency as opposed to the increased compression rate. It is noted that the proposed schemes managed to offer considerable advantage especially aiming to achieve the optimum tradeoff between compression rate-latency. Specifically, proposed algorithm managed to combine highly completive level of compression while ensuring minimum latency thus exhibiting real-time capabilities. Additionally, one of the proposed schemes is compared against state-of-the-art general-purpose compression algorithms also exhibiting considerable advantages as far as the compression rate is concerned. Copyright © 2015 Elsevier Inc. All rights reserved.
Experimental determination of Ramsey numbers.
Bian, Zhengbing; Chudak, Fabian; Macready, William G; Clark, Lane; Gaitan, Frank
2013-09-27
Ramsey theory is a highly active research area in mathematics that studies the emergence of order in large disordered structures. Ramsey numbers mark the threshold at which order first appears and are extremely difficult to calculate due to their explosive rate of growth. Recently, an algorithm that can be implemented using adiabatic quantum evolution has been proposed that calculates the two-color Ramsey numbers R(m,n). Here we present results of an experimental implementation of this algorithm and show that it correctly determines the Ramsey numbers R(3,3) and R(m,2) for 4≤m≤8. The R(8,2) computation used 84 qubits of which 28 were computational qubits. This computation is the largest experimental implementation of a scientifically meaningful adiabatic evolution algorithm that has been done to date.
Experimental Determination of Ramsey Numbers
NASA Astrophysics Data System (ADS)
Bian, Zhengbing; Chudak, Fabian; Macready, William G.; Clark, Lane; Gaitan, Frank
2013-09-01
Ramsey theory is a highly active research area in mathematics that studies the emergence of order in large disordered structures. Ramsey numbers mark the threshold at which order first appears and are extremely difficult to calculate due to their explosive rate of growth. Recently, an algorithm that can be implemented using adiabatic quantum evolution has been proposed that calculates the two-color Ramsey numbers R(m,n). Here we present results of an experimental implementation of this algorithm and show that it correctly determines the Ramsey numbers R(3,3) and R(m,2) for 4≤m≤8. The R(8,2) computation used 84 qubits of which 28 were computational qubits. This computation is the largest experimental implementation of a scientifically meaningful adiabatic evolution algorithm that has been done to date.
A robust color image watermarking algorithm against rotation attacks
NASA Astrophysics Data System (ADS)
Han, Shao-cheng; Yang, Jin-feng; Wang, Rui; Jia, Gui-min
2018-01-01
A robust digital watermarking algorithm is proposed based on quaternion wavelet transform (QWT) and discrete cosine transform (DCT) for copyright protection of color images. The luminance component Y of a host color image in YIQ space is decomposed by QWT, and then the coefficients of four low-frequency subbands are transformed by DCT. An original binary watermark scrambled by Arnold map and iterated sine chaotic system is embedded into the mid-frequency DCT coefficients of the subbands. In order to improve the performance of the proposed algorithm against rotation attacks, a rotation detection scheme is implemented before watermark extracting. The experimental results demonstrate that the proposed watermarking scheme shows strong robustness not only against common image processing attacks but also against arbitrary rotation attacks.
An efficient variable projection formulation for separable nonlinear least squares problems.
Gan, Min; Li, Han-Xiong
2014-05-01
We consider in this paper a class of nonlinear least squares problems in which the model can be represented as a linear combination of nonlinear functions. The variable projection algorithm projects the linear parameters out of the problem, leaving the nonlinear least squares problems involving only the nonlinear parameters. To implement the variable projection algorithm more efficiently, we propose a new variable projection functional based on matrix decomposition. The advantage of the proposed formulation is that the size of the decomposed matrix may be much smaller than those of previous ones. The Levenberg-Marquardt algorithm using finite difference method is then applied to minimize the new criterion. Numerical results show that the proposed approach achieves significant reduction in computing time.
Finding topological center of a geographic space via road network
NASA Astrophysics Data System (ADS)
Gao, Liang; Miao, Yanan; Qin, Yuhao; Zhao, Xiaomei; Gao, Zi-You
2015-02-01
Previous studies show that the center of a geographic space is of great importance in urban and regional studies, including study of population distribution, urban growth modeling, and scaling properties of urban systems, etc. But how to well define and how to efficiently extract the center of a geographic space are still largely unknown. Recently, Jiang et al. have presented a definition of topological center by their block detection (BD) algorithm. Despite the fact that they first introduced the definition and discovered the 'true center', in human minds, their algorithm left several redundancies in its traversal process. Here, we propose an alternative road-cycle detection (RCD) algorithm to find the topological center, which extracts the outmost road-cycle recursively. To foster the application of the topological center in related research fields, we first reproduce the BD algorithm in Python (pyBD), then implement the RCD algorithm in two ways: the ArcPy implementation (arcRCD) and the Python implementation (pyRCD). After the experiments on twenty-four typical road networks, we find that the results of our RCD algorithm are consistent with those of Jiang's BD algorithm. We also find that the RCD algorithm is at least seven times more efficient than the BD algorithm on all the ten typical road networks.
He, Chenlong; Feng, Zuren; Ren, Zhigang
2018-01-01
In this paper, we propose a connectivity-preserving flocking algorithm for multi-agent systems in which the neighbor set of each agent is determined by the hybrid metric-topological distance so that the interaction topology can be represented as the range-limited Delaunay graph, which combines the properties of the commonly used disk graph and Delaunay graph. As a result, the proposed flocking algorithm has the following advantages over the existing ones. First, range-limited Delaunay graph is sparser than the disk graph so that the information exchange among agents is reduced significantly. Second, some links irrelevant to the connectivity can be dynamically deleted during the evolution of the system. Thus, the proposed flocking algorithm is more flexible than existing algorithms, where links are not allowed to be disconnected once they are created. Finally, the multi-agent system spontaneously generates a regular quasi-lattice formation without imposing the constraint on the ratio of the sensing range of the agent to the desired distance between two adjacent agents. With the interaction topology induced by the hybrid distance, the proposed flocking algorithm can still be implemented in a distributed manner. We prove that the proposed flocking algorithm can steer the multi-agent system to a stable flocking motion, provided the initial interaction topology of multi-agent systems is connected and the hysteresis in link addition is smaller than a derived upper bound. The correctness and effectiveness of the proposed algorithm are verified by extensive numerical simulations, where the flocking algorithms based on the disk and Delaunay graph are compared.
Feng, Zuren; Ren, Zhigang
2018-01-01
In this paper, we propose a connectivity-preserving flocking algorithm for multi-agent systems in which the neighbor set of each agent is determined by the hybrid metric-topological distance so that the interaction topology can be represented as the range-limited Delaunay graph, which combines the properties of the commonly used disk graph and Delaunay graph. As a result, the proposed flocking algorithm has the following advantages over the existing ones. First, range-limited Delaunay graph is sparser than the disk graph so that the information exchange among agents is reduced significantly. Second, some links irrelevant to the connectivity can be dynamically deleted during the evolution of the system. Thus, the proposed flocking algorithm is more flexible than existing algorithms, where links are not allowed to be disconnected once they are created. Finally, the multi-agent system spontaneously generates a regular quasi-lattice formation without imposing the constraint on the ratio of the sensing range of the agent to the desired distance between two adjacent agents. With the interaction topology induced by the hybrid distance, the proposed flocking algorithm can still be implemented in a distributed manner. We prove that the proposed flocking algorithm can steer the multi-agent system to a stable flocking motion, provided the initial interaction topology of multi-agent systems is connected and the hysteresis in link addition is smaller than a derived upper bound. The correctness and effectiveness of the proposed algorithm are verified by extensive numerical simulations, where the flocking algorithms based on the disk and Delaunay graph are compared. PMID:29462217
Correlation coefficient based supervised locally linear embedding for pulmonary nodule recognition.
Wu, Panpan; Xia, Kewen; Yu, Hengyong
2016-11-01
Dimensionality reduction techniques are developed to suppress the negative effects of high dimensional feature space of lung CT images on classification performance in computer aided detection (CAD) systems for pulmonary nodule detection. An improved supervised locally linear embedding (SLLE) algorithm is proposed based on the concept of correlation coefficient. The Spearman's rank correlation coefficient is introduced to adjust the distance metric in the SLLE algorithm to ensure that more suitable neighborhood points could be identified, and thus to enhance the discriminating power of embedded data. The proposed Spearman's rank correlation coefficient based SLLE (SC(2)SLLE) is implemented and validated in our pilot CAD system using a clinical dataset collected from the publicly available lung image database consortium and image database resource initiative (LICD-IDRI). Particularly, a representative CAD system for solitary pulmonary nodule detection is designed and implemented. After a sequential medical image processing steps, 64 nodules and 140 non-nodules are extracted, and 34 representative features are calculated. The SC(2)SLLE, as well as SLLE and LLE algorithm, are applied to reduce the dimensionality. Several quantitative measurements are also used to evaluate and compare the performances. Using a 5-fold cross-validation methodology, the proposed algorithm achieves 87.65% accuracy, 79.23% sensitivity, 91.43% specificity, and 8.57% false positive rate, on average. Experimental results indicate that the proposed algorithm outperforms the original locally linear embedding and SLLE coupled with the support vector machine (SVM) classifier. Based on the preliminary results from a limited number of nodules in our dataset, this study demonstrates the great potential to improve the performance of a CAD system for nodule detection using the proposed SC(2)SLLE. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
GPU-based Branchless Distance-Driven Projection and Backprojection
Liu, Rui; Fu, Lin; De Man, Bruno; Yu, Hengyong
2017-01-01
Projection and backprojection operations are essential in a variety of image reconstruction and physical correction algorithms in CT. The distance-driven (DD) projection and backprojection are widely used for their highly sequential memory access pattern and low arithmetic cost. However, a typical DD implementation has an inner loop that adjusts the calculation depending on the relative position between voxel and detector cell boundaries. The irregularity of the branch behavior makes it inefficient to be implemented on massively parallel computing devices such as graphics processing units (GPUs). Such irregular branch behaviors can be eliminated by factorizing the DD operation as three branchless steps: integration, linear interpolation, and differentiation, all of which are highly amenable to massive vectorization. In this paper, we implement and evaluate a highly parallel branchless DD algorithm for 3D cone beam CT. The algorithm utilizes the texture memory and hardware interpolation on GPUs to achieve fast computational speed. The developed branchless DD algorithm achieved 137-fold speedup for forward projection and 188-fold speedup for backprojection relative to a single-thread CPU implementation. Compared with a state-of-the-art 32-thread CPU implementation, the proposed branchless DD achieved 8-fold acceleration for forward projection and 10-fold acceleration for backprojection. GPU based branchless DD method was evaluated by iterative reconstruction algorithms with both simulation and real datasets. It obtained visually identical images as the CPU reference algorithm. PMID:29333480
GPU-based Branchless Distance-Driven Projection and Backprojection.
Liu, Rui; Fu, Lin; De Man, Bruno; Yu, Hengyong
2017-12-01
Projection and backprojection operations are essential in a variety of image reconstruction and physical correction algorithms in CT. The distance-driven (DD) projection and backprojection are widely used for their highly sequential memory access pattern and low arithmetic cost. However, a typical DD implementation has an inner loop that adjusts the calculation depending on the relative position between voxel and detector cell boundaries. The irregularity of the branch behavior makes it inefficient to be implemented on massively parallel computing devices such as graphics processing units (GPUs). Such irregular branch behaviors can be eliminated by factorizing the DD operation as three branchless steps: integration, linear interpolation, and differentiation, all of which are highly amenable to massive vectorization. In this paper, we implement and evaluate a highly parallel branchless DD algorithm for 3D cone beam CT. The algorithm utilizes the texture memory and hardware interpolation on GPUs to achieve fast computational speed. The developed branchless DD algorithm achieved 137-fold speedup for forward projection and 188-fold speedup for backprojection relative to a single-thread CPU implementation. Compared with a state-of-the-art 32-thread CPU implementation, the proposed branchless DD achieved 8-fold acceleration for forward projection and 10-fold acceleration for backprojection. GPU based branchless DD method was evaluated by iterative reconstruction algorithms with both simulation and real datasets. It obtained visually identical images as the CPU reference algorithm.
An efficient implementation of Forward-Backward Least-Mean-Square Adaptive Line Enhancers
NASA Technical Reports Server (NTRS)
Yeh, H.-G.; Nguyen, T. M.
1995-01-01
An efficient implementation of the forward-backward least-mean-square (FBLMS) adaptive line enhancer is presented in this article. Without changing the characteristics of the FBLMS adaptive line enhancer, the proposed implementation technique reduces multiplications by 25% and additions by 12.5% in two successive time samples in comparison with those operations of direct implementation in both prediction and weight control. The proposed FBLMS architecture and algorithm can be applied to digital receivers for enhancing signal-to-noise ratio to allow fast carrier acquisition and tracking in both stationary and nonstationary environments.
Spiking neuron network Helmholtz machine.
Sountsov, Pavel; Miller, Paul
2015-01-01
An increasing amount of behavioral and neurophysiological data suggests that the brain performs optimal (or near-optimal) probabilistic inference and learning during perception and other tasks. Although many machine learning algorithms exist that perform inference and learning in an optimal way, the complete description of how one of those algorithms (or a novel algorithm) can be implemented in the brain is currently incomplete. There have been many proposed solutions that address how neurons can perform optimal inference but the question of how synaptic plasticity can implement optimal learning is rarely addressed. This paper aims to unify the two fields of probabilistic inference and synaptic plasticity by using a neuronal network of realistic model spiking neurons to implement a well-studied computational model called the Helmholtz Machine. The Helmholtz Machine is amenable to neural implementation as the algorithm it uses to learn its parameters, called the wake-sleep algorithm, uses a local delta learning rule. Our spiking-neuron network implements both the delta rule and a small example of a Helmholtz machine. This neuronal network can learn an internal model of continuous-valued training data sets without supervision. The network can also perform inference on the learned internal models. We show how various biophysical features of the neural implementation constrain the parameters of the wake-sleep algorithm, such as the duration of the wake and sleep phases of learning and the minimal sample duration. We examine the deviations from optimal performance and tie them to the properties of the synaptic plasticity rule.
Spiking neuron network Helmholtz machine
Sountsov, Pavel; Miller, Paul
2015-01-01
An increasing amount of behavioral and neurophysiological data suggests that the brain performs optimal (or near-optimal) probabilistic inference and learning during perception and other tasks. Although many machine learning algorithms exist that perform inference and learning in an optimal way, the complete description of how one of those algorithms (or a novel algorithm) can be implemented in the brain is currently incomplete. There have been many proposed solutions that address how neurons can perform optimal inference but the question of how synaptic plasticity can implement optimal learning is rarely addressed. This paper aims to unify the two fields of probabilistic inference and synaptic plasticity by using a neuronal network of realistic model spiking neurons to implement a well-studied computational model called the Helmholtz Machine. The Helmholtz Machine is amenable to neural implementation as the algorithm it uses to learn its parameters, called the wake-sleep algorithm, uses a local delta learning rule. Our spiking-neuron network implements both the delta rule and a small example of a Helmholtz machine. This neuronal network can learn an internal model of continuous-valued training data sets without supervision. The network can also perform inference on the learned internal models. We show how various biophysical features of the neural implementation constrain the parameters of the wake-sleep algorithm, such as the duration of the wake and sleep phases of learning and the minimal sample duration. We examine the deviations from optimal performance and tie them to the properties of the synaptic plasticity rule. PMID:25954191
Design and Implementation of Hybrid CORDIC Algorithm Based on Phase Rotation Estimation for NCO
Zhang, Chaozhu; Han, Jinan; Li, Ke
2014-01-01
The numerical controlled oscillator has wide application in radar, digital receiver, and software radio system. Firstly, this paper introduces the traditional CORDIC algorithm. Then in order to improve computing speed and save resources, this paper proposes a kind of hybrid CORDIC algorithm based on phase rotation estimation applied in numerical controlled oscillator (NCO). Through estimating the direction of part phase rotation, the algorithm reduces part phase rotation and add-subtract unit, so that it decreases delay. Furthermore, the paper simulates and implements the numerical controlled oscillator by Quartus II software and Modelsim software. Finally, simulation results indicate that the improvement over traditional CORDIC algorithm is achieved in terms of ease of computation, resource utilization, and computing speed/delay while maintaining the precision. It is suitable for high speed and precision digital modulation and demodulation. PMID:25110750
Peng, Bo; Wang, Yuqi; Hall, Timothy J; Jiang, Jingfeng
2017-04-01
Our primary objective of this paper was to extend a previously published 2-D coupled subsample tracking algorithm for 3-D speckle tracking in the framework of ultrasound breast strain elastography. In order to overcome heavy computational cost, we investigated the use of a graphic processing unit (GPU) to accelerate the 3-D coupled subsample speckle tracking method. The performance of the proposed GPU implementation was tested using a tissue-mimicking phantom and in vivo breast ultrasound data. The performance of this 3-D subsample tracking algorithm was compared with the conventional 3-D quadratic subsample estimation algorithm. On the basis of these evaluations, we concluded that the GPU implementation of this 3-D subsample estimation algorithm can provide high-quality strain data (i.e., high correlation between the predeformation and the motion-compensated postdeformation radio frequency echo data and high contrast-to-noise ratio strain images), as compared with the conventional 3-D quadratic subsample algorithm. Using the GPU implementation of the 3-D speckle tracking algorithm, volumetric strain data can be achieved relatively fast (approximately 20 s per volume [2.5 cm ×2.5 cm ×2.5 cm]).
Numerical estimation of the relative entropy of entanglement
NASA Astrophysics Data System (ADS)
Zinchenko, Yuriy; Friedland, Shmuel; Gour, Gilad
2010-11-01
We propose a practical algorithm for the calculation of the relative entropy of entanglement (REE), defined as the minimum relative entropy between a state and the set of states with positive partial transpose. Our algorithm is based on a practical semidefinite cutting plane approach. In low dimensions the implementation of the algorithm in matlab provides an estimation for the REE with an absolute error smaller than 10-3.
Majorana-Based Fermionic Quantum Computation.
O'Brien, T E; Rożek, P; Akhmerov, A R
2018-06-01
Because Majorana zero modes store quantum information nonlocally, they are protected from noise, and have been proposed as a building block for a quantum computer. We show how to use the same protection from noise to implement universal fermionic quantum computation. Our architecture requires only two Majorana modes to encode a fermionic quantum degree of freedom, compared to alternative implementations which require a minimum of four Majorana modes for a spin quantum degree of freedom. The fermionic degrees of freedom support both unitary coupled cluster variational quantum eigensolver and quantum phase estimation algorithms, proposed for quantum chemistry simulations. Because we avoid the Jordan-Wigner transformation, our scheme has a lower overhead for implementing both of these algorithms, allowing for simulation of the Trotterized Hubbard Hamiltonian in O(1) time per unitary step. We finally demonstrate magic state distillation in our fermionic architecture, giving a universal set of topologically protected fermionic quantum gates.
Design and implementation of ticket price forecasting system
NASA Astrophysics Data System (ADS)
Li, Yuling; Li, Zhichao
2018-05-01
With the advent of the aviation travel industry, a large number of data mining technologies have been developed to increase profits for airlines in the past two decades. The implementation of the digital optimization strategy leads to price discrimination, for example, similar seats on the same flight are purchased at different prices, depending on the time of purchase, the supplier, and so on. Price fluctuations make the prediction of ticket prices have application value. In this paper, a combination of ARMA algorithm and random forest algorithm is proposed to predict the price of air ticket. The experimental results show that the model is more reliable by comparing the forecasting results with the actual results of each price model. The model is helpful for passengers to buy tickets and to save money. Based on the proposed model, using Python language and SQL Server database, we design and implement the ticket price forecasting system.
Majorana-Based Fermionic Quantum Computation
NASA Astrophysics Data System (ADS)
O'Brien, T. E.; RoŻek, P.; Akhmerov, A. R.
2018-06-01
Because Majorana zero modes store quantum information nonlocally, they are protected from noise, and have been proposed as a building block for a quantum computer. We show how to use the same protection from noise to implement universal fermionic quantum computation. Our architecture requires only two Majorana modes to encode a fermionic quantum degree of freedom, compared to alternative implementations which require a minimum of four Majorana modes for a spin quantum degree of freedom. The fermionic degrees of freedom support both unitary coupled cluster variational quantum eigensolver and quantum phase estimation algorithms, proposed for quantum chemistry simulations. Because we avoid the Jordan-Wigner transformation, our scheme has a lower overhead for implementing both of these algorithms, allowing for simulation of the Trotterized Hubbard Hamiltonian in O (1 ) time per unitary step. We finally demonstrate magic state distillation in our fermionic architecture, giving a universal set of topologically protected fermionic quantum gates.
NASA Astrophysics Data System (ADS)
Nabavi, N.
2018-07-01
The author investigates the monitoring methods for fine adjustment of the previously proposed on-chip architecture for frequency multiplication and translation of harmonics by design. Digital signal processing (DSP) algorithms are utilized to create an optimized microwave photonic integrated circuit functionality toward automated frequency multiplication. The implemented DSP algorithms are formed on discrete Fourier transform and optimization-based algorithms (Greedy and gradient-based algorithms), which are analytically derived and numerically compared based on the accuracy and speed of convergence criteria.
Design of an FMCW radar baseband signal processing system for automotive application.
Lin, Jau-Jr; Li, Yuan-Ping; Hsu, Wei-Chiang; Lee, Ta-Sung
2016-01-01
For a typical FMCW automotive radar system, a new design of baseband signal processing architecture and algorithms is proposed to overcome the ghost targets and overlapping problems in the multi-target detection scenario. To satisfy the short measurement time constraint without increasing the RF front-end loading, a three-segment waveform with different slopes is utilized. By introducing a new pairing mechanism and a spatial filter design algorithm, the proposed detection architecture not only provides high accuracy and reliability, but also requires low pairing time and computational loading. This proposed baseband signal processing architecture and algorithms balance the performance and complexity, and are suitable to be implemented in a real automotive radar system. Field measurement results demonstrate that the proposed automotive radar signal processing system can perform well in a realistic application scenario.
NASA Astrophysics Data System (ADS)
Hadia, Sarman K.; Thakker, R. A.; Bhatt, Kirit R.
2016-05-01
The study proposes an application of evolutionary algorithms, specifically an artificial bee colony (ABC), variant ABC and particle swarm optimisation (PSO), to extract the parameters of metal oxide semiconductor field effect transistor (MOSFET) model. These algorithms are applied for the MOSFET parameter extraction problem using a Pennsylvania surface potential model. MOSFET parameter extraction procedures involve reducing the error between measured and modelled data. This study shows that ABC algorithm optimises the parameter values based on intelligent activities of honey bee swarms. Some modifications have also been applied to the basic ABC algorithm. Particle swarm optimisation is a population-based stochastic optimisation method that is based on bird flocking activities. The performances of these algorithms are compared with respect to the quality of the solutions. The simulation results of this study show that the PSO algorithm performs better than the variant ABC and basic ABC algorithm for the parameter extraction of the MOSFET model; also the implementation of the ABC algorithm is shown to be simpler than that of the PSO algorithm.
NASA Astrophysics Data System (ADS)
Lisitsa, Y. V.; Yatskou, M. M.; Apanasovich, V. V.; Apanasovich, T. V.
2015-09-01
We have developed an algorithm for segmentation of cancer cell nuclei in three-channel luminescent images of microbiological specimens. The algorithm is based on using a correlation between fluorescence signals in the detection channels for object segmentation, which permits complete automation of the data analysis procedure. We have carried out a comparative analysis of the proposed method and conventional algorithms implemented in the CellProfiler and ImageJ software packages. Our algorithm has an object localization uncertainty which is 2-3 times smaller than for the conventional algorithms, with comparable segmentation accuracy.
Delakis, Ioannis; Hammad, Omer; Kitney, Richard I
2007-07-07
Wavelet-based de-noising has been shown to improve image signal-to-noise ratio in magnetic resonance imaging (MRI) while maintaining spatial resolution. Wavelet-based de-noising techniques typically implemented in MRI require that noise displays uniform spatial distribution. However, images acquired with parallel MRI have spatially varying noise levels. In this work, a new algorithm for filtering images with parallel MRI is presented. The proposed algorithm extracts the edges from the original image and then generates a noise map from the wavelet coefficients at finer scales. The noise map is zeroed at locations where edges have been detected and directional analysis is also used to calculate noise in regions of low-contrast edges that may not have been detected. The new methodology was applied on phantom and brain images and compared with other applicable de-noising techniques. The performance of the proposed algorithm was shown to be comparable with other techniques in central areas of the images, where noise levels are high. In addition, finer details and edges were maintained in peripheral areas, where noise levels are low. The proposed methodology is fully automated and can be applied on final reconstructed images without requiring sensitivity profiles or noise matrices of the receiver coils, therefore making it suitable for implementation in a clinical MRI setting.
A novel gene network inference algorithm using predictive minimum description length approach.
Chaitankar, Vijender; Ghosh, Preetam; Perkins, Edward J; Gong, Ping; Deng, Youping; Zhang, Chaoyang
2010-05-28
Reverse engineering of gene regulatory networks using information theory models has received much attention due to its simplicity, low computational cost, and capability of inferring large networks. One of the major problems with information theory models is to determine the threshold which defines the regulatory relationships between genes. The minimum description length (MDL) principle has been implemented to overcome this problem. The description length of the MDL principle is the sum of model length and data encoding length. A user-specified fine tuning parameter is used as control mechanism between model and data encoding, but it is difficult to find the optimal parameter. In this work, we proposed a new inference algorithm which incorporated mutual information (MI), conditional mutual information (CMI) and predictive minimum description length (PMDL) principle to infer gene regulatory networks from DNA microarray data. In this algorithm, the information theoretic quantities MI and CMI determine the regulatory relationships between genes and the PMDL principle method attempts to determine the best MI threshold without the need of a user-specified fine tuning parameter. The performance of the proposed algorithm was evaluated using both synthetic time series data sets and a biological time series data set for the yeast Saccharomyces cerevisiae. The benchmark quantities precision and recall were used as performance measures. The results show that the proposed algorithm produced less false edges and significantly improved the precision, as compared to the existing algorithm. For further analysis the performance of the algorithms was observed over different sizes of data. We have proposed a new algorithm that implements the PMDL principle for inferring gene regulatory networks from time series DNA microarray data that eliminates the need of a fine tuning parameter. The evaluation results obtained from both synthetic and actual biological data sets show that the PMDL principle is effective in determining the MI threshold and the developed algorithm improves precision of gene regulatory network inference. Based on the sensitivity analysis of all tested cases, an optimal CMI threshold value has been identified. Finally it was observed that the performance of the algorithms saturates at a certain threshold of data size.
Gaussian diffusion sinogram inpainting for X-ray CT metal artifact reduction.
Peng, Chengtao; Qiu, Bensheng; Li, Ming; Guan, Yihui; Zhang, Cheng; Wu, Zhongyi; Zheng, Jian
2017-01-05
Metal objects implanted in the bodies of patients usually generate severe streaking artifacts in reconstructed images of X-ray computed tomography, which degrade the image quality and affect the diagnosis of disease. Therefore, it is essential to reduce these artifacts to meet the clinical demands. In this work, we propose a Gaussian diffusion sinogram inpainting metal artifact reduction algorithm based on prior images to reduce these artifacts for fan-beam computed tomography reconstruction. In this algorithm, prior information that originated from a tissue-classified prior image is used for the inpainting of metal-corrupted projections, and it is incorporated into a Gaussian diffusion function. The prior knowledge is particularly designed to locate the diffusion position and improve the sparsity of the subtraction sinogram, which is obtained by subtracting the prior sinogram of the metal regions from the original sinogram. The sinogram inpainting algorithm is implemented through an approach of diffusing prior energy and is then solved by gradient descent. The performance of the proposed metal artifact reduction algorithm is compared with two conventional metal artifact reduction algorithms, namely the interpolation metal artifact reduction algorithm and normalized metal artifact reduction algorithm. The experimental datasets used included both simulated and clinical datasets. By evaluating the results subjectively, the proposed metal artifact reduction algorithm causes fewer secondary artifacts than the two conventional metal artifact reduction algorithms, which lead to severe secondary artifacts resulting from impertinent interpolation and normalization. Additionally, the objective evaluation shows the proposed approach has the smallest normalized mean absolute deviation and the highest signal-to-noise ratio, indicating that the proposed method has produced the image with the best quality. No matter for the simulated datasets or the clinical datasets, the proposed algorithm has reduced the metal artifacts apparently.
Smitha, K G; Vinod, A P
2015-11-01
Children with autism spectrum disorder have difficulty in understanding the emotional and mental states from the facial expressions of the people they interact. The inability to understand other people's emotions will hinder their interpersonal communication. Though many facial emotion recognition algorithms have been proposed in the literature, they are mainly intended for processing by a personal computer, which limits their usability in on-the-move applications where portability is desired. The portability of the system will ensure ease of use and real-time emotion recognition and that will aid for immediate feedback while communicating with caretakers. Principal component analysis (PCA) has been identified as the least complex feature extraction algorithm to be implemented in hardware. In this paper, we present a detailed study of the implementation of serial and parallel implementation of PCA in order to identify the most feasible method for realization of a portable emotion detector for autistic children. The proposed emotion recognizer architectures are implemented on Virtex 7 XC7VX330T FFG1761-3 FPGA. We achieved 82.3% detection accuracy for a word length of 8 bits.
Visual Tracking via Sparse and Local Linear Coding.
Wang, Guofeng; Qin, Xueying; Zhong, Fan; Liu, Yue; Li, Hongbo; Peng, Qunsheng; Yang, Ming-Hsuan
2015-11-01
The state search is an important component of any object tracking algorithm. Numerous algorithms have been proposed, but stochastic sampling methods (e.g., particle filters) are arguably one of the most effective approaches. However, the discretization of the state space complicates the search for the precise object location. In this paper, we propose a novel tracking algorithm that extends the state space of particle observations from discrete to continuous. The solution is determined accurately via iterative linear coding between two convex hulls. The algorithm is modeled by an optimal function, which can be efficiently solved by either convex sparse coding or locality constrained linear coding. The algorithm is also very flexible and can be combined with many generic object representations. Thus, we first use sparse representation to achieve an efficient searching mechanism of the algorithm and demonstrate its accuracy. Next, two other object representation models, i.e., least soft-threshold squares and adaptive structural local sparse appearance, are implemented with improved accuracy to demonstrate the flexibility of our algorithm. Qualitative and quantitative experimental results demonstrate that the proposed tracking algorithm performs favorably against the state-of-the-art methods in dynamic scenes.
NASA Astrophysics Data System (ADS)
Chen, Ming-Chih; Hsiao, Shen-Fu
In this paper, we propose an area-efficient design of Advanced Encryption Standard (AES) processor by applying a new common-expression-elimination (CSE) method to the sub-functions of various transformations required in AES. The proposed method reduces the area cost of realizing the sub-functions by extracting the common factors in the bit-level XOR/AND-based sum-of-product expressions of these sub-functions using a new CSE algorithm. Cell-based implementation results show that the AES processor with our proposed CSE method has significant area improvement compared with previous designs.
Robust tuning of robot control systems
NASA Technical Reports Server (NTRS)
Minis, I.; Uebel, M.
1992-01-01
The computed torque control problem is examined for a robot arm with flexible, geared, joint drive systems which are typical in many industrial robots. The standard computed torque algorithm is not directly applicable to this class of manipulators because of the dynamics introduced by the joint drive system. The proposed approach to computed torque control combines a computed torque algorithm with torque controller at each joint. Three such control schemes are proposed. The first scheme uses the joint torque control system currently implemented on the robot arm and a novel form of the computed torque algorithm. The other two use the standard computed torque algorithm and a novel model following torque control system based on model following techniques. Standard tasks and performance indices are used to evaluate the performance of the controllers. Both numerical simulations and experiments are used in evaluation. The study shows that all three proposed systems lead to improved tracking performance over a conventional PD controller.
NASA Astrophysics Data System (ADS)
Liu, Jianming; Grant, Steven L.; Benesty, Jacob
2015-12-01
A new reweighted proportionate affine projection algorithm (RPAPA) with memory and row action projection (MRAP) is proposed in this paper. The reweighted PAPA is derived from a family of sparseness measures, which demonstrate performance similar to mu-law and the l 0 norm PAPA but with lower computational complexity. The sparseness of the channel is taken into account to improve the performance for dispersive system identification. Meanwhile, the memory of the filter's coefficients is combined with row action projections (RAP) to significantly reduce computational complexity. Simulation results demonstrate that the proposed RPAPA MRAP algorithm outperforms both the affine projection algorithm (APA) and PAPA, and has performance similar to l 0 PAPA and mu-law PAPA, in terms of convergence speed and tracking ability. Meanwhile, the proposed RPAPA MRAP has much lower computational complexity than PAPA, mu-law PAPA, and l 0 PAPA, etc., which makes it very appealing for real-time implementation.
Wang, Fei-Yue; Jin, Ning; Liu, Derong; Wei, Qinglai
2011-01-01
In this paper, we study the finite-horizon optimal control problem for discrete-time nonlinear systems using the adaptive dynamic programming (ADP) approach. The idea is to use an iterative ADP algorithm to obtain the optimal control law which makes the performance index function close to the greatest lower bound of all performance indices within an ε-error bound. The optimal number of control steps can also be obtained by the proposed ADP algorithms. A convergence analysis of the proposed ADP algorithms in terms of performance index function and control policy is made. In order to facilitate the implementation of the iterative ADP algorithms, neural networks are used for approximating the performance index function, computing the optimal control policy, and modeling the nonlinear system. Finally, two simulation examples are employed to illustrate the applicability of the proposed method.
NASA Astrophysics Data System (ADS)
Masoumi, Massoud; Raissi, Farshid; Ahmadian, Mahmoud; Keshavarzi, Parviz
2006-01-01
We are proposing that the recently proposed semiconductor-nanowire-molecular architecture (CMOL) is an optimum platform to realize encryption algorithms. The basic modules for the advanced encryption standard algorithm (Rijndael) have been designed using CMOL architecture. The performance of this design has been evaluated with respect to chip area and speed. It is observed that CMOL provides considerable improvement over implementation with regular CMOS architecture even with a 20% defect rate. Pseudo-optimum gate placement and routing are provided for Rijndael building blocks and the possibility of designing high speed, attack tolerant and long key encryptions are discussed.
NASA Astrophysics Data System (ADS)
Lohvithee, Manasavee; Biguri, Ander; Soleimani, Manuchehr
2017-12-01
There are a number of powerful total variation (TV) regularization methods that have great promise in limited data cone-beam CT reconstruction with an enhancement of image quality. These promising TV methods require careful selection of the image reconstruction parameters, for which there are no well-established criteria. This paper presents a comprehensive evaluation of parameter selection in a number of major TV-based reconstruction algorithms. An appropriate way of selecting the values for each individual parameter has been suggested. Finally, a new adaptive-weighted projection-controlled steepest descent (AwPCSD) algorithm is presented, which implements the edge-preserving function for CBCT reconstruction with limited data. The proposed algorithm shows significant robustness compared to three other existing algorithms: ASD-POCS, AwASD-POCS and PCSD. The proposed AwPCSD algorithm is able to preserve the edges of the reconstructed images better with fewer sensitive parameters to tune.
CPU-GPU hybrid accelerating the Zuker algorithm for RNA secondary structure prediction applications.
Lei, Guoqing; Dou, Yong; Wan, Wen; Xia, Fei; Li, Rongchun; Ma, Meng; Zou, Dan
2012-01-01
Prediction of ribonucleic acid (RNA) secondary structure remains one of the most important research areas in bioinformatics. The Zuker algorithm is one of the most popular methods of free energy minimization for RNA secondary structure prediction. Thus far, few studies have been reported on the acceleration of the Zuker algorithm on general-purpose processors or on extra accelerators such as Field Programmable Gate-Array (FPGA) and Graphics Processing Units (GPU). To the best of our knowledge, no implementation combines both CPU and extra accelerators, such as GPUs, to accelerate the Zuker algorithm applications. In this paper, a CPU-GPU hybrid computing system that accelerates Zuker algorithm applications for RNA secondary structure prediction is proposed. The computing tasks are allocated between CPU and GPU for parallel cooperate execution. Performance differences between the CPU and the GPU in the task-allocation scheme are considered to obtain workload balance. To improve the hybrid system performance, the Zuker algorithm is optimally implemented with special methods for CPU and GPU architecture. Speedup of 15.93× over optimized multi-core SIMD CPU implementation and performance advantage of 16% over optimized GPU implementation are shown in the experimental results. More than 14% of the sequences are executed on CPU in the hybrid system. The system combining CPU and GPU to accelerate the Zuker algorithm is proven to be promising and can be applied to other bioinformatics applications.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nakhleh, Luay
I proposed to develop computationally efficient tools for accurate detection and reconstruction of microbes' complex evolutionary mechanisms, thus enabling rapid and accurate annotation, analysis and understanding of their genomes. To achieve this goal, I proposed to address three aspects. (1) Mathematical modeling. A major challenge facing the accurate detection of HGT is that of distinguishing between these two events on the one hand and other events that have similar "effects." I proposed to develop a novel mathematical approach for distinguishing among these events. Further, I proposed to develop a set of novel optimization criteria for the evolutionary analysis of microbialmore » genomes in the presence of these complex evolutionary events. (2) Algorithm design. In this aspect of the project, I proposed to develop an array of e cient and accurate algorithms for analyzing microbial genomes based on the formulated optimization criteria. Further, I proposed to test the viability of the criteria and the accuracy of the algorithms in an experimental setting using both synthetic as well as biological data. (3) Software development. I proposed the nal outcome to be a suite of software tools which implements the mathematical models as well as the algorithms developed.« less
Efficient Record Linkage Algorithms Using Complete Linkage Clustering.
Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar
2016-01-01
Data from different agencies share data of the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. A large number of available algorithms for record linkage are prone to either time inefficiency or low-accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy consuming reasonable run times.
Efficient Record Linkage Algorithms Using Complete Linkage Clustering
Mamun, Abdullah-Al; Aseltine, Robert; Rajasekaran, Sanguthevar
2016-01-01
Data from different agencies share data of the same individuals. Linking these datasets to identify all the records belonging to the same individuals is a crucial and challenging problem, especially given the large volumes of data. A large number of available algorithms for record linkage are prone to either time inefficiency or low-accuracy in finding matches and non-matches among the records. In this paper we propose efficient as well as reliable sequential and parallel algorithms for the record linkage problem employing hierarchical clustering methods. We employ complete linkage hierarchical clustering algorithms to address this problem. In addition to hierarchical clustering, we also use two other techniques: elimination of duplicate records and blocking. Our algorithms use sorting as a sub-routine to identify identical copies of records. We have tested our algorithms on datasets with millions of synthetic records. Experimental results show that our algorithms achieve nearly 100% accuracy. Parallel implementations achieve almost linear speedups. Time complexities of these algorithms do not exceed those of previous best-known algorithms. Our proposed algorithms outperform previous best-known algorithms in terms of accuracy consuming reasonable run times. PMID:27124604
SURGNET: An Integrated Surgical Data Transmission System for Telesurgery.
Natarajan, Sriram; Ganz, Aura
2009-01-01
Remote surgery information requires quick and reliable transmission between the surgeon and the patient site. However, the networks that interconnect the surgeon and patient sites are usually time varying and lossy which can cause packet loss and delay jitter. In this paper we propose SURGNET, a telesurgery system for which we developed the architecture, algorithms and implemented it on a testbed. The algorithms include adaptive packet prediction and buffer time adjustment techniques which reduce the negative effects caused by the lossy and time varying networks. To evaluate the proposed SURGNET system, at the therapist site, we implemented a therapist panel which controls the force feedback device movements and provides image analysis functionality. At the patient site we controlled a virtual reality applet built in Matlab. The varying network conditions were emulated using NISTNet emulator. Our results show that even for severe packet loss and variable delay jitter, the proposed integrated synchronization techniques significantly improve SURGNET performance.
Fault-tolerant composite Householder reflection
NASA Astrophysics Data System (ADS)
Torosov, Boyan T.; Kyoseva, Elica; Vitanov, Nikolay V.
2015-07-01
We propose a fault-tolerant implementation of the quantum Householder reflection, which is a key operation in various quantum algorithms, quantum-state engineering, generation of arbitrary unitaries, and entanglement characterization. We construct this operation using the modular approach of composite pulses and a relation between the Householder reflection and the quantum phase gate. The proposed implementation is highly insensitive to variations in the experimental parameters, which makes it suitable for high-fidelity quantum information processing.
Hybrid employment recommendation algorithm based on Spark
NASA Astrophysics Data System (ADS)
Li, Zuoquan; Lin, Yubei; Zhang, Xingming
2017-08-01
Aiming at the real-time application of collaborative filtering employment recommendation algorithm (CF), a clustering collaborative filtering recommendation algorithm (CCF) is developed, which applies hierarchical clustering to CF and narrows the query range of neighbour items. In addition, to solve the cold-start problem of content-based recommendation algorithm (CB), a content-based algorithm with users’ information (CBUI) is introduced for job recommendation. Furthermore, a hybrid recommendation algorithm (HRA) which combines CCF and CBUI algorithms is proposed, and implemented on Spark platform. The experimental results show that HRA can overcome the problems of cold start and data sparsity, and achieve good recommendation accuracy and scalability for employment recommendation.
Design of Belief Propagation Based on FPGA for the Multistereo CAFADIS Camera
Magdaleno, Eduardo; Lüke, Jonás Philipp; Rodríguez, Manuel; Rodríguez-Ramos, José Manuel
2010-01-01
In this paper we describe a fast, specialized hardware implementation of the belief propagation algorithm for the CAFADIS camera, a new plenoptic sensor patented by the University of La Laguna. This camera captures the lightfield of the scene and can be used to find out at which depth each pixel is in focus. The algorithm has been designed for FPGA devices using VHDL. We propose a parallel and pipeline architecture to implement the algorithm without external memory. Although the BRAM resources of the device increase considerably, we can maintain real-time restrictions by using extremely high-performance signal processing capability through parallelism and by accessing several memories simultaneously. The quantifying results with 16 bit precision have shown that performances are really close to the original Matlab programmed algorithm. PMID:22163404
Design of belief propagation based on FPGA for the multistereo CAFADIS camera.
Magdaleno, Eduardo; Lüke, Jonás Philipp; Rodríguez, Manuel; Rodríguez-Ramos, José Manuel
2010-01-01
In this paper we describe a fast, specialized hardware implementation of the belief propagation algorithm for the CAFADIS camera, a new plenoptic sensor patented by the University of La Laguna. This camera captures the lightfield of the scene and can be used to find out at which depth each pixel is in focus. The algorithm has been designed for FPGA devices using VHDL. We propose a parallel and pipeline architecture to implement the algorithm without external memory. Although the BRAM resources of the device increase considerably, we can maintain real-time restrictions by using extremely high-performance signal processing capability through parallelism and by accessing several memories simultaneously. The quantifying results with 16 bit precision have shown that performances are really close to the original Matlab programmed algorithm.
A Hybrid Cellular Genetic Algorithm for Multi-objective Crew Scheduling Problem
NASA Astrophysics Data System (ADS)
Jolai, Fariborz; Assadipour, Ghazal
Crew scheduling is one of the important problems of the airline industry. This problem aims to cover a number of flights by crew members, such that all the flights are covered. In a robust scheduling the assignment should be so that the total cost, delays, and unbalanced utilization are minimized. As the problem is NP-hard and the objectives are in conflict with each other, a multi-objective meta-heuristic called CellDE, which is a hybrid cellular genetic algorithm, is implemented as the optimization method. The proposed algorithm provides the decision maker with a set of non-dominated or Pareto-optimal solutions, and enables them to choose the best one according to their preferences. A set of problems of different sizes is generated and solved using the proposed algorithm. Evaluating the performance of the proposed algorithm, three metrics are suggested, and the diversity and the convergence of the achieved Pareto front are appraised. Finally a comparison is made between CellDE and PAES, another meta-heuristic algorithm. The results show the superiority of CellDE.
NASA Astrophysics Data System (ADS)
Zhang, Lijuan; Li, Yang; Wang, Junnan; Liu, Ying
2018-03-01
In this paper, we propose a point spread function (PSF) reconstruction method and joint maximum a posteriori (JMAP) estimation method for the adaptive optics image restoration. Using the JMAP method as the basic principle, we establish the joint log likelihood function of multi-frame adaptive optics (AO) images based on the image Gaussian noise models. To begin with, combining the observed conditions and AO system characteristics, a predicted PSF model for the wavefront phase effect is developed; then, we build up iterative solution formulas of the AO image based on our proposed algorithm, addressing the implementation process of multi-frame AO images joint deconvolution method. We conduct a series of experiments on simulated and real degraded AO images to evaluate our proposed algorithm. Compared with the Wiener iterative blind deconvolution (Wiener-IBD) algorithm and Richardson-Lucy IBD algorithm, our algorithm has better restoration effects including higher peak signal-to-noise ratio ( PSNR) and Laplacian sum ( LS) value than the others. The research results have a certain application values for actual AO image restoration.
Xia, Youshen; Kamel, Mohamed S
2007-06-01
Identification of a general nonlinear noisy system viewed as an estimation of a predictor function is studied in this article. A measurement fusion method for the predictor function estimate is proposed. In the proposed scheme, observed data are first fused by using an optimal fusion technique, and then the optimal fused data are incorporated in a nonlinear function estimator based on a robust least squares support vector machine (LS-SVM). A cooperative learning algorithm is proposed to implement the proposed measurement fusion method. Compared with related identification methods, the proposed method can minimize both the approximation error and the noise error. The performance analysis shows that the proposed optimal measurement fusion function estimate has a smaller mean square error than the LS-SVM function estimate. Moreover, the proposed cooperative learning algorithm can converge globally to the optimal measurement fusion function estimate. Finally, the proposed measurement fusion method is applied to ARMA signal and spatial temporal signal modeling. Experimental results show that the proposed measurement fusion method can provide a more accurate model.
Parallel computing of physical maps--a comparative study in SIMD and MIMD parallelism.
Bhandarkar, S M; Chirravuri, S; Arnold, J
1996-01-01
Ordering clones from a genomic library into physical maps of whole chromosomes presents a central computational problem in genetics. Chromosome reconstruction via clone ordering is usually isomorphic to the NP-complete Optimal Linear Arrangement problem. Parallel SIMD and MIMD algorithms for simulated annealing based on Markov chain distribution are proposed and applied to the problem of chromosome reconstruction via clone ordering. Perturbation methods and problem-specific annealing heuristics are proposed and described. The SIMD algorithms are implemented on a 2048 processor MasPar MP-2 system which is an SIMD 2-D toroidal mesh architecture whereas the MIMD algorithms are implemented on an 8 processor Intel iPSC/860 which is an MIMD hypercube architecture. A comparative analysis of the various SIMD and MIMD algorithms is presented in which the convergence, speedup, and scalability characteristics of the various algorithms are analyzed and discussed. On a fine-grained, massively parallel SIMD architecture with a low synchronization overhead such as the MasPar MP-2, a parallel simulated annealing algorithm based on multiple periodically interacting searches performs the best. For a coarse-grained MIMD architecture with high synchronization overhead such as the Intel iPSC/860, a parallel simulated annealing algorithm based on multiple independent searches yields the best results. In either case, distribution of clonal data across multiple processors is shown to exacerbate the tendency of the parallel simulated annealing algorithm to get trapped in a local optimum.
An adaptive SVSF-SLAM algorithm to improve the success and solving the UGVs cooperation problem
NASA Astrophysics Data System (ADS)
Demim, Fethi; Nemra, Abdelkrim; Louadj, Kahina; Hamerlain, Mustapha; Bazoula, Abdelouahab
2018-05-01
This paper aims to present a Decentralised Cooperative Simultaneous Localization and Mapping (DCSLAM) solution based on 2D laser data using an Adaptive Covariance Intersection (ACI). The ACI-DCSLAM algorithm will be validated on a swarm of Unmanned Ground Vehicles (UGVs) receiving features to estimate the position and covariance of shared features before adding them to the global map. With the proposed solution, a group of (UGVs) will be able to construct a large reliable map and localise themselves within this map without any user intervention. The most popular solutions to this problem are the EKF-SLAM, Nonlinear H-infinity ? SLAM and the FAST-SLAM. The former suffers from two important problems which are the poor consistency caused by the linearization problem and the calculation of Jacobian. The second solution is the ? which is a very promising filter because it doesn't make any assumption about noise characteristics, while the latter is not suitable for real time implementation. Therefore, a new alternative solution based on the smooth variable structure filter (SVSF) is adopted. Cooperative adaptive SVSF-SLAM algorithm is proposed in this paper to solve the UGVs SLAM problem. Our main contribution consists in adapting the SVSF filter to solve the Decentralised Cooperative SLAM problem for multiple UGVs. The algorithms developed in this paper were implemented using two mobile robots Pioneer ?, equiped with 2D laser telemetry sensors. Good results are obtained by the Cooperative adaptive SVSF-SLAM algorithm compared to the Cooperative EKF/?-SLAM algorithms, especially when the noise is colored or affected by a variable bias. Simulation results confirm and show the efficiency of the proposed algorithm which is more robust, stable and adapted to real time applications.
Efficient Hardware Implementation of the Lightweight Block Encryption Algorithm LEA
Lee, Donggeon; Kim, Dong-Chan; Kwon, Daesung; Kim, Howon
2014-01-01
Recently, due to the advent of resource-constrained trends, such as smartphones and smart devices, the computing environment is changing. Because our daily life is deeply intertwined with ubiquitous networks, the importance of security is growing. A lightweight encryption algorithm is essential for secure communication between these kinds of resource-constrained devices, and many researchers have been investigating this field. Recently, a lightweight block cipher called LEA was proposed. LEA was originally targeted for efficient implementation on microprocessors, as it is fast when implemented in software and furthermore, it has a small memory footprint. To reflect on recent technology, all required calculations utilize 32-bit wide operations. In addition, the algorithm is comprised of not complex S-Box-like structures but simple Addition, Rotation, and XOR operations. To the best of our knowledge, this paper is the first report on a comprehensive hardware implementation of LEA. We present various hardware structures and their implementation results according to key sizes. Even though LEA was originally targeted at software efficiency, it also shows high efficiency when implemented as hardware. PMID:24406859
Field-programmable analogue arrays for the sensorless control of DC motors
NASA Astrophysics Data System (ADS)
Rivera, J.; Dueñas, I.; Ortega, S.; Del Valle, J. L.
2018-02-01
This work presents the analogue implementation of a sensorless controller for direct current motors based on the super-twisting (ST) sliding mode technique, by means of field programmable analogue arrays (FPAA). The novelty of this work is twofold, first is the use of the ST algorithm in a sensorless scheme for DC motors, and the implementation method of this type of sliding mode controllers in FPAAs. The ST algorithm reduces the chattering problem produced with the deliberate use of the sign function in classical sliding mode approaches. On the other hand, the advantages of the implementation method over a digital one are that the controller is not digitally approximated, the controller gains are not fine tuned and the implementation does not require the use of analogue-to-digital and digital-to-analogue converter circuits. In addition to this, the FPAA is a reconfigurable, lower cost and power consumption technology. Simulation and experimentation results were registered, where a more accurate transient response and lower power consumption were obtained by the proposed implementation method when compared to a digital implementation. Also, a more accurate performance by the DC motor is obtained with proposed sensorless ST technique when compared with a classical sliding mode approach.
Algorithm of Taxonomy: Method of Design and Implementation Mechanism
NASA Astrophysics Data System (ADS)
Shalanov, N. V.; Aletdinova, A. A.
2018-05-01
The authors propose that the method of design of the algorithm of taxonomy should be based on the calculation of integral indicators for the estimation of the level of an object according to the set of initial indicators (i. e. potential). Their values will be the values of the projected lengths of the objects on the numeric axis, which will take values [0.100]. This approach will reduce the task of multidimensional classification to the task of one-dimensional classification. The algorithm for solving the task of taxonomy contains 14 stages; the example of its implementation is illustrated by the data of 46 consumer societies of the Yakut Union of Consumer Societies of Russia.
An algorithm of discovering signatures from DNA databases on a computer cluster.
Lee, Hsiao Ping; Sheu, Tzu-Fang
2014-10-05
Signatures are short sequences that are unique and not similar to any other sequence in a database that can be used as the basis to identify different species. Even though several signature discovery algorithms have been proposed in the past, these algorithms require the entirety of databases to be loaded in the memory, thus restricting the amount of data that they can process. It makes those algorithms unable to process databases with large amounts of data. Also, those algorithms use sequential models and have slower discovery speeds, meaning that the efficiency can be improved. In this research, we are debuting the utilization of a divide-and-conquer strategy in signature discovery and have proposed a parallel signature discovery algorithm on a computer cluster. The algorithm applies the divide-and-conquer strategy to solve the problem posed to the existing algorithms where they are unable to process large databases and uses a parallel computing mechanism to effectively improve the efficiency of signature discovery. Even when run with just the memory of regular personal computers, the algorithm can still process large databases such as the human whole-genome EST database which were previously unable to be processed by the existing algorithms. The algorithm proposed in this research is not limited by the amount of usable memory and can rapidly find signatures in large databases, making it useful in applications such as Next Generation Sequencing and other large database analysis and processing. The implementation of the proposed algorithm is available at http://www.cs.pu.edu.tw/~fang/DDCSDPrograms/DDCSD.htm.
Fractional-order TV-L2 model for image denoising
NASA Astrophysics Data System (ADS)
Chen, Dali; Sun, Shenshen; Zhang, Congrong; Chen, YangQuan; Xue, Dingyu
2013-10-01
This paper proposes a new fractional order total variation (TV) denoising method, which provides a much more elegant and effective way of treating problems of the algorithm implementation, ill-posed inverse, regularization parameter selection and blocky effect. Two fractional order TV-L2 models are constructed for image denoising. The majorization-minimization (MM) algorithm is used to decompose these two complex fractional TV optimization problems into a set of linear optimization problems which can be solved by the conjugate gradient algorithm. The final adaptive numerical procedure is given. Finally, we report experimental results which show that the proposed methodology avoids the blocky effect and achieves state-of-the-art performance. In addition, two medical image processing experiments are presented to demonstrate the validity of the proposed methodology.
Fu, Xingang; Li, Shuhui; Fairbank, Michael; Wunsch, Donald C; Alonso, Eduardo
2015-09-01
This paper investigates how to train a recurrent neural network (RNN) using the Levenberg-Marquardt (LM) algorithm as well as how to implement optimal control of a grid-connected converter (GCC) using an RNN. To successfully and efficiently train an RNN using the LM algorithm, a new forward accumulation through time (FATT) algorithm is proposed to calculate the Jacobian matrix required by the LM algorithm. This paper explores how to incorporate FATT into the LM algorithm. The results show that the combination of the LM and FATT algorithms trains RNNs better than the conventional backpropagation through time algorithm. This paper presents an analytical study on the optimal control of GCCs, including theoretically ideal optimal and suboptimal controllers. To overcome the inapplicability of the optimal GCC controller under practical conditions, a new RNN controller with an improved input structure is proposed to approximate the ideal optimal controller. The performance of an ideal optimal controller and a well-trained RNN controller was compared in close to real-life power converter switching environments, demonstrating that the proposed RNN controller can achieve close to ideal optimal control performance even under low sampling rate conditions. The excellent performance of the proposed RNN controller under challenging and distorted system conditions further indicates the feasibility of using an RNN to approximate optimal control in practical applications.
A biconjugate gradient type algorithm on massively parallel architectures
NASA Technical Reports Server (NTRS)
Freund, Roland W.; Hochbruck, Marlis
1991-01-01
The biconjugate gradient (BCG) method is the natural generalization of the classical conjugate gradient algorithm for Hermitian positive definite matrices to general non-Hermitian linear systems. Unfortunately, the original BCG algorithm is susceptible to possible breakdowns and numerical instabilities. Recently, Freund and Nachtigal have proposed a novel BCG type approach, the quasi-minimal residual method (QMR), which overcomes the problems of BCG. Here, an implementation is presented of QMR based on an s-step version of the nonsymmetric look-ahead Lanczos algorithm. The main feature of the s-step Lanczos algorithm is that, in general, all inner products, except for one, can be computed in parallel at the end of each block; this is unlike the other standard Lanczos process where inner products are generated sequentially. The resulting implementation of QMR is particularly attractive on massively parallel SIMD architectures, such as the Connection Machine.
Multiple objects tracking with HOGs matching in circular windows
NASA Astrophysics Data System (ADS)
Miramontes-Jaramillo, Daniel; Kober, Vitaly; Díaz-Ramírez, Víctor H.
2014-09-01
In recent years tracking applications with development of new technologies like smart TVs, Kinect, Google Glass and Oculus Rift become very important. When tracking uses a matching algorithm, a good prediction algorithm is required to reduce the search area for each object to be tracked as well as processing time. In this work, we analyze the performance of different tracking algorithms based on prediction and matching for a real-time tracking multiple objects. The used matching algorithm utilizes histograms of oriented gradients. It carries out matching in circular windows, and possesses rotation invariance and tolerance to viewpoint and scale changes. The proposed algorithm is implemented in a personal computer with GPU, and its performance is analyzed in terms of processing time in real scenarios. Such implementation takes advantage of current technologies and helps to process video sequences in real-time for tracking several objects at the same time.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xu, Qiaofeng; Sawatzky, Alex; Anastasio, Mark A., E-mail: anastasio@wustl.edu
Purpose: The development of iterative image reconstruction algorithms for cone-beam computed tomography (CBCT) remains an active and important research area. Even with hardware acceleration, the overwhelming majority of the available 3D iterative algorithms that implement nonsmooth regularizers remain computationally burdensome and have not been translated for routine use in time-sensitive applications such as image-guided radiation therapy (IGRT). In this work, two variants of the fast iterative shrinkage thresholding algorithm (FISTA) are proposed and investigated for accelerated iterative image reconstruction in CBCT. Methods: Algorithm acceleration was achieved by replacing the original gradient-descent step in the FISTAs by a subproblem that ismore » solved by use of the ordered subset simultaneous algebraic reconstruction technique (OS-SART). Due to the preconditioning matrix adopted in the OS-SART method, two new weighted proximal problems were introduced and corresponding fast gradient projection-type algorithms were developed for solving them. We also provided efficient numerical implementations of the proposed algorithms that exploit the massive data parallelism of multiple graphics processing units. Results: The improved rates of convergence of the proposed algorithms were quantified in computer-simulation studies and by use of clinical projection data corresponding to an IGRT study. The accelerated FISTAs were shown to possess dramatically improved convergence properties as compared to the standard FISTAs. For example, the number of iterations to achieve a specified reconstruction error could be reduced by an order of magnitude. Volumetric images reconstructed from clinical data were produced in under 4 min. Conclusions: The FISTA achieves a quadratic convergence rate and can therefore potentially reduce the number of iterations required to produce an image of a specified image quality as compared to first-order methods. We have proposed and investigated accelerated FISTAs for use with two nonsmooth penalty functions that will lead to further reductions in image reconstruction times while preserving image quality. Moreover, with the help of a mixed sparsity-regularization, better preservation of soft-tissue structures can be potentially obtained. The algorithms were systematically evaluated by use of computer-simulated and clinical data sets.« less
Xu, Qiaofeng; Yang, Deshan; Tan, Jun; Sawatzky, Alex; Anastasio, Mark A
2016-04-01
The development of iterative image reconstruction algorithms for cone-beam computed tomography (CBCT) remains an active and important research area. Even with hardware acceleration, the overwhelming majority of the available 3D iterative algorithms that implement nonsmooth regularizers remain computationally burdensome and have not been translated for routine use in time-sensitive applications such as image-guided radiation therapy (IGRT). In this work, two variants of the fast iterative shrinkage thresholding algorithm (FISTA) are proposed and investigated for accelerated iterative image reconstruction in CBCT. Algorithm acceleration was achieved by replacing the original gradient-descent step in the FISTAs by a subproblem that is solved by use of the ordered subset simultaneous algebraic reconstruction technique (OS-SART). Due to the preconditioning matrix adopted in the OS-SART method, two new weighted proximal problems were introduced and corresponding fast gradient projection-type algorithms were developed for solving them. We also provided efficient numerical implementations of the proposed algorithms that exploit the massive data parallelism of multiple graphics processing units. The improved rates of convergence of the proposed algorithms were quantified in computer-simulation studies and by use of clinical projection data corresponding to an IGRT study. The accelerated FISTAs were shown to possess dramatically improved convergence properties as compared to the standard FISTAs. For example, the number of iterations to achieve a specified reconstruction error could be reduced by an order of magnitude. Volumetric images reconstructed from clinical data were produced in under 4 min. The FISTA achieves a quadratic convergence rate and can therefore potentially reduce the number of iterations required to produce an image of a specified image quality as compared to first-order methods. We have proposed and investigated accelerated FISTAs for use with two nonsmooth penalty functions that will lead to further reductions in image reconstruction times while preserving image quality. Moreover, with the help of a mixed sparsity-regularization, better preservation of soft-tissue structures can be potentially obtained. The algorithms were systematically evaluated by use of computer-simulated and clinical data sets.
Xu, Qiaofeng; Yang, Deshan; Tan, Jun; Sawatzky, Alex; Anastasio, Mark A.
2016-01-01
Purpose: The development of iterative image reconstruction algorithms for cone-beam computed tomography (CBCT) remains an active and important research area. Even with hardware acceleration, the overwhelming majority of the available 3D iterative algorithms that implement nonsmooth regularizers remain computationally burdensome and have not been translated for routine use in time-sensitive applications such as image-guided radiation therapy (IGRT). In this work, two variants of the fast iterative shrinkage thresholding algorithm (FISTA) are proposed and investigated for accelerated iterative image reconstruction in CBCT. Methods: Algorithm acceleration was achieved by replacing the original gradient-descent step in the FISTAs by a subproblem that is solved by use of the ordered subset simultaneous algebraic reconstruction technique (OS-SART). Due to the preconditioning matrix adopted in the OS-SART method, two new weighted proximal problems were introduced and corresponding fast gradient projection-type algorithms were developed for solving them. We also provided efficient numerical implementations of the proposed algorithms that exploit the massive data parallelism of multiple graphics processing units. Results: The improved rates of convergence of the proposed algorithms were quantified in computer-simulation studies and by use of clinical projection data corresponding to an IGRT study. The accelerated FISTAs were shown to possess dramatically improved convergence properties as compared to the standard FISTAs. For example, the number of iterations to achieve a specified reconstruction error could be reduced by an order of magnitude. Volumetric images reconstructed from clinical data were produced in under 4 min. Conclusions: The FISTA achieves a quadratic convergence rate and can therefore potentially reduce the number of iterations required to produce an image of a specified image quality as compared to first-order methods. We have proposed and investigated accelerated FISTAs for use with two nonsmooth penalty functions that will lead to further reductions in image reconstruction times while preserving image quality. Moreover, with the help of a mixed sparsity-regularization, better preservation of soft-tissue structures can be potentially obtained. The algorithms were systematically evaluated by use of computer-simulated and clinical data sets. PMID:27036582
Quantum algorithm for support matrix machines
NASA Astrophysics Data System (ADS)
Duan, Bojia; Yuan, Jiabin; Liu, Ying; Li, Dan
2017-09-01
We propose a quantum algorithm for support matrix machines (SMMs) that efficiently addresses an image classification problem by introducing a least-squares reformulation. This algorithm consists of two core subroutines: a quantum matrix inversion (Harrow-Hassidim-Lloyd, HHL) algorithm and a quantum singular value thresholding (QSVT) algorithm. The two algorithms can be implemented on a universal quantum computer with complexity O[log(npq) ] and O[log(pq)], respectively, where n is the number of the training data and p q is the size of the feature space. By iterating the algorithms, we can find the parameters for the SMM classfication model. Our analysis shows that both HHL and QSVT algorithms achieve an exponential increase of speed over their classical counterparts.
Tan, Weng Chun; Mat Isa, Nor Ashidi
2016-01-01
In human sperm motility analysis, sperm segmentation plays an important role to determine the location of multiple sperms. To ensure an improved segmentation result, the Laplacian of Gaussian filter is implemented as a kernel in a pre-processing step before applying the image segmentation process to automatically segment and detect human spermatozoa. This study proposes an intersecting cortical model (ICM), which was derived from several visual cortex models, to segment the sperm head region. However, the proposed method suffered from parameter selection; thus, the ICM network is optimised using particle swarm optimization where feature mutual information is introduced as the new fitness function. The final results showed that the proposed method is more accurate and robust than four state-of-the-art segmentation methods. The proposed method resulted in rates of 98.14%, 98.82%, 86.46% and 99.81% in accuracy, sensitivity, specificity and precision, respectively, after testing with 1200 sperms. The proposed algorithm is expected to be implemented in analysing sperm motility because of the robustness and capability of this algorithm.
NASA Astrophysics Data System (ADS)
Zhang, Yuli; Han, Jun; Weng, Xinqian; He, Zhongzhu; Zeng, Xiaoyang
This paper presents an Application Specific Instruction-set Processor (ASIP) for the SHA-3 BLAKE algorithm family by instruction set extensions (ISE) from an RISC (reduced instruction set computer) processor. With a design space exploration for this ASIP to increase the performance and reduce the area cost, we accomplish an efficient hardware and software implementation of BLAKE algorithm. The special instructions and their well-matched hardware function unit improve the calculation of the key section of the algorithm, namely G-functions. Also, relaxing the time constraint of the special function unit can decrease its hardware cost, while keeping the high data throughput of the processor. Evaluation results reveal the ASIP achieves 335Mbps and 176Mbps for BLAKE-256 and BLAKE-512. The extra area cost is only 8.06k equivalent gates. The proposed ASIP outperforms several software approaches on various platforms in cycle per byte. In fact, both high throughput and low hardware cost achieved by this programmable processor are comparable to that of ASIC implementations.
NASA Astrophysics Data System (ADS)
Park, Nam In; Kim, Seon Man; Kim, Hong Kook; Kim, Ji Woon; Kim, Myeong Bo; Yun, Su Won
In this paper, we propose a video-zoom driven audio-zoom algorithm in order to provide audio zooming effects in accordance with the degree of video-zoom. The proposed algorithm is designed based on a super-directive beamformer operating with a 4-channel microphone system, in conjunction with a soft masking process that considers the phase differences between microphones. Thus, the audio-zoom processed signal is obtained by multiplying an audio gain derived from a video-zoom level by the masked signal. After all, a real-time audio-zoom system is implemented on an ARM-CORETEX-A8 having a clock speed of 600 MHz after different levels of optimization are performed such as algorithmic level, C-code, and memory optimizations. To evaluate the complexity of the proposed real-time audio-zoom system, test data whose length is 21.3 seconds long is sampled at 48 kHz. As a result, it is shown from the experiments that the processing time for the proposed audio-zoom system occupies 14.6% or less of the ARM clock cycles. It is also shown from the experimental results performed in a semi-anechoic chamber that the signal with the front direction can be amplified by approximately 10 dB compared to the other directions.
A Probabilistic Model of Social Working Memory for Information Retrieval in Social Interactions.
Li, Liyuan; Xu, Qianli; Gan, Tian; Tan, Cheston; Lim, Joo-Hwee
2018-05-01
Social working memory (SWM) plays an important role in navigating social interactions. Inspired by studies in psychology, neuroscience, cognitive science, and machine learning, we propose a probabilistic model of SWM to mimic human social intelligence for personal information retrieval (IR) in social interactions. First, we establish a semantic hierarchy as social long-term memory to encode personal information. Next, we propose a semantic Bayesian network as the SWM, which integrates the cognitive functions of accessibility and self-regulation. One subgraphical model implements the accessibility function to learn the social consensus about IR-based on social information concept, clustering, social context, and similarity between persons. Beyond accessibility, one more layer is added to simulate the function of self-regulation to perform the personal adaptation to the consensus based on human personality. Two learning algorithms are proposed to train the probabilistic SWM model on a raw dataset of high uncertainty and incompleteness. One is an efficient learning algorithm of Newton's method, and the other is a genetic algorithm. Systematic evaluations show that the proposed SWM model is able to learn human social intelligence effectively and outperforms the baseline Bayesian cognitive model. Toward real-world applications, we implement our model on Google Glass as a wearable assistant for social interaction.
Efficient parallel implementation of active appearance model fitting algorithm on GPU.
Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou
2014-01-01
The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.
Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU
Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou
2014-01-01
The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures. PMID:24723812
New fast DCT algorithms based on Loeffler's factorization
NASA Astrophysics Data System (ADS)
Hong, Yoon Mi; Kim, Il-Koo; Lee, Tammy; Cheon, Min-Su; Alshina, Elena; Han, Woo-Jin; Park, Jeong-Hoon
2012-10-01
This paper proposes a new 32-point fast discrete cosine transform (DCT) algorithm based on the Loeffler's 16-point transform. Fast integer realizations of 16-point and 32-point transforms are also provided based on the proposed transform. For the recent development of High Efficiency Video Coding (HEVC), simplified quanti-zation and de-quantization process are proposed. Three different forms of implementation with the essentially same performance, namely matrix multiplication, partial butterfly, and full factorization can be chosen accord-ing to the given platform. In terms of the number of multiplications required for the realization, our proposed full-factorization is 3~4 times faster than a partial butterfly, and about 10 times faster than direct matrix multiplication.
The graph neural network model.
Scarselli, Franco; Gori, Marco; Tsoi, Ah Chung; Hagenbuchner, Markus; Monfardini, Gabriele
2009-01-01
Many underlying relationships among data in several areas of science and engineering, e.g., computer vision, molecular chemistry, molecular biology, pattern recognition, and data mining, can be represented in terms of graphs. In this paper, we propose a new neural network model, called graph neural network (GNN) model, that extends existing neural network methods for processing the data represented in graph domains. This GNN model, which can directly process most of the practically useful types of graphs, e.g., acyclic, cyclic, directed, and undirected, implements a function tau(G,n) is an element of IR(m) that maps a graph G and one of its nodes n into an m-dimensional Euclidean space. A supervised learning algorithm is derived to estimate the parameters of the proposed GNN model. The computational cost of the proposed algorithm is also considered. Some experimental results are shown to validate the proposed learning algorithm, and to demonstrate its generalization capabilities.
A Lightweight White-Box Symmetric Encryption Algorithm against Node Capture for WSNs †
Shi, Yang; Wei, Wujing; He, Zongjian
2015-01-01
Wireless Sensor Networks (WSNs) are often deployed in hostile environments and, thus, nodes can be potentially captured by an adversary. This is a typical white-box attack context, i.e., the adversary may have total visibility of the implementation of the build-in cryptosystem and full control over its execution platform. Handling white-box attacks in a WSN scenario is a challenging task. Existing encryption algorithms for white-box attack contexts require large memory footprint and, hence, are not applicable for wireless sensor networks scenarios. As a countermeasure against the threat in this context, in this paper, we propose a class of lightweight secure implementations of the symmetric encryption algorithm SMS4. The basic idea of our approach is to merge several steps of the round function of SMS4 into table lookups, blended by randomly generated mixing bijections. Therefore, the size of the implementations are significantly reduced while keeping the same security efficiency. The security and efficiency of the proposed solutions are theoretically analyzed. Evaluation shows our solutions satisfy the requirement of sensor nodes in terms of limited memory size and low computational costs. PMID:26007737
Proposal for An Algorithm for Screening for Undernutrition in Hospitalized Children.
Huysentruyt, Koen; De Schepper, Jean; Bontems, Patrick; Alliet, Philippe; Peeters, Ellen; Roelants, Mathieu; Van Biervliet, Stephanie; Hauser, Bruno; Vandenplas, Yvan
2016-11-01
The prevalence of disease-related undernutrition in hospitalized children has not decreased significantly in the last decades in Europe. A recent large multicentric European study reported a percentage of underweight children ranging across countries from 4.0% to 9.3%. Nutritional screening has been put forward as a strategy to detect and prevent undernutrition in hospitalized children. It allows timely implementation of adequate nutritional support and prevents further nutritional deterioration of hospitalized children. In this article, a hands-on practical guideline for the implementation of a nutritional care program in hospitalized children is provided. The difference between nutritional status (anthropometry with or without additional technical investigations) at admission and nutritional risk (the risk of the need for a nutritional intervention or the risk for nutritional deterioration during hospital stay) is the focus of this article. Based on the quality control circle principle of Deming, a nutritional care algorithm, with detailed instructions specific for the pediatric population was developed and implementation in daily practice is proposed. Further research is required to prove the applicability and the merit of this algorithm. It can, however, serve as a basis to provide European or even wider guidelines.
Fine-grained parallel RNAalifold algorithm for RNA secondary structure prediction on FPGA
Xia, Fei; Dou, Yong; Zhou, Xingming; Yang, Xuejun; Xu, Jiaqing; Zhang, Yang
2009-01-01
Background In the field of RNA secondary structure prediction, the RNAalifold algorithm is one of the most popular methods using free energy minimization. However, general-purpose computers including parallel computers or multi-core computers exhibit parallel efficiency of no more than 50%. Field Programmable Gate-Array (FPGA) chips provide a new approach to accelerate RNAalifold by exploiting fine-grained custom design. Results RNAalifold shows complicated data dependences, in which the dependence distance is variable, and the dependence direction is also across two dimensions. We propose a systolic array structure including one master Processing Element (PE) and multiple slave PEs for fine grain hardware implementation on FPGA. We exploit data reuse schemes to reduce the need to load energy matrices from external memory. We also propose several methods to reduce energy table parameter size by 80%. Conclusion To our knowledge, our implementation with 16 PEs is the only FPGA accelerator implementing the complete RNAalifold algorithm. The experimental results show a factor of 12.2 speedup over the RNAalifold (ViennaPackage – 1.6.5) software for a group of aligned RNA sequences with 2981-residue running on a Personal Computer (PC) platform with Pentium 4 2.6 GHz CPU. PMID:19208138
Lu, Jianing; Li, Xiang; Fu, Songnian; Luo, Ming; Xiang, Meng; Zhou, Huibin; Tang, Ming; Liu, Deming
2017-03-06
We present dual-polarization complex-weighted, decision-aided, maximum-likelihood algorithm with superscalar parallelization (SSP-DP-CW-DA-ML) for joint carrier phase and frequency-offset estimation (FOE) in coherent optical receivers. By pre-compensation of the phase offset between signals in dual polarizations, the performance can be substantially improved. Meanwhile, with the help of modified SSP-based parallel implementation, the acquisition time of FO and the required number of training symbols are reduced by transferring the complex weights of the filters between adjacent buffers, where differential coding/decoding is not required. Simulation results show that the laser linewidth tolerance of our proposed algorithm is comparable to traditional blind phase search (BPS), while a complete FOE range of ± symbol rate/2 can be achieved. Finally, performance of our proposed algorithm is experimentally verified under the scenario of back-to-back (B2B) transmission using 10 Gbaud DP-16/32-QAM formats.
NASA Astrophysics Data System (ADS)
Li, Xinhua; Song, Zhenyu; Zhan, Yongjie; Wu, Qiongzhi
2009-12-01
Since the system capacity is severely limited, reducing the multiple access interfere (MAI) is necessary in the multiuser direct-sequence code division multiple access (DS-CDMA) system which is used in the telecommunication terminals data-transferred link system. In this paper, we adopt an adaptive multistage parallel interference cancellation structure in the demodulator based on the least mean square (LMS) algorithm to eliminate the MAI on the basis of overviewing various of multiuser dectection schemes. Neither a training sequence nor a pilot signal is needed in the proposed scheme, and its implementation complexity can be greatly reduced by a LMS approximate algorithm. The algorithm and its FPGA implementation is then derived. Simulation results of the proposed adaptive PIC can outperform some of the existing interference cancellation methods in AWGN channels. The hardware setup of mutiuser demodulator is described, and the experimental results based on it demonstrate that the simulation results shows large performance gains over the conventional single-user demodulator.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Katti, Amogh; Di Fatta, Giuseppe; Naughton, Thomas
Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum s User Level Failure Mitigation proposal has introduced an operation, MPI Comm shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI Comm shrink operation requires a failure detection and consensus algorithm. This paper presents three novel failure detection and consensus algorithms using Gossiping. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that inmore » all algorithms the number of Gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and a perfect synchronization in achieving global consensus. The third approach is a three-phase distributed failure detection and consensus algorithm and provides consistency guarantees even in very large and extreme-scale systems while at the same time being memory and bandwidth efficient.« less
FPGA Implementation of Generalized Hebbian Algorithm for Texture Classification
Lin, Shiow-Jyu; Hwang, Wen-Jyi; Lee, Wei-Hao
2012-01-01
This paper presents a novel hardware architecture for principal component analysis. The architecture is based on the Generalized Hebbian Algorithm (GHA) because of its simplicity and effectiveness. The architecture is separated into three portions: the weight vector updating unit, the principal computation unit and the memory unit. In the weight vector updating unit, the computation of different synaptic weight vectors shares the same circuit for reducing the area costs. To show the effectiveness of the circuit, a texture classification system based on the proposed architecture is physically implemented by Field Programmable Gate Array (FPGA). It is embedded in a System-On-Programmable-Chip (SOPC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient design for attaining both high speed performance and low area costs. PMID:22778640
NASA Technical Reports Server (NTRS)
Lin, Shu; Fossorier, Marc
1998-01-01
The Viterbi algorithm is indeed a very simple and efficient method of implementing the maximum likelihood decoding. However, if we take advantage of the structural properties in a trellis section, other efficient trellis-based decoding algorithms can be devised. Recently, an efficient trellis-based recursive maximum likelihood decoding (RMLD) algorithm for linear block codes has been proposed. This algorithm is more efficient than the conventional Viterbi algorithm in both computation and hardware requirements. Most importantly, the implementation of this algorithm does not require the construction of the entire code trellis, only some special one-section trellises of relatively small state and branch complexities are needed for constructing path (or branch) metric tables recursively. At the end, there is only one table which contains only the most likely code-word and its metric for a given received sequence r = (r(sub 1), r(sub 2),...,r(sub n)). This algorithm basically uses the divide and conquer strategy. Furthermore, it allows parallel/pipeline processing of received sequences to speed up decoding.
Petri nets SM-cover-based on heuristic coloring algorithm
NASA Astrophysics Data System (ADS)
Tkacz, Jacek; Doligalski, Michał
2015-09-01
In the paper, coloring heuristic algorithm of interpreted Petri nets is presented. Coloring is used to determine the State Machines (SM) subnets. The present algorithm reduces the Petri net in order to reduce the computational complexity and finds one of its possible State Machines cover. The proposed algorithm uses elements of interpretation of Petri nets. The obtained result may not be the best, but it is sufficient for use in rapid prototyping of logic controllers. Found SM-cover will be also used in the development of algorithms for decomposition, and modular synthesis and implementation of parallel logic controllers. Correctness developed heuristic algorithm was verified using Gentzen formal reasoning system.
Development of Educational Support System for Algorithm using Flowchart
NASA Astrophysics Data System (ADS)
Ohchi, Masashi; Aoki, Noriyuki; Furukawa, Tatsuya; Takayama, Kanta
Recently, an information technology is indispensable for the business and industrial developments. However, it has been a social problem that the number of software developers has been insufficient. To solve the problem, it is necessary to develop and implement the environment for learning the algorithm and programming language. In the paper, we will describe the algorithm study support system for a programmer using the flowchart. Since the proposed system uses Graphical User Interface(GUI), it will become easy for a programmer to understand the algorithm in programs.
Complexity of the Quantum Adiabatic Algorithm
NASA Technical Reports Server (NTRS)
Hen, Itay
2013-01-01
The Quantum Adiabatic Algorithm (QAA) has been proposed as a mechanism for efficiently solving optimization problems on a quantum computer. Since adiabatic computation is analog in nature and does not require the design and use of quantum gates, it can be thought of as a simpler and perhaps more profound method for performing quantum computations that might also be easier to implement experimentally. While these features have generated substantial research in QAA, to date there is still a lack of solid evidence that the algorithm can outperform classical optimization algorithms.
Simple Common Plane contact detection algorithm for FE/FD methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vorobiev, O
2006-07-19
Common-plane (CP) algorithm is widely used in Discrete Element Method (DEM) to model contact forces between interacting particles or blocks. A new simple contact detection algorithm is proposed to model contacts in FE/FD methods which is similar to the CP algorithm. The CP is defined as a plane separating interacting faces of FE/FD mesh instead of blocks or particles in the original CP method. The method does not require iterations. It is very robust and easy to implement both in 2D and 3D case.
Peng, Bo; Wang, Yuqi; Hall, Timothy J; Jiang, Jingfeng
2017-01-01
Our primary objective of this work was to extend a previously published 2D coupled sub-sample tracking algorithm for 3D speckle tracking in the framework of ultrasound breast strain elastography. In order to overcome heavy computational cost, we investigated the use of a graphic processing unit (GPU) to accelerate the 3D coupled sub-sample speckle tracking method. The performance of the proposed GPU implementation was tested using a tissue-mimicking (TM) phantom and in vivo breast ultrasound data. The performance of this 3D sub-sample tracking algorithm was compared with the conventional 3D quadratic sub-sample estimation algorithm. On the basis of these evaluations, we concluded that the GPU implementation of this 3D sub-sample estimation algorithm can provide high-quality strain data (i.e. high correlation between the pre- and the motion-compensated post-deformation RF echo data and high contrast-to-noise ratio strain images), as compared to the conventional 3D quadratic sub-sample algorithm. Using the GPU implementation of the 3D speckle tracking algorithm, volumetric strain data can be achieved relatively fast (approximately 20 seconds per volume [2.5 cm × 2.5 cm × 2.5 cm]). PMID:28166493
Yang, Yuan; Quan, Nannan; Bu, Jingjing; Li, Xueping; Yu, Ningmei
2016-09-26
High order modulation and demodulation technology can solve the frequency requirement between the wireless energy transmission and data communication. In order to achieve reliable wireless data communication based on high order modulation technology for visual prosthesis, this work proposed a Reed-Solomon (RS) error correcting code (ECC) circuit on the basis of differential amplitude and phase shift keying (DAPSK) soft demodulation. Firstly, recognizing the weakness of the traditional DAPSK soft demodulation algorithm based on division that is complex for hardware implementation, an improved phase soft demodulation algorithm for visual prosthesis to reduce the hardware complexity is put forward. Based on this new algorithm, an improved RS soft decoding method is hence proposed. In this new decoding method, the combination of Chase algorithm and hard decoding algorithms is used to achieve soft decoding. In order to meet the requirements of implantable visual prosthesis, the method to calculate reliability of symbol-level based on multiplication of bit reliability is derived, which reduces the testing vectors number of Chase algorithm. The proposed algorithms are verified by MATLAB simulation and FPGA experimental results. During MATLAB simulation, the biological channel attenuation property model is added into the ECC circuit. The data rate is 8 Mbps in the MATLAB simulation and FPGA experiments. MATLAB simulation results show that the improved phase soft demodulation algorithm proposed in this paper saves hardware resources without losing bit error rate (BER) performance. Compared with the traditional demodulation circuit, the coding gain of the ECC circuit has been improved by about 3 dB under the same BER of [Formula: see text]. The FPGA experimental results show that under the condition of data demodulation error with wireless coils 3 cm away, the system can correct it. The greater the distance, the higher the BER. Then we use a bit error rate analyzer to measure BER of the demodulation circuit and the RS ECC circuit with different distance of two coils. And the experimental results show that the RS ECC circuit has about an order of magnitude lower BER than the demodulation circuit when under the same coils distance. Therefore, the RS ECC circuit has more higher reliability of the communication in the system. The improved phase soft demodulation algorithm and soft decoding algorithm proposed in this paper enables data communication that is more reliable than other demodulation system, which also provide a significant reference for further study to the visual prosthesis system.
A novel pipeline based FPGA implementation of a genetic algorithm
NASA Astrophysics Data System (ADS)
Thirer, Nonel
2014-05-01
To solve problems when an analytical solution is not available, more and more bio-inspired computation techniques have been applied in the last years. Thus, an efficient algorithm is the Genetic Algorithm (GA), which imitates the biological evolution process, finding the solution by the mechanism of "natural selection", where the strong has higher chances to survive. A genetic algorithm is an iterative procedure which operates on a population of individuals called "chromosomes" or "possible solutions" (usually represented by a binary code). GA performs several processes with the population individuals to produce a new population, like in the biological evolution. To provide a high speed solution, pipelined based FPGA hardware implementations are used, with a nstages pipeline for a n-phases genetic algorithm. The FPGA pipeline implementations are constraints by the different execution time of each stage and by the FPGA chip resources. To minimize these difficulties, we propose a bio-inspired technique to modify the crossover step by using non identical twins. Thus two of the chosen chromosomes (parents) will build up two new chromosomes (children) not only one as in classical GA. We analyze the contribution of this method to reduce the execution time in the asynchronous and synchronous pipelines and also the possibility to a cheaper FPGA implementation, by using smaller populations. The full hardware architecture for a FPGA implementation to our target ALTERA development card is presented and analyzed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aziz, H. M. Abdul; Zhu, Feng; Ukkusuri, Satish V.
Here, this research applies R-Markov Average Reward Technique based reinforcement learning (RL) algorithm, namely RMART, for vehicular signal control problem leveraging information sharing among signal controllers in connected vehicle environment. We implemented the algorithm in a network of 18 signalized intersections and compare the performance of RMART with fixed, adaptive, and variants of the RL schemes. Results show significant improvement in system performance for RMART algorithm with information sharing over both traditional fixed signal timing plans and real time adaptive control schemes. Additionally, the comparison with reinforcement learning algorithms including Q learning and SARSA indicate that RMART performs better atmore » higher congestion levels. Further, a multi-reward structure is proposed that dynamically adjusts the reward function with varying congestion states at the intersection. Finally, the results from test networks show significant reduction in emissions (CO, CO 2, NO x, VOC, PM 10) when RL algorithms are implemented compared to fixed signal timings and adaptive schemes.« less
NASA Astrophysics Data System (ADS)
Asilah Khairi, Nor; Bahari Jambek, Asral
2017-11-01
An Internet of Things (IoT) device is usually powered by a small battery, which does not last long. As a result, saving energy in IoT devices has become an important issue when it comes to this subject. Since power consumption is the primary cause of radio communication, some researchers have proposed several compression algorithms with the purpose of overcoming this particular problem. Several data compression algorithms from previous reference papers are discussed in this paper. The description of the compression algorithm in the reference papers was collected and summarized in a table form. From the analysis, MAS compression algorithm was selected as a project prototype due to its high potential for meeting the project requirements. Besides that, it also produced better performance regarding energy-saving, better memory usage, and data transmission efficiency. This method is also suitable to be implemented in WSN. MAS compression algorithm will be prototyped and applied in portable electronic devices for Internet of Things applications.
Medical Image Compression Based on Vector Quantization with Variable Block Sizes in Wavelet Domain
Jiang, Huiyan; Ma, Zhiyuan; Hu, Yang; Yang, Benqiang; Zhang, Libo
2012-01-01
An optimized medical image compression algorithm based on wavelet transform and improved vector quantization is introduced. The goal of the proposed method is to maintain the diagnostic-related information of the medical image at a high compression ratio. Wavelet transformation was first applied to the image. For the lowest-frequency subband of wavelet coefficients, a lossless compression method was exploited; for each of the high-frequency subbands, an optimized vector quantization with variable block size was implemented. In the novel vector quantization method, local fractal dimension (LFD) was used to analyze the local complexity of each wavelet coefficients, subband. Then an optimal quadtree method was employed to partition each wavelet coefficients, subband into several sizes of subblocks. After that, a modified K-means approach which is based on energy function was used in the codebook training phase. At last, vector quantization coding was implemented in different types of sub-blocks. In order to verify the effectiveness of the proposed algorithm, JPEG, JPEG2000, and fractal coding approach were chosen as contrast algorithms. Experimental results show that the proposed method can improve the compression performance and can achieve a balance between the compression ratio and the image visual quality. PMID:23049544
Medical image compression based on vector quantization with variable block sizes in wavelet domain.
Jiang, Huiyan; Ma, Zhiyuan; Hu, Yang; Yang, Benqiang; Zhang, Libo
2012-01-01
An optimized medical image compression algorithm based on wavelet transform and improved vector quantization is introduced. The goal of the proposed method is to maintain the diagnostic-related information of the medical image at a high compression ratio. Wavelet transformation was first applied to the image. For the lowest-frequency subband of wavelet coefficients, a lossless compression method was exploited; for each of the high-frequency subbands, an optimized vector quantization with variable block size was implemented. In the novel vector quantization method, local fractal dimension (LFD) was used to analyze the local complexity of each wavelet coefficients, subband. Then an optimal quadtree method was employed to partition each wavelet coefficients, subband into several sizes of subblocks. After that, a modified K-means approach which is based on energy function was used in the codebook training phase. At last, vector quantization coding was implemented in different types of sub-blocks. In order to verify the effectiveness of the proposed algorithm, JPEG, JPEG2000, and fractal coding approach were chosen as contrast algorithms. Experimental results show that the proposed method can improve the compression performance and can achieve a balance between the compression ratio and the image visual quality.
Formulations and algorithms for problems on rock mass and support deformation during mining
NASA Astrophysics Data System (ADS)
Seryakov, VM
2018-03-01
The analysis of problem formulations to calculate stress-strain state of mine support and surrounding rocks mass in rock mechanics shows that such formulations incompletely describe the mechanical features of joint deformation in the rock mass–support system. The present paper proposes an algorithm to take into account the actual conditions of rock mass and support interaction and the algorithm implementation method to ensure efficient calculation of stresses in rocks and support.
Ahirwal, M K; Kumar, Anil; Singh, G K
2013-01-01
This paper explores the migration of adaptive filtering with swarm intelligence/evolutionary techniques employed in the field of electroencephalogram/event-related potential noise cancellation or extraction. A new approach is proposed in the form of controlled search space to stabilize the randomness of swarm intelligence techniques especially for the EEG signal. Swarm-based algorithms such as Particles Swarm Optimization, Artificial Bee Colony, and Cuckoo Optimization Algorithm with their variants are implemented to design optimized adaptive noise canceler. The proposed controlled search space technique is tested on each of the swarm intelligence techniques and is found to be more accurate and powerful. Adaptive noise canceler with traditional algorithms such as least-mean-square, normalized least-mean-square, and recursive least-mean-square algorithms are also implemented to compare the results. ERP signals such as simulated visual evoked potential, real visual evoked potential, and real sensorimotor evoked potential are used, due to their physiological importance in various EEG studies. Average computational time and shape measures of evolutionary techniques are observed 8.21E-01 sec and 1.73E-01, respectively. Though, traditional algorithms take negligible time consumption, but are unable to offer good shape preservation of ERP, noticed as average computational time and shape measure difference, 1.41E-02 sec and 2.60E+00, respectively.
New algorithms to represent complex pseudoknotted RNA structures in dot-bracket notation.
Antczak, Maciej; Popenda, Mariusz; Zok, Tomasz; Zurkowski, Michal; Adamiak, Ryszard W; Szachniuk, Marta
2018-04-15
Understanding the formation, architecture and roles of pseudoknots in RNA structures are one of the most difficult challenges in RNA computational biology and structural bioinformatics. Methods predicting pseudoknots typically perform this with poor accuracy, often despite experimental data incorporation. Existing bioinformatic approaches differ in terms of pseudoknots' recognition and revealing their nature. A few ways of pseudoknot classification exist, most common ones refer to a genus or order. Following the latter one, we propose new algorithms that identify pseudoknots in RNA structure provided in BPSEQ format, determine their order and encode in dot-bracket-letter notation. The proposed encoding aims to illustrate the hierarchy of RNA folding. New algorithms are based on dynamic programming and hybrid (combining exhaustive search and random walk) approaches. They evolved from elementary algorithm implemented within the workflow of RNA FRABASE 1.0, our database of RNA structure fragments. They use different scoring functions to rank dissimilar dot-bracket representations of RNA structure. Computational experiments show an advantage of new methods over the others, especially for large RNA structures. Presented algorithms have been implemented as new functionality of RNApdbee webserver and are ready to use at http://rnapdbee.cs.put.poznan.pl. mszachniuk@cs.put.poznan.pl. Supplementary data are available at Bioinformatics online.
Generic framework for mining cellular automata models on protein-folding simulations.
Diaz, N; Tischer, I
2016-05-13
Cellular automata model identification is an important way of building simplified simulation models. In this study, we describe a generic architectural framework to ease the development process of new metaheuristic-based algorithms for cellular automata model identification in protein-folding trajectories. Our framework was developed by a methodology based on design patterns that allow an improved experience for new algorithms development. The usefulness of the proposed framework is demonstrated by the implementation of four algorithms, able to obtain extremely precise cellular automata models of the protein-folding process with a protein contact map representation. Dynamic rules obtained by the proposed approach are discussed, and future use for the new tool is outlined.
A Modified Artificial Bee Colony Algorithm Application for Economic Environmental Dispatch
NASA Astrophysics Data System (ADS)
Tarafdar Hagh, M.; Baghban Orandi, Omid
2018-03-01
In conventional fossil-fuel power systems, the economic environmental dispatch (EED) problem is a major problem that optimally determines the output power of generating units in a way that cost of total production and emission level be minimized simultaneously, and at the same time all the constraints of units and system are satisfied properly. To solve EED problem which is a non-convex optimization problem, a modified artificial bee colony (MABC) algorithm is proposed in this paper. This algorithm by implementing weighted sum method is applied on two test systems, and eventually, obtained results are compared with other reported results. Comparison of results confirms superiority and efficiency of proposed method clearly.
A proposed technique for the Venus balloon telemetry and Doppler frequency recovery
NASA Technical Reports Server (NTRS)
Jurgens, R. F.; Divsalar, D.
1985-01-01
A technique is proposed to accurately estimate the Doppler frequency and demodulate the digitally encoded telemetry signal that contains the measurements from balloon instruments. Since the data are prerecorded, one can take advantage of noncausal estimators that are both simpler and more computationally efficient than the usual closed-loop or real-time estimators for signal detection and carrier tracking. Algorithms for carrier frequency estimation subcarrier demodulation, bit and frame synchronization are described. A Viterbi decoder algorithm using a branch indexing technique has been devised to decode constraint length 6, rate 1/2 convolutional code that is being used by the balloon transmitter. These algorithms are memory efficient and can be implemented on microcomputer systems.
Gao, Bin; Li, Xiaoqing; Woo, Wai Lok; Tian, Gui Yun
2018-05-01
Thermographic inspection has been widely applied to non-destructive testing and evaluation with the capabilities of rapid, contactless, and large surface area detection. Image segmentation is considered essential for identifying and sizing defects. To attain a high-level performance, specific physics-based models that describe defects generation and enable the precise extraction of target region are of crucial importance. In this paper, an effective genetic first-order statistical image segmentation algorithm is proposed for quantitative crack detection. The proposed method automatically extracts valuable spatial-temporal patterns from unsupervised feature extraction algorithm and avoids a range of issues associated with human intervention in laborious manual selection of specific thermal video frames for processing. An internal genetic functionality is built into the proposed algorithm to automatically control the segmentation threshold to render enhanced accuracy in sizing the cracks. Eddy current pulsed thermography will be implemented as a platform to demonstrate surface crack detection. Experimental tests and comparisons have been conducted to verify the efficacy of the proposed method. In addition, a global quantitative assessment index F-score has been adopted to objectively evaluate the performance of different segmentation algorithms.
Hybrid cryptosystem RSA - CRT optimization and VMPC
NASA Astrophysics Data System (ADS)
Rahmadani, R.; Mawengkang, H.; Sutarman
2018-03-01
Hybrid cryptosystem combines symmetric algorithms and asymmetric algorithms. This combination utilizes speeds on encryption/decryption processes of symmetric algorithms and asymmetric algorithms to secure symmetric keys. In this paper we propose hybrid cryptosystem that combine symmetric algorithms VMPC and asymmetric algorithms RSA - CRT optimization. RSA - CRT optimization speeds up the decryption process by obtaining plaintext with dp and p key only, so there is no need to perform CRT processes. The VMPC algorithm is more efficient in software implementation and reduces known weaknesses in RC4 key generation. The results show hybrid cryptosystem RSA - CRT optimization and VMPC is faster than hybrid cryptosystem RSA - VMPC and hybrid cryptosystem RSA - CRT - VMPC. Keyword : Cryptography, RSA, RSA - CRT, VMPC, Hybrid Cryptosystem.
Brian hears: online auditory processing using vectorization over channels.
Fontaine, Bertrand; Goodman, Dan F M; Benichoux, Victor; Brette, Romain
2011-01-01
The human cochlea includes about 3000 inner hair cells which filter sounds at frequencies between 20 Hz and 20 kHz. This massively parallel frequency analysis is reflected in models of auditory processing, which are often based on banks of filters. However, existing implementations do not exploit this parallelism. Here we propose algorithms to simulate these models by vectorizing computation over frequency channels, which are implemented in "Brian Hears," a library for the spiking neural network simulator package "Brian." This approach allows us to use high-level programming languages such as Python, because with vectorized operations, the computational cost of interpretation represents a small fraction of the total cost. This makes it possible to define and simulate complex models in a simple way, while all previous implementations were model-specific. In addition, we show that these algorithms can be naturally parallelized using graphics processing units, yielding substantial speed improvements. We demonstrate these algorithms with several state-of-the-art cochlear models, and show that they compare favorably with existing, less flexible, implementations.
NASA Astrophysics Data System (ADS)
Wang, Q.; Alfalou, A.; Brosseau, C.
2016-04-01
Here, we report a brief review on the recent developments of correlation algorithms. Several implementation schemes and specific applications proposed in recent years are also given to illustrate powerful applications of these methods. Following a discussion and comparison of the implementation of these schemes, we believe that all-numerical implementation is the most practical choice for application of the correlation method because the advantages of optical processing cannot compensate the technical and/or financial cost needed for an optical implementation platform. We also present a simple iterative algorithm to optimize the training images of composite correlation filters. By making use of three or four iterations, the peak-to-correlation energy (PCE) value of correlation plane can be significantly enhanced. A simulation test using the Pointing Head Pose Image Database (PHPID) illustrates the effectiveness of this statement. Our method can be applied in many composite filters based on linear composition of training images as an optimization means.
NASA Astrophysics Data System (ADS)
Kapalova, N.; Haumen, A.
2018-05-01
This paper addresses to structures and properties of the cryptographic information protection algorithm model based on NPNs and constructed on an SP-network. The main task of the research is to increase the cryptostrength of the algorithm. In the paper, the transformation resulting in the improvement of the cryptographic strength of the algorithm is described in detail. The proposed model is based on an SP-network. The reasons for using the SP-network in this model are the conversion properties used in these networks. In the encryption process, transformations based on S-boxes and P-boxes are used. It is known that these transformations can withstand cryptanalysis. In addition, in the proposed model, transformations that satisfy the requirements of the "avalanche effect" are used. As a result of this work, a computer program that implements an encryption algorithm model based on the SP-network has been developed.
A Novel Segment-Based Approach for Improving Classification Performance of Transport Mode Detection.
Guvensan, M Amac; Dusun, Burak; Can, Baris; Turkmen, H Irem
2017-12-30
Transportation planning and solutions have an enormous impact on city life. To minimize the transport duration, urban planners should understand and elaborate the mobility of a city. Thus, researchers look toward monitoring people's daily activities including transportation types and duration by taking advantage of individual's smartphones. This paper introduces a novel segment-based transport mode detection architecture in order to improve the results of traditional classification algorithms in the literature. The proposed post-processing algorithm, namely the Healing algorithm, aims to correct the misclassification results of machine learning-based solutions. Our real-life test results show that the Healing algorithm could achieve up to 40% improvement of the classification results. As a result, the implemented mobile application could predict eight classes including stationary, walking, car, bus, tram, train, metro and ferry with a success rate of 95% thanks to the proposed multi-tier architecture and Healing algorithm.
Handwritten digits recognition based on immune network
NASA Astrophysics Data System (ADS)
Li, Yangyang; Wu, Yunhui; Jiao, Lc; Wu, Jianshe
2011-11-01
With the development of society, handwritten digits recognition technique has been widely applied to production and daily life. It is a very difficult task to solve these problems in the field of pattern recognition. In this paper, a new method is presented for handwritten digit recognition. The digit samples firstly are processed and features extraction. Based on these features, a novel immune network classification algorithm is designed and implemented to the handwritten digits recognition. The proposed algorithm is developed by Jerne's immune network model for feature selection and KNN method for classification. Its characteristic is the novel network with parallel commutating and learning. The performance of the proposed method is experimented to the handwritten number datasets MNIST and compared with some other recognition algorithms-KNN, ANN and SVM algorithm. The result shows that the novel classification algorithm based on immune network gives promising performance and stable behavior for handwritten digits recognition.
Huang, X N; Ren, H P
2016-05-13
Robust adaptation is a critical ability of gene regulatory network (GRN) to survive in a fluctuating environment, which represents the system responding to an input stimulus rapidly and then returning to its pre-stimulus steady state timely. In this paper, the GRN is modeled using the Michaelis-Menten rate equations, which are highly nonlinear differential equations containing 12 undetermined parameters. The robust adaption is quantitatively described by two conflicting indices. To identify the parameter sets in order to confer the GRNs with robust adaptation is a multi-variable, multi-objective, and multi-peak optimization problem, which is difficult to acquire satisfactory solutions especially high-quality solutions. A new best-neighbor particle swarm optimization algorithm is proposed to implement this task. The proposed algorithm employs a Latin hypercube sampling method to generate the initial population. The particle crossover operation and elitist preservation strategy are also used in the proposed algorithm. The simulation results revealed that the proposed algorithm could identify multiple solutions in one time running. Moreover, it demonstrated a superior performance as compared to the previous methods in the sense of detecting more high-quality solutions within an acceptable time. The proposed methodology, owing to its universality and simplicity, is useful for providing the guidance to design GRN with superior robust adaptation.
Distributed-Memory Fast Maximal Independent Set
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kanewala Appuhamilage, Thejaka Amila J.; Zalewski, Marcin J.; Lumsdaine, Andrew
The Maximal Independent Set (MIS) graph problem arises in many applications such as computer vision, information theory, molecular biology, and process scheduling. The growing scale of MIS problems suggests the use of distributed-memory hardware as a cost-effective approach to providing necessary compute and memory resources. Luby proposed four randomized algorithms to solve the MIS problem. All those algorithms are designed focusing on shared-memory machines and are analyzed using the PRAM model. These algorithms do not have direct efficient distributed-memory implementations. In this paper, we extend two of Luby’s seminal MIS algorithms, “Luby(A)” and “Luby(B),” to distributed-memory execution, and we evaluatemore » their performance. We compare our results with the “Filtered MIS” implementation in the Combinatorial BLAS library for two types of synthetic graph inputs.« less
CPU-GPU hybrid accelerating the Zuker algorithm for RNA secondary structure prediction applications
2012-01-01
Background Prediction of ribonucleic acid (RNA) secondary structure remains one of the most important research areas in bioinformatics. The Zuker algorithm is one of the most popular methods of free energy minimization for RNA secondary structure prediction. Thus far, few studies have been reported on the acceleration of the Zuker algorithm on general-purpose processors or on extra accelerators such as Field Programmable Gate-Array (FPGA) and Graphics Processing Units (GPU). To the best of our knowledge, no implementation combines both CPU and extra accelerators, such as GPUs, to accelerate the Zuker algorithm applications. Results In this paper, a CPU-GPU hybrid computing system that accelerates Zuker algorithm applications for RNA secondary structure prediction is proposed. The computing tasks are allocated between CPU and GPU for parallel cooperate execution. Performance differences between the CPU and the GPU in the task-allocation scheme are considered to obtain workload balance. To improve the hybrid system performance, the Zuker algorithm is optimally implemented with special methods for CPU and GPU architecture. Conclusions Speedup of 15.93× over optimized multi-core SIMD CPU implementation and performance advantage of 16% over optimized GPU implementation are shown in the experimental results. More than 14% of the sequences are executed on CPU in the hybrid system. The system combining CPU and GPU to accelerate the Zuker algorithm is proven to be promising and can be applied to other bioinformatics applications. PMID:22369626
NASA Astrophysics Data System (ADS)
Nightingale, James; Wang, Qi; Grecos, Christos
2011-03-01
Users of the next generation wireless paradigm known as multihomed mobile networks expect satisfactory quality of service (QoS) when accessing streamed multimedia content. The recent H.264 Scalable Video Coding (SVC) extension to the Advanced Video Coding standard (AVC), offers the facility to adapt real-time video streams in response to the dynamic conditions of multiple network paths encountered in multihomed wireless mobile networks. Nevertheless, preexisting streaming algorithms were mainly proposed for AVC delivery over multipath wired networks and were evaluated by software simulation. This paper introduces a practical, hardware-based testbed upon which we implement and evaluate real-time H.264 SVC streaming algorithms in a realistic multihomed wireless mobile networks environment. We propose an optimised streaming algorithm with multi-fold technical contributions. Firstly, we extended the AVC packet prioritisation schemes to reflect the three-dimensional granularity of SVC. Secondly, we designed a mechanism for evaluating the effects of different streamer 'read ahead window' sizes on real-time performance. Thirdly, we took account of the previously unconsidered path switching and mobile networks tunnelling overheads encountered in real-world deployments. Finally, we implemented a path condition monitoring and reporting scheme to facilitate the intelligent path switching. The proposed system has been experimentally shown to offer a significant improvement in PSNR of the received stream compared with representative existing algorithms.
A Hardware-Supported Algorithm for Self-Managed and Choreographed Task Execution in Sensor Networks.
Bordel, Borja; Miguel, Carlos; Alcarria, Ramón; Robles, Tomás
2018-03-07
Nowadays, sensor networks are composed of a great number of tiny resource-constraint nodes, whose management is increasingly more complex. In fact, although collaborative or choreographic task execution schemes are which fit in the most perfect way with the nature of sensor networks, they are rarely implemented because of the high resource consumption of these algorithms (especially if networks include many resource-constrained devices). On the contrary, hierarchical networks are usually designed, in whose cusp it is included a heavy orchestrator with a remarkable processing power, being able to implement any necessary management solution. However, although this orchestration approach solves most practical management problems of sensor networks, a great amount of the operation time is wasted while nodes request the orchestrator to address a conflict and they obtain the required instructions to operate. Therefore, in this paper it is proposed a new mechanism for self-managed and choreographed task execution in sensor networks. The proposed solution considers only a lightweight gateway instead of traditional heavy orchestrators and a hardware-supported algorithm, which consume a negligible amount of resources in sensor nodes. The gateway avoids the congestion of the entire sensor network and the hardware-supported algorithm enables a choreographed task execution scheme, so no particular node is overloaded. The performance of the proposed solution is evaluated through numerical and electronic ModelSim-based simulations.
A Hardware-Supported Algorithm for Self-Managed and Choreographed Task Execution in Sensor Networks
2018-01-01
Nowadays, sensor networks are composed of a great number of tiny resource-constraint nodes, whose management is increasingly more complex. In fact, although collaborative or choreographic task execution schemes are which fit in the most perfect way with the nature of sensor networks, they are rarely implemented because of the high resource consumption of these algorithms (especially if networks include many resource-constrained devices). On the contrary, hierarchical networks are usually designed, in whose cusp it is included a heavy orchestrator with a remarkable processing power, being able to implement any necessary management solution. However, although this orchestration approach solves most practical management problems of sensor networks, a great amount of the operation time is wasted while nodes request the orchestrator to address a conflict and they obtain the required instructions to operate. Therefore, in this paper it is proposed a new mechanism for self-managed and choreographed task execution in sensor networks. The proposed solution considers only a lightweight gateway instead of traditional heavy orchestrators and a hardware-supported algorithm, which consume a negligible amount of resources in sensor nodes. The gateway avoids the congestion of the entire sensor network and the hardware-supported algorithm enables a choreographed task execution scheme, so no particular node is overloaded. The performance of the proposed solution is evaluated through numerical and electronic ModelSim-based simulations. PMID:29518986
Real time ray tracing based on shader
NASA Astrophysics Data System (ADS)
Gui, JiangHeng; Li, Min
2017-07-01
Ray tracing is a rendering algorithm for generating an image through tracing lights into an image plane, it can simulate complicate optical phenomenon like refraction, depth of field and motion blur. Compared with rasterization, ray tracing can achieve more realistic rendering result, however with greater computational cost, simple scene rendering can consume tons of time. With the GPU's performance improvement and the advent of programmable rendering pipeline, complicated algorithm can also be implemented directly on shader. So, this paper proposes a new method that implement ray tracing directly on fragment shader, mainly include: surface intersection, importance sampling and progressive rendering. With the help of GPU's powerful throughput capability, it can implement real time rendering of simple scene.
Construction of Gene Regulatory Networks Using Recurrent Neural Networks and Swarm Intelligence.
Khan, Abhinandan; Mandal, Sudip; Pal, Rajat Kumar; Saha, Goutam
2016-01-01
We have proposed a methodology for the reverse engineering of biologically plausible gene regulatory networks from temporal genetic expression data. We have used established information and the fundamental mathematical theory for this purpose. We have employed the Recurrent Neural Network formalism to extract the underlying dynamics present in the time series expression data accurately. We have introduced a new hybrid swarm intelligence framework for the accurate training of the model parameters. The proposed methodology has been first applied to a small artificial network, and the results obtained suggest that it can produce the best results available in the contemporary literature, to the best of our knowledge. Subsequently, we have implemented our proposed framework on experimental (in vivo) datasets. Finally, we have investigated two medium sized genetic networks (in silico) extracted from GeneNetWeaver, to understand how the proposed algorithm scales up with network size. Additionally, we have implemented our proposed algorithm with half the number of time points. The results indicate that a reduction of 50% in the number of time points does not have an effect on the accuracy of the proposed methodology significantly, with a maximum of just over 15% deterioration in the worst case.
Efficient Implementation of a Symbol Timing Estimator for Broadband PLC.
Nombela, Francisco; García, Enrique; Mateos, Raúl; Hernández, Álvaro
2015-08-21
Broadband Power Line Communications (PLC) have taken advantage of the research advances in multi-carrier modulations to mitigate frequency selective fading, and their adoption opens up a myriad of applications in the field of sensory and automation systems, multimedia connectivity or smart spaces. Nonetheless, the use of these multi-carrier modulations, such as Wavelet-OFDM, requires a highly accurate symbol timing estimation for reliably recovering of transmitted data. Furthermore, the PLC channel presents some particularities that prevent the direct use of previous synchronization algorithms proposed in wireless communication systems. Therefore more research effort should be involved in the design and implementation of novel and robust synchronization algorithms for PLC, thus enabling real-time synchronization. This paper proposes a symbol timing estimator for broadband PLC based on cross-correlation with multilevel complementary sequences or Zadoff-Chu sequences and its efficient implementation in a FPGA; the obtained results show a 90% of success rate in symbol timing estimation for a certain PLC channel model and a reduced resource consumption for its implementation in a Xilinx Kyntex FPGA.
A Flexible VHDL Floating Point Module for Control Algorithm Implementation in Space Applications
NASA Astrophysics Data System (ADS)
Padierna, A.; Nicoleau, C.; Sanchez, J.; Hidalgo, I.; Elvira, S.
2012-08-01
The implementation of control loops for space applications is an area with great potential. However, the characteristics of this kind of systems, such as its wide dynamic range of numeric values, make inadequate the use of fixed-point algorithms.However, because the generic chips available for the treatment of floating point data are, in general, not qualified to operate in space environments and the possibility of using an IP module in a FPGA/ASIC qualified for space is not viable due to the low amount of logic cells available for these type of devices, it is necessary to find a viable alternative.For these reasons, in this paper a VHDL Floating Point Module is presented. This proposal allows the design and execution of floating point algorithms with acceptable occupancy to be implemented in FPGAs/ASICs qualified for space environments.
Image segmentation based upon topological operators: real-time implementation case study
NASA Astrophysics Data System (ADS)
Mahmoudi, R.; Akil, M.
2009-02-01
In miscellaneous applications of image treatment, thinning and crest restoring present a lot of interests. Recommended algorithms for these procedures are those able to act directly over grayscales images while preserving topology. But their strong consummation in term of time remains the major disadvantage in their choice. In this paper we present an efficient hardware implementation on RISC processor of two powerful algorithms of thinning and crest restoring developed by our team. Proposed implementation enhances execution time. A chain of segmentation applied to medical imaging will serve as a concrete example to illustrate the improvements brought thanks to the optimization techniques in both algorithm and architectural levels. The particular use of the SSE instruction set relative to the X86_32 processors (PIV 3.06 GHz) will allow a best performance for real time processing: a cadency of 33 images (512*512) per second is assured.
Analysis and simulation tools for solar array power systems
NASA Astrophysics Data System (ADS)
Pongratananukul, Nattorn
This dissertation presents simulation tools developed specifically for the design of solar array power systems. Contributions are made in several aspects of the system design phases, including solar source modeling, system simulation, and controller verification. A tool to automate the study of solar array configurations using general purpose circuit simulators has been developed based on the modeling of individual solar cells. Hierarchical structure of solar cell elements, including semiconductor properties, allows simulation of electrical properties as well as the evaluation of the impact of environmental conditions. A second developed tool provides a co-simulation platform with the capability to verify the performance of an actual digital controller implemented in programmable hardware such as a DSP processor, while the entire solar array including the DC-DC power converter is modeled in software algorithms running on a computer. This "virtual plant" allows developing and debugging code for the digital controller, and also to improve the control algorithm. One important task in solar arrays is to track the maximum power point on the array in order to maximize the power that can be delivered. Digital controllers implemented with programmable processors are particularly attractive for this task because sophisticated tracking algorithms can be implemented and revised when needed to optimize their performance. The proposed co-simulation tools are thus very valuable in developing and optimizing the control algorithm, before the system is built. Examples that demonstrate the effectiveness of the proposed methodologies are presented. The proposed simulation tools are also valuable in the design of multi-channel arrays. In the specific system that we have designed and tested, the control algorithm is implemented on a single digital signal processor. In each of the channels the maximum power point is tracked individually. In the prototype we built, off-the-shelf commercial DC-DC converters were utilized. At the end, the overall performance of the entire system was evaluated using solar array simulators capable of simulating various I-V characteristics, and also by using an electronic load. Experimental results are presented.
OPC for curved designs in application to photonics on silicon
NASA Astrophysics Data System (ADS)
Orlando, Bastien; Farys, Vincent; Schneider, Loïc.; Cremer, Sébastien; Postnikov, Sergei V.; Millequant, Matthieu; Dirrenberger, Mathieu; Tiphine, Charles; Bayle, Sébastian; Tranquillin, Céline; Schiavone, Patrick
2016-03-01
Today's design for photonics devices on silicon relies on non-Manhattan features such as curves and a wide variety of angles with minimum feature size below 100nm. Industrial manufacturing of such devices requires optimized process window with 193nm lithography. Therefore, Resolution Enhancement Techniques (RET) that are commonly used for CMOS manufacturing are required. However, most RET algorithms are based on Manhattan fragmentation (0°, 45° and 90°) which can generate large CD dispersion on masks for photonic designs. Industrial implementation of RET solutions to photonic designs is challenging as most currently available OPC tools are CMOS-oriented. Discrepancy from design to final results induced by RET techniques can lead to lower photonic device performance. We propose a novel sizing algorithm allowing adjustment of design edge fragments while preserving the topology of the original structures. The results of the algorithm implementation in the rule based sizing, SRAF placement and model based correction will be discussed in this paper. Corrections based on this novel algorithm were applied and characterized on real photonics devices. The obtained results demonstrate the validity of the proposed correction method integrated in Inscale software of Aselta Nanographics.
Parallelization and implementation of approximate root isolation for nonlinear system by Monte Carlo
NASA Astrophysics Data System (ADS)
Khosravi, Ebrahim
1998-12-01
This dissertation solves a fundamental problem of isolating the real roots of nonlinear systems of equations by Monte-Carlo that were published by Bush Jones. This algorithm requires only function values and can be applied readily to complicated systems of transcendental functions. The implementation of this sequential algorithm provides scientists with the means to utilize function analysis in mathematics or other fields of science. The algorithm, however, is so computationally intensive that the system is limited to a very small set of variables, and this will make it unfeasible for large systems of equations. Also a computational technique was needed for investigating a metrology of preventing the algorithm structure from converging to the same root along different paths of computation. The research provides techniques for improving the efficiency and correctness of the algorithm. The sequential algorithm for this technique was corrected and a parallel algorithm is presented. This parallel method has been formally analyzed and is compared with other known methods of root isolation. The effectiveness, efficiency, enhanced overall performance of the parallel processing of the program in comparison to sequential processing is discussed. The message passing model was used for this parallel processing, and it is presented and implemented on Intel/860 MIMD architecture. The parallel processing proposed in this research has been implemented in an ongoing high energy physics experiment: this algorithm has been used to track neutrinoes in a super K detector. This experiment is located in Japan, and data can be processed on-line or off-line locally or remotely.
Array distribution in data-parallel programs
NASA Technical Reports Server (NTRS)
Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert; Sheffler, Thomas J.
1994-01-01
We consider distribution at compile time of the array data in a distributed-memory implementation of a data-parallel program written in a language like Fortran 90. We allow dynamic redistribution of data and define a heuristic algorithmic framework that chooses distribution parameters to minimize an estimate of program completion time. We represent the program as an alignment-distribution graph. We propose a divide-and-conquer algorithm for distribution that initially assigns a common distribution to each node of the graph and successively refines this assignment, taking computation, realignment, and redistribution costs into account. We explain how to estimate the effect of distribution on computation cost and how to choose a candidate set of distributions. We present the results of an implementation of our algorithms on several test problems.
Performance of Grey Wolf Optimizer on large scale problems
NASA Astrophysics Data System (ADS)
Gupta, Shubham; Deep, Kusum
2017-01-01
For solving nonlinear continuous problems of optimization numerous nature inspired optimization techniques are being proposed in literature which can be implemented to solve real life problems wherein the conventional techniques cannot be applied. Grey Wolf Optimizer is one of such technique which is gaining popularity since the last two years. The objective of this paper is to investigate the performance of Grey Wolf Optimization Algorithm on large scale optimization problems. The Algorithm is implemented on 5 common scalable problems appearing in literature namely Sphere, Rosenbrock, Rastrigin, Ackley and Griewank Functions. The dimensions of these problems are varied from 50 to 1000. The results indicate that Grey Wolf Optimizer is a powerful nature inspired Optimization Algorithm for large scale problems, except Rosenbrock which is a unimodal function.
Holographic near-eye display system based on double-convergence light Gerchberg-Saxton algorithm.
Sun, Peng; Chang, Shengqian; Liu, Siqi; Tao, Xiao; Wang, Chang; Zheng, Zhenrong
2018-04-16
In this paper, a method is proposed to implement noises reduced three-dimensional (3D) holographic near-eye display by phase-only computer-generated hologram (CGH). The CGH is calculated from a double-convergence light Gerchberg-Saxton (GS) algorithm, in which the phases of two virtual convergence lights are introduced into GS algorithm simultaneously. The first phase of convergence light is a replacement of random phase as the iterative initial value and the second phase of convergence light will modulate the phase distribution calculated by GS algorithm. Both simulations and experiments are carried out to verify the feasibility of the proposed method. The results indicate that this method can effectively reduce the noises in the reconstruction. Field of view (FOV) of the reconstructed image reaches 40 degrees and experimental light path in the 4-f system is shortened. As for 3D experiments, the results demonstrate that the proposed algorithm can present 3D images with 180cm zooming range and continuous depth cues. This method may provide a promising solution in future 3D augmented reality (AR) realization.
NASA Technical Reports Server (NTRS)
Maine, R. E.; Iliff, K. W.
1980-01-01
A new formulation is proposed for the problem of parameter estimation of dynamic systems with both process and measurement noise. The formulation gives estimates that are maximum likelihood asymptotically in time. The means used to overcome the difficulties encountered by previous formulations are discussed. It is then shown how the proposed formulation can be efficiently implemented in a computer program. A computer program using the proposed formulation is available in a form suitable for routine application. Examples with simulated and real data are given to illustrate that the program works well.
Adaptive recurrence quantum entanglement distillation for two-Kraus-operator channels
NASA Astrophysics Data System (ADS)
Ruan, Liangzhong; Dai, Wenhan; Win, Moe Z.
2018-05-01
Quantum entanglement serves as a valuable resource for many important quantum operations. A pair of entangled qubits can be shared between two agents by first preparing a maximally entangled qubit pair at one agent, and then sending one of the qubits to the other agent through a quantum channel. In this process, the deterioration of entanglement is inevitable since the noise inherent in the channel contaminates the qubit. To address this challenge, various quantum entanglement distillation (QED) algorithms have been developed. Among them, recurrence algorithms have advantages in terms of implementability and robustness. However, the efficiency of recurrence QED algorithms has not been investigated thoroughly in the literature. This paper puts forth two recurrence QED algorithms that adapt to the quantum channel to tackle the efficiency issue. The proposed algorithms have guaranteed convergence for quantum channels with two Kraus operators, which include phase-damping and amplitude-damping channels. Analytical results show that the convergence speed of these algorithms is improved from linear to quadratic and one of the algorithms achieves the optimal speed. Numerical results confirm that the proposed algorithms significantly improve the efficiency of QED.
NASA Astrophysics Data System (ADS)
Bruschetta, M.; Maran, F.; Beghi, A.
2017-06-01
The use of dynamic driving simulators is constantly increasing in the automotive community, with applications ranging from vehicle development to rehab and driver training. The effectiveness of such devices is related to their capabilities of well reproducing the driving sensations, hence it is crucial that the motion control strategies generate both realistic and feasible inputs to the platform. Such strategies are called motion cueing algorithms (MCAs). In recent years several MCAs based on model predictive control (MPC) techniques have been proposed. The main drawback associated with the use of MPC is its computational burden, that may limit their application to high performance dynamic simulators. In the paper, a fast, real-time implementation of an MPC-based MCA for 9 DOF, high performance platform is proposed. Effectiveness of the approach in managing the available working area is illustrated by presenting experimental results from an implementation on a real device with a 200 Hz control frequency.
A Kinect-Based Real-Time Compressive Tracking Prototype System for Amphibious Spherical Robots
Pan, Shaowu; Shi, Liwei; Guo, Shuxiang
2015-01-01
A visual tracking system is essential as a basis for visual servoing, autonomous navigation, path planning, robot-human interaction and other robotic functions. To execute various tasks in diverse and ever-changing environments, a mobile robot requires high levels of robustness, precision, environmental adaptability and real-time performance of the visual tracking system. In keeping with the application characteristics of our amphibious spherical robot, which was proposed for flexible and economical underwater exploration in 2012, an improved RGB-D visual tracking algorithm is proposed and implemented. Given the limited power source and computational capabilities of mobile robots, compressive tracking (CT), which is the effective and efficient algorithm that was proposed in 2012, was selected as the basis of the proposed algorithm to process colour images. A Kalman filter with a second-order motion model was implemented to predict the state of the target and select candidate patches or samples for the CT tracker. In addition, a variance ratio features shift (VR-V) tracker with a Kalman estimation mechanism was used to process depth images. Using a feedback strategy, the depth tracking results were used to assist the CT tracker in updating classifier parameters at an adaptive rate. In this way, most of the deficiencies of CT, including drift and poor robustness to occlusion and high-speed target motion, were partly solved. To evaluate the proposed algorithm, a Microsoft Kinect sensor, which combines colour and infrared depth cameras, was adopted for use in a prototype of the robotic tracking system. The experimental results with various image sequences demonstrated the effectiveness, robustness and real-time performance of the tracking system. PMID:25856331
A Kinect-based real-time compressive tracking prototype system for amphibious spherical robots.
Pan, Shaowu; Shi, Liwei; Guo, Shuxiang
2015-04-08
A visual tracking system is essential as a basis for visual servoing, autonomous navigation, path planning, robot-human interaction and other robotic functions. To execute various tasks in diverse and ever-changing environments, a mobile robot requires high levels of robustness, precision, environmental adaptability and real-time performance of the visual tracking system. In keeping with the application characteristics of our amphibious spherical robot, which was proposed for flexible and economical underwater exploration in 2012, an improved RGB-D visual tracking algorithm is proposed and implemented. Given the limited power source and computational capabilities of mobile robots, compressive tracking (CT), which is the effective and efficient algorithm that was proposed in 2012, was selected as the basis of the proposed algorithm to process colour images. A Kalman filter with a second-order motion model was implemented to predict the state of the target and select candidate patches or samples for the CT tracker. In addition, a variance ratio features shift (VR-V) tracker with a Kalman estimation mechanism was used to process depth images. Using a feedback strategy, the depth tracking results were used to assist the CT tracker in updating classifier parameters at an adaptive rate. In this way, most of the deficiencies of CT, including drift and poor robustness to occlusion and high-speed target motion, were partly solved. To evaluate the proposed algorithm, a Microsoft Kinect sensor, which combines colour and infrared depth cameras, was adopted for use in a prototype of the robotic tracking system. The experimental results with various image sequences demonstrated the effectiveness, robustness and real-time performance of the tracking system.
NASA Astrophysics Data System (ADS)
Foronda, Augusto; Ohta, Chikara; Tamaki, Hisashi
Dirty paper coding (DPC) is a strategy to achieve the region capacity of multiple input multiple output (MIMO) downlink channels and a DPC scheduler is throughput optimal if users are selected according to their queue states and current rates. However, DPC is difficult to implement in practical systems. One solution, zero-forcing beamforming (ZFBF) strategy has been proposed to achieve the same asymptotic sum rate capacity as that of DPC with an exhaustive search over the entire user set. Some suboptimal user group selection schedulers with reduced complexity based on ZFBF strategy (ZFBF-SUS) and proportional fair (PF) scheduling algorithm (PF-ZFBF) have also been proposed to enhance the throughput and fairness among the users, respectively. However, they are not throughput optimal, fairness and throughput decrease if each user queue length is different due to different users channel quality. Therefore, we propose two different scheduling algorithms: a throughput optimal scheduling algorithm (ZFBF-TO) and a reduced complexity scheduling algorithm (ZFBF-RC). Both are based on ZFBF strategy and, at every time slot, the scheduling algorithms have to select some users based on user channel quality, user queue length and orthogonality among users. Moreover, the proposed algorithms have to produce the rate allocation and power allocation for the selected users based on a modified water filling method. We analyze the schedulers complexity and numerical results show that ZFBF-RC provides throughput and fairness improvements compared to the ZFBF-SUS and PF-ZFBF scheduling algorithms.
NASA Astrophysics Data System (ADS)
Liu, Wei; Chen, Shu-Ming; Zhang, Jian; Wu, Chun-Wang; Wu, Wei; Chen, Ping-Xing
2015-03-01
It is widely believed that Shor’s factoring algorithm provides a driving force to boost the quantum computing research. However, a serious obstacle to its binary implementation is the large number of quantum gates. Non-binary quantum computing is an efficient way to reduce the required number of elemental gates. Here, we propose optimization schemes for Shor’s algorithm implementation and take a ternary version for factorizing 21 as an example. The optimized factorization is achieved by a two-qutrit quantum circuit, which consists of only two single qutrit gates and one ternary controlled-NOT gate. This two-qutrit quantum circuit is then encoded into the nine lower vibrational states of an ion trapped in a weakly anharmonic potential. Optimal control theory (OCT) is employed to derive the manipulation electric field for transferring the encoded states. The ternary Shor’s algorithm can be implemented in one single step. Numerical simulation results show that the accuracy of the state transformations is about 0.9919. Project supported by the National Natural Science Foundation of China (Grant No. 61205108) and the High Performance Computing (HPC) Foundation of National University of Defense Technology, China.
NASA Astrophysics Data System (ADS)
Piretzidis, Dimitrios; Sra, Gurveer; Karantaidis, George; Sideris, Michael G.
2017-04-01
A new method for identifying correlated errors in Gravity Recovery and Climate Experiment (GRACE) monthly harmonic coefficients has been developed and tested. Correlated errors are present in the differences between monthly GRACE solutions, and can be suppressed using a de-correlation filter. In principle, the de-correlation filter should be implemented only on coefficient series with correlated errors to avoid losing useful geophysical information. In previous studies, two main methods of implementing the de-correlation filter have been utilized. In the first one, the de-correlation filter is implemented starting from a specific minimum order until the maximum order of the monthly solution examined. In the second one, the de-correlation filter is implemented only on specific coefficient series, the selection of which is based on statistical testing. The method proposed in the present study exploits the capabilities of supervised machine learning algorithms such as neural networks and support vector machines (SVMs). The pattern of correlated errors can be described by several numerical and geometric features of the harmonic coefficient series. The features of extreme cases of both correlated and uncorrelated coefficients are extracted and used for the training of the machine learning algorithms. The trained machine learning algorithms are later used to identify correlated errors and provide the probability of a coefficient series to be correlated. Regarding SVMs algorithms, an extensive study is performed with various kernel functions in order to find the optimal training model for prediction. The selection of the optimal training model is based on the classification accuracy of the trained SVM algorithm on the same samples used for training. Results show excellent performance of all algorithms with a classification accuracy of 97% - 100% on a pre-selected set of training samples, both in the validation stage of the training procedure and in the subsequent use of the trained algorithms to classify independent coefficients. This accuracy is also confirmed by the external validation of the trained algorithms using the hydrology model GLDAS NOAH. The proposed method meet the requirement of identifying and de-correlating only coefficients with correlated errors. Also, there is no need of applying statistical testing or other techniques that require prior de-correlation of the harmonic coefficients.
Novel structures for Discrete Hartley Transform based on first-order moments
NASA Astrophysics Data System (ADS)
Xiong, Jun; Zheng, Wenjuan; Wang, Hao; Liu, Jianguo
2018-03-01
Discrete Hartley Transform (DHT) is an important tool in digital signal processing. In the present paper, the DHT is firstly transformed into the first-order moments-based form, then a new fast algorithm is proposed to calculate the first-order moments without multiplication. Based on the algorithm theory, the corresponding hardware architecture for DHT is proposed, which only contains shift operations and additions with no need for multipliers and large memory. To verify the availability and effectiveness, the proposed design is implemented with hardware description language and synthesized by Synopsys Design Compiler with 0.18-μm SMIC library. A series of experiments have proved that the proposed architecture has better performance in terms of the product of the hardware consumption and computation time.
Hassan, Ahnaf Rashik; Bhuiyan, Mohammed Imamul Hassan
2017-03-01
Automatic sleep staging is essential for alleviating the burden of the physicians of analyzing a large volume of data by visual inspection. It is also a precondition for making an automated sleep monitoring system feasible. Further, computerized sleep scoring will expedite large-scale data analysis in sleep research. Nevertheless, most of the existing works on sleep staging are either multichannel or multiple physiological signal based which are uncomfortable for the user and hinder the feasibility of an in-home sleep monitoring device. So, a successful and reliable computer-assisted sleep staging scheme is yet to emerge. In this work, we propose a single channel EEG based algorithm for computerized sleep scoring. In the proposed algorithm, we decompose EEG signal segments using Ensemble Empirical Mode Decomposition (EEMD) and extract various statistical moment based features. The effectiveness of EEMD and statistical features are investigated. Statistical analysis is performed for feature selection. A newly proposed classification technique, namely - Random under sampling boosting (RUSBoost) is introduced for sleep stage classification. This is the first implementation of EEMD in conjunction with RUSBoost to the best of the authors' knowledge. The proposed feature extraction scheme's performance is investigated for various choices of classification models. The algorithmic performance of our scheme is evaluated against contemporary works in the literature. The performance of the proposed method is comparable or better than that of the state-of-the-art ones. The proposed algorithm gives 88.07%, 83.49%, 92.66%, 94.23%, and 98.15% for 6-state to 2-state classification of sleep stages on Sleep-EDF database. Our experimental outcomes reveal that RUSBoost outperforms other classification models for the feature extraction framework presented in this work. Besides, the algorithm proposed in this work demonstrates high detection accuracy for the sleep states S1 and REM. Statistical moment based features in the EEMD domain distinguish the sleep states successfully and efficaciously. The automated sleep scoring scheme propounded herein can eradicate the onus of the clinicians, contribute to the device implementation of a sleep monitoring system, and benefit sleep research. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Akil, Mohamed
2017-05-01
The real-time processing is getting more and more important in many image processing applications. Image segmentation is one of the most fundamental tasks image analysis. As a consequence, many different approaches for image segmentation have been proposed. The watershed transform is a well-known image segmentation tool. The watershed transform is a very data intensive task. To achieve acceleration and obtain real-time processing of watershed algorithms, parallel architectures and programming models for multicore computing have been developed. This paper focuses on the survey of the approaches for parallel implementation of sequential watershed algorithms on multicore general purpose CPUs: homogeneous multicore processor with shared memory. To achieve an efficient parallel implementation, it's necessary to explore different strategies (parallelization/distribution/distributed scheduling) combined with different acceleration and optimization techniques to enhance parallelism. In this paper, we give a comparison of various parallelization of sequential watershed algorithms on shared memory multicore architecture. We analyze the performance measurements of each parallel implementation and the impact of the different sources of overhead on the performance of the parallel implementations. In this comparison study, we also discuss the advantages and disadvantages of the parallel programming models. Thus, we compare the OpenMP (an application programming interface for multi-Processing) with Ptheads (POSIX Threads) to illustrate the impact of each parallel programming model on the performance of the parallel implementations.
NASA Astrophysics Data System (ADS)
Lhamon, Michael Earl
A pattern recognition system which uses complex correlation filter banks requires proportionally more computational effort than single-real valued filters. This introduces increased computation burden but also introduces a higher level of parallelism, that common computing platforms fail to identify. As a result, we consider algorithm mapping to both optical and digital processors. For digital implementation, we develop computationally efficient pattern recognition algorithms, referred to as, vector inner product operators that require less computational effort than traditional fast Fourier methods. These algorithms do not need correlation and they map readily onto parallel digital architectures, which imply new architectures for optical processors. These filters exploit circulant-symmetric matrix structures of the training set data representing a variety of distortions. By using the same mathematical basis as with the vector inner product operations, we are able to extend the capabilities of more traditional correlation filtering to what we refer to as "Super Images". These "Super Images" are used to morphologically transform a complicated input scene into a predetermined dot pattern. The orientation of the dot pattern is related to the rotational distortion of the object of interest. The optical implementation of "Super Images" yields feature reduction necessary for using other techniques, such as artificial neural networks. We propose a parallel digital signal processor architecture based on specific pattern recognition algorithms but general enough to be applicable to other similar problems. Such an architecture is classified as a data flow architecture. Instead of mapping an algorithm to an architecture, we propose mapping the DSP architecture to a class of pattern recognition algorithms. Today's optical processing systems have difficulties implementing full complex filter structures. Typically, optical systems (like the 4f correlators) are limited to phase-only implementation with lower detection performance than full complex electronic systems. Our study includes pseudo-random pixel encoding techniques for approximating full complex filtering. Optical filter bank implementation is possible and they have the advantage of time averaging the entire filter bank at real time rates. Time-averaged optical filtering is computational comparable to billions of digital operations-per-second. For this reason, we believe future trends in high speed pattern recognition will involve hybrid architectures of both optical and DSP elements.
Parallelized seeded region growing using CUDA.
Park, Seongjin; Lee, Jeongjin; Lee, Hyunna; Shin, Juneseuk; Seo, Jinwook; Lee, Kyoung Ho; Shin, Yeong-Gil; Kim, Bohyoung
2014-01-01
This paper presents a novel method for parallelizing the seeded region growing (SRG) algorithm using Compute Unified Device Architecture (CUDA) technology, with intention to overcome the theoretical weakness of SRG algorithm of its computation time being directly proportional to the size of a segmented region. The segmentation performance of the proposed CUDA-based SRG is compared with SRG implementations on single-core CPUs, quad-core CPUs, and shader language programming, using synthetic datasets and 20 body CT scans. Based on the experimental results, the CUDA-based SRG outperforms the other three implementations, advocating that it can substantially assist the segmentation during massive CT screening tests.
Fu, Haohao; Shao, Xueguang; Chipot, Christophe; Cai, Wensheng
2016-08-09
Proper use of the adaptive biasing force (ABF) algorithm in free-energy calculations needs certain prerequisites to be met, namely, that the Jacobian for the metric transformation and its first derivative be available and the coarse variables be independent and fully decoupled from any holonomic constraint or geometric restraint, thereby limiting singularly the field of application of the approach. The extended ABF (eABF) algorithm circumvents these intrinsic limitations by applying the time-dependent bias onto a fictitious particle coupled to the coarse variable of interest by means of a stiff spring. However, with the current implementation of eABF in the popular molecular dynamics engine NAMD, a trajectory-based post-treatment is necessary to derive the underlying free-energy change. Usually, such a posthoc analysis leads to a decrease in the reliability of the free-energy estimates due to the inevitable loss of information, as well as to a drop in efficiency, which stems from substantial read-write accesses to file systems. We have developed a user-friendly, on-the-fly code for performing eABF simulations within NAMD. In the present contribution, this code is probed in eight illustrative examples. The performance of the algorithm is compared with traditional ABF, on the one hand, and the original eABF implementation combined with a posthoc analysis, on the other hand. Our results indicate that the on-the-fly eABF algorithm (i) supplies the correct free-energy landscape in those critical cases where the coarse variables at play are coupled to either each other or to geometric restraints or holonomic constraints, (ii) greatly improves the reliability of the free-energy change, compared to the outcome of a posthoc analysis, and (iii) represents a negligible additional computational effort compared to regular ABF. Moreover, in the proposed implementation, guidelines for choosing two parameters of the eABF algorithm, namely the stiffness of the spring and the mass of the fictitious particles, are proposed. The present on-the-fly eABF implementation can be viewed as the second generation of the ABF algorithm, expected to be widely utilized in the theoretical investigation of recognition and association phenomena relevant to physics, chemistry, and biology.
Three-dimensional near-field MIMO array imaging using range migration techniques.
Zhuge, Xiaodong; Yarovoy, Alexander G
2012-06-01
This paper presents a 3-D near-field imaging algorithm that is formulated for 2-D wideband multiple-input-multiple-output (MIMO) imaging array topology. The proposed MIMO range migration technique performs the image reconstruction procedure in the frequency-wavenumber domain. The algorithm is able to completely compensate the curvature of the wavefront in the near-field through a specifically defined interpolation process and provides extremely high computational efficiency by the application of the fast Fourier transform. The implementation aspects of the algorithm and the sampling criteria of a MIMO aperture are discussed. The image reconstruction performance and computational efficiency of the algorithm are demonstrated both with numerical simulations and measurements using 2-D MIMO arrays. Real-time 3-D near-field imaging can be achieved with a real-aperture array by applying the proposed MIMO range migration techniques.
PSO Algorithm for an Optimal Power Controller in a Microgrid
NASA Astrophysics Data System (ADS)
Al-Saedi, W.; Lachowicz, S.; Habibi, D.; Bass, O.
2017-07-01
This paper presents the Particle Swarm Optimization (PSO) algorithm to improve the quality of the power supply in a microgrid. This algorithm is proposed for a real-time selftuning method that used in a power controller for an inverter based Distributed Generation (DG) unit. In such system, the voltage and frequency are the main control objectives, particularly when the microgrid is islanded or during load change. In this work, the PSO algorithm is implemented to find the optimal controller parameters to satisfy the control objectives. The results show high performance of the applied PSO algorithm of regulating the microgrid voltage and frequency.
Parametric dense stereovision implementation on a system-on chip (SoC).
Gardel, Alfredo; Montejo, Pablo; García, Jorge; Bravo, Ignacio; Lázaro, José L
2012-01-01
This paper proposes a novel hardware implementation of a dense recovery of stereovision 3D measurements. Traditionally 3D stereo systems have imposed the maximum number of stereo correspondences, introducing a large restriction on artificial vision algorithms. The proposed system-on-chip (SoC) provides great performance and efficiency, with a scalable architecture available for many different situations, addressing real time processing of stereo image flow. Using double buffering techniques properly combined with pipelined processing, the use of reconfigurable hardware achieves a parametrisable SoC which gives the designer the opportunity to decide its right dimension and features. The proposed architecture does not need any external memory because the processing is done as image flow arrives. Our SoC provides 3D data directly without the storage of whole stereo images. Our goal is to obtain high processing speed while maintaining the accuracy of 3D data using minimum resources. Configurable parameters may be controlled by later/parallel stages of the vision algorithm executed on an embedded processor. Considering hardware FPGA clock of 100 MHz, image flows up to 50 frames per second (fps) of dense stereo maps of more than 30,000 depth points could be obtained considering 2 Mpix images, with a minimum initial latency. The implementation of computer vision algorithms on reconfigurable hardware, explicitly low level processing, opens up the prospect of its use in autonomous systems, and they can act as a coprocessor to reconstruct 3D images with high density information in real time.
Quantum exhaustive key search with simplified-DES as a case study.
Almazrooie, Mishal; Samsudin, Azman; Abdullah, Rosni; Mutter, Kussay N
2016-01-01
To evaluate the security of a symmetric cryptosystem against any quantum attack, the symmetric algorithm must be first implemented on a quantum platform. In this study, a quantum implementation of a classical block cipher is presented. A quantum circuit for a classical block cipher of a polynomial size of quantum gates is proposed. The entire work has been tested on a quantum mechanics simulator called libquantum. First, the functionality of the proposed quantum cipher is verified and the experimental results are compared with those of the original classical version. Then, quantum attacks are conducted by using Grover's algorithm to recover the secret key. The proposed quantum cipher is used as a black box for the quantum search. The quantum oracle is then queried over the produced ciphertext to mark the quantum state, which consists of plaintext and key qubits. The experimental results show that for a key of n-bit size and key space of N such that [Formula: see text], the key can be recovered in [Formula: see text] computational steps.
NASA Astrophysics Data System (ADS)
Chun, Tae Yoon; Lee, Jae Young; Park, Jin Bae; Choi, Yoon Ho
2018-06-01
In this paper, we propose two multirate generalised policy iteration (GPI) algorithms applied to discrete-time linear quadratic regulation problems. The proposed algorithms are extensions of the existing GPI algorithm that consists of the approximate policy evaluation and policy improvement steps. The two proposed schemes, named heuristic dynamic programming (HDP) and dual HDP (DHP), based on multirate GPI, use multi-step estimation (M-step Bellman equation) at the approximate policy evaluation step for estimating the value function and its gradient called costate, respectively. Then, we show that these two methods with the same update horizon can be considered equivalent in the iteration domain. Furthermore, monotonically increasing and decreasing convergences, so called value iteration (VI)-mode and policy iteration (PI)-mode convergences, are proved to hold for the proposed multirate GPIs. Further, general convergence properties in terms of eigenvalues are also studied. The data-driven online implementation methods for the proposed HDP and DHP are demonstrated and finally, we present the results of numerical simulations performed to verify the effectiveness of the proposed methods.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tarek Haddadin; Stephen Andrew Laraway; Arslan Majid
This paper proposes and presents the design and implementation of an underlay communication channel (UCC) for 5G cognitive mesh networks. The UCC builds its waveform based on filter bank multicarrier spread spectrum (FB-MCSS) signaling. The use of this novel spread spectrum signaling allows the device-to-device (D2D) user equipments (UEs) to communicate at a level well below noise temperature and hence, minimize taxation on macro-cell/small-cell base stations and their UEs in 5G wireless systems. Moreover, the use of filter banks allows us to avoid those portions of the spectrum that are in use by macro-cell and small-cell users. Hence, both D2D-to-cellularmore » and cellular-to-D2D interference will be very close to none. We propose a specific packet for UCC and develop algorithms for packet detection, timing acquisition and tracking, as well as channel estimation and equalization. We also present the detail of an implementation of the proposed transceiver on a software radio platform and compare our experimental results with those from a theoretical analysis of our packet detection algorithm.« less
Acceleration and torque feedback for robotic control - Experimental results
NASA Technical Reports Server (NTRS)
Mclnroy, John E.; Saridis, George N.
1990-01-01
Gross motion control of robotic manipulators typically requires significant on-line computations to compensate for nonlinear dynamics due to gravity, Coriolis, centripetal, and friction nonlinearities. One controller proposed by Luo and Saridis avoids these computations by feeding back joint acceleration and torque. This study implements the controller on a Puma 600 robotic manipulator. Joint acceleration measurement is obtained by measuring linear accelerations of each joint, and deriving a computationally efficient transformation from the linear measurements to the angular accelerations. Torque feedback is obtained by using the previous torque sent to the joints. The implementation has stability problems on the Puma 600 due to the extremely high gains inherent in the feedback structure. Since these high gains excite frequency modes in the Puma 600, the algorithm is modified to decrease the gain inherent in the feedback structure. The resulting compensator is stable and insensitive to high frequency unmodeled dynamics. Moreover, a second compensator is proposed which uses acceleration and torque feedback, but still allows nonlinear terms to be fed forward. Thus, by feeding the increment in the easily calculated gravity terms forward, improved responses are obtained. Both proposed compensators are implemented, and the real time results are compared to those obtained with the computed torque algorithm.
NASA Technical Reports Server (NTRS)
Lee, C. S. G.; Chen, C. L.
1989-01-01
Two efficient mapping algorithms for scheduling the robot inverse dynamics computation consisting of m computational modules with precedence relationship to be executed on a multiprocessor system consisting of p identical homogeneous processors with processor and communication costs to achieve minimum computation time are presented. An objective function is defined in terms of the sum of the processor finishing time and the interprocessor communication time. The minimax optimization is performed on the objective function to obtain the best mapping. This mapping problem can be formulated as a combination of the graph partitioning and the scheduling problems; both have been known to be NP-complete. Thus, to speed up the searching for a solution, two heuristic algorithms were proposed to obtain fast but suboptimal mapping solutions. The first algorithm utilizes the level and the communication intensity of the task modules to construct an ordered priority list of ready modules and the module assignment is performed by a weighted bipartite matching algorithm. For a near-optimal mapping solution, the problem can be solved by the heuristic algorithm with simulated annealing. These proposed optimization algorithms can solve various large-scale problems within a reasonable time. Computer simulations were performed to evaluate and verify the performance and the validity of the proposed mapping algorithms. Finally, experiments for computing the inverse dynamics of a six-jointed PUMA-like manipulator based on the Newton-Euler dynamic equations were implemented on an NCUBE/ten hypercube computer to verify the proposed mapping algorithms. Computer simulation and experimental results are compared and discussed.
NASA Astrophysics Data System (ADS)
Foreman-Mackey, Daniel; Hogg, David W.; Lang, Dustin; Goodman, Jonathan
2013-03-01
We introduce a stable, well tested Python implementation of the affine-invariant ensemble sampler for Markov chain Monte Carlo (MCMC) proposed by Goodman & Weare (2010). The code is open source and has already been used in several published projects in the astrophysics literature. The algorithm behind emcee has several advantages over traditional MCMC sampling methods and it has excellent performance as measured by the autocorrelation time (or function calls per independent sample). One major advantage of the algorithm is that it requires hand-tuning of only 1 or 2 parameters compared to ˜N2 for a traditional algorithm in an N-dimensional parameter space. In this document, we describe the algorithm and the details of our implementation. Exploiting the parallelism of the ensemble method, emcee permits any user to take advantage of multiple CPU cores without extra effort. The code is available online at http://dan.iel.fm/emcee under the GNU General Public License v2.
Crypto-Watermarking of Transmitted Medical Images.
Al-Haj, Ali; Mohammad, Ahmad; Amer, Alaa'
2017-02-01
Telemedicine is a booming healthcare practice that has facilitated the exchange of medical data and expertise between healthcare entities. However, the widespread use of telemedicine applications requires a secured scheme to guarantee confidentiality and verify authenticity and integrity of exchanged medical data. In this paper, we describe a region-based, crypto-watermarking algorithm capable of providing confidentiality, authenticity, and integrity for medical images of different modalities. The proposed algorithm provides authenticity by embedding robust watermarks in images' region of non-interest using SVD in the DWT domain. Integrity is provided in two levels: strict integrity implemented by a cryptographic hash watermark, and content-based integrity implemented by a symmetric encryption-based tamper localization scheme. Confidentiality is achieved as a byproduct of hiding patient's data in the image. Performance of the algorithm was evaluated with respect to imperceptibility, robustness, capacity, and tamper localization, using different medical images. The results showed the effectiveness of the algorithm in providing security for telemedicine applications.
Scheduling nursing personnel on a microcomputer.
Liao, C J; Kao, C Y
1997-01-01
Suggests that with the shortage of nursing personnel, hospital administrators have to pay more attention to the needs of nurses to retain and recruit them. Also asserts that improving nurses' schedules is one of the most economic ways for the hospital administration to create a better working environment for nurses. Develops an algorithm for scheduling nursing personnel. Contrary to the current hospital approach, which schedules nurses on a person-by-person basis, the proposed algorithm constructs schedules on a day-by-day basis. The algorithm has inherent flexibility in handling a variety of possible constraints and goals, similar to other non-cyclical approaches. But, unlike most other non-cyclical approaches, it can also generate a quality schedule in a short time on a microcomputer. The algorithm was coded in C language and run on a microcomputer. The developed software is currently implemented at a leading hospital in Taiwan. The response to the initial implementation is quite promising.
Heuristic-based scheduling algorithm for high level synthesis
NASA Technical Reports Server (NTRS)
Mohamed, Gulam; Tan, Han-Ngee; Chng, Chew-Lye
1992-01-01
A new scheduling algorithm is proposed which uses a combination of a resource utilization chart, a heuristic algorithm to estimate the minimum number of hardware units based on operator mobilities, and a list-scheduling technique to achieve fast and near optimal schedules. The schedule time of this algorithm is almost independent of the length of mobilities of operators as can be seen from the benchmark example (fifth order digital elliptical wave filter) presented when the cycle time was increased from 17 to 18 and then to 21 cycles. It is implemented in C on a SUN3/60 workstation.
Simple Common Plane contact algorithm for explicit FE/FD methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vorobiev, O
2006-12-18
Common-plane (CP) algorithm is widely used in Discrete Element Method (DEM) to model contact forces between interacting particles or blocks. A new simple contact algorithm is proposed to model contacts in FE/FD methods which is similar to the CP algorithm. The CP is defined as a plane separating interacting faces of FE/FD mesh instead of blocks or particles used in the original CP method. The new method does not require iterations even for very stiff contacts. It is very robust and easy to implement both in 2D and 3D parallel codes.
Algorithmic Perspectives on Problem Formulations in MDO
NASA Technical Reports Server (NTRS)
Alexandrov, Natalia M.; Lewis, Robert Michael
2000-01-01
This work is concerned with an approach to formulating the multidisciplinary optimization (MDO) problem that reflects an algorithmic perspective on MDO problem solution. The algorithmic perspective focuses on formulating the problem in light of the abilities and inabilities of optimization algorithms, so that the resulting nonlinear programming problem can be solved reliably and efficiently by conventional optimization techniques. We propose a modular approach to formulating MDO problems that takes advantage of the problem structure, maximizes the autonomy of implementation, and allows for multiple easily interchangeable problem statements to be used depending on the available resources and the characteristics of the application problem.
An image compression survey and algorithm switching based on scene activity
NASA Technical Reports Server (NTRS)
Hart, M. M.
1985-01-01
Data compression techniques are presented. A description of these techniques is provided along with a performance evaluation. The complexity of the hardware resulting from their implementation is also addressed. The compression effect on channel distortion and the applicability of these algorithms to real-time processing are presented. Also included is a proposed new direction for an adaptive compression technique for real-time processing.
Generalization Analysis of Fredholm Kernel Regularized Classifiers.
Gong, Tieliang; Xu, Zongben; Chen, Hong
2017-07-01
Recently, a new framework, Fredholm learning, was proposed for semisupervised learning problems based on solving a regularized Fredholm integral equation. It allows a natural way to incorporate unlabeled data into learning algorithms to improve their prediction performance. Despite rapid progress on implementable algorithms with theoretical guarantees, the generalization ability of Fredholm kernel learning has not been studied. In this letter, we focus on investigating the generalization performance of a family of classification algorithms, referred to as Fredholm kernel regularized classifiers. We prove that the corresponding learning rate can achieve [Formula: see text] ([Formula: see text] is the number of labeled samples) in a limiting case. In addition, a representer theorem is provided for the proposed regularized scheme, which underlies its applications.
Implementation of a Space Communications Cognitive Engine
NASA Technical Reports Server (NTRS)
Hackett, Timothy M.; Bilen, Sven G.; Ferreira, Paulo Victor R.; Wyglinski, Alexander M.; Reinhart, Richard C.
2017-01-01
Although communications-based cognitive engines have been proposed, very few have been implemented in a full system, especially in a space communications system. In this paper, we detail the implementation of a multi-objective reinforcement-learning algorithm and deep artificial neural networks for the use as a radio-resource-allocation controller. The modular software architecture presented encourages re-use and easy modification for trying different algorithms. Various trade studies involved with the system implementation and integration are discussed. These include the choice of software libraries that provide platform flexibility and promote reusability, choices regarding the deployment of this cognitive engine within a system architecture using the DVB-S2 standard and commercial hardware, and constraints placed on the cognitive engine caused by real-world radio constraints. The implemented radio-resource allocation-management controller was then integrated with the larger spaceground system developed by NASA Glenn Research Center (GRC).
Abid, Abdulbasit
2013-03-01
This paper presents a thorough discussion of the proposed field-programmable gate array (FPGA) implementation for fringe pattern demodulation using the one-dimensional continuous wavelet transform (1D-CWT) algorithm. This algorithm is also known as wavelet transform profilometry. Initially, the 1D-CWT is programmed using the C programming language and compiled into VHDL using the ImpulseC tool. This VHDL code is implemented on the Altera Cyclone IV GX EP4CGX150DF31C7 FPGA. A fringe pattern image with a size of 512×512 pixels is presented to the FPGA, which processes the image using the 1D-CWT algorithm. The FPGA requires approximately 100 ms to process the image and produce a wrapped phase map. For performance comparison purposes, the 1D-CWT algorithm is programmed using the C language. The C code is then compiled using the Intel compiler version 13.0. The compiled code is run on a Dell Precision state-of-the-art workstation. The time required to process the fringe pattern image is approximately 1 s. In order to further reduce the execution time, the 1D-CWT is reprogramed using Intel Integrated Primitive Performance (IPP) Library Version 7.1. The execution time was reduced to approximately 650 ms. This confirms that at least sixfold speedup was gained using FPGA implementation over a state-of-the-art workstation that executes heavily optimized implementation of the 1D-CWT algorithm.
NASA Astrophysics Data System (ADS)
Hernandez, Monica
2017-12-01
This paper proposes a method for primal-dual convex optimization in variational large deformation diffeomorphic metric mapping problems formulated with robust regularizers and robust image similarity metrics. The method is based on Chambolle and Pock primal-dual algorithm for solving general convex optimization problems. Diagonal preconditioning is used to ensure the convergence of the algorithm to the global minimum. We consider three robust regularizers liable to provide acceptable results in diffeomorphic registration: Huber, V-Huber and total generalized variation. The Huber norm is used in the image similarity term. The primal-dual equations are derived for the stationary and the non-stationary parameterizations of diffeomorphisms. The resulting algorithms have been implemented for running in the GPU using Cuda. For the most memory consuming methods, we have developed a multi-GPU implementation. The GPU implementations allowed us to perform an exhaustive evaluation study in NIREP and LPBA40 databases. The experiments showed that, for all the considered regularizers, the proposed method converges to diffeomorphic solutions while better preserving discontinuities at the boundaries of the objects compared to baseline diffeomorphic registration methods. In most cases, the evaluation showed a competitive performance for the robust regularizers, close to the performance of the baseline diffeomorphic registration methods.
Implementation Issues of Adaptive Energy Detection in Heterogeneous Wireless Networks
Sobron, Iker; Eizmendi, Iñaki; Martins, Wallace A.; Diniz, Paulo S. R.; Ordiales, Juan Luis; Velez, Manuel
2017-01-01
Spectrum sensing (SS) enables the coexistence of non-coordinated heterogeneous wireless systems operating in the same band. Due to its computational simplicity, energy detection (ED) technique has been widespread employed in SS applications; nonetheless, the conventional ED may be unreliable under environmental impairments, justifying the use of ED-based variants. Assessing ED algorithms from theoretical and simulation viewpoints relies on several assumptions and simplifications which, eventually, lead to conclusions that do not necessarily meet the requirements imposed by real propagation environments. This work addresses those problems by dealing with practical implementation issues of adaptive least mean square (LMS)-based ED algorithms. The paper proposes a new adaptive ED algorithm that uses a variable step-size guaranteeing the LMS convergence in time-varying environments. Several implementation guidelines are provided and, additionally, an empirical assessment and validation with a software defined radio-based hardware is carried out. Experimental results show good performance in terms of probabilities of detection (Pd>0.9) and false alarm (Pf∼0.05) in a range of low signal-to-noise ratios around [-4,1] dB, in both single-node and cooperative modes. The proposed sensing methodology enables a seamless monitoring of the radio electromagnetic spectrum in order to provide band occupancy information for an efficient usage among several wireless communications systems. PMID:28441751
Experiences with serial and parallel algorithms for channel routing using simulated annealing
NASA Technical Reports Server (NTRS)
Brouwer, Randall Jay
1988-01-01
Two algorithms for channel routing using simulated annealing are presented. Simulated annealing is an optimization methodology which allows the solution process to back up out of local minima that may be encountered by inappropriate selections. By properly controlling the annealing process, it is very likely that the optimal solution to an NP-complete problem such as channel routing may be found. The algorithm presented proposes very relaxed restrictions on the types of allowable transformations, including overlapping nets. By freeing that restriction and controlling overlap situations with an appropriate cost function, the algorithm becomes very flexible and can be applied to many extensions of channel routing. The selection of the transformation utilizes a number of heuristics, still retaining the pseudorandom nature of simulated annealing. The algorithm was implemented as a serial program for a workstation, and a parallel program designed for a hypercube computer. The details of the serial implementation are presented, including many of the heuristics used and some of the resulting solutions.
Parallel algorithm of real-time infrared image restoration based on total variation theory
NASA Astrophysics Data System (ADS)
Zhu, Ran; Li, Miao; Long, Yunli; Zeng, Yaoyuan; An, Wei
2015-10-01
Image restoration is a necessary preprocessing step for infrared remote sensing applications. Traditional methods allow us to remove the noise but penalize too much the gradients corresponding to edges. Image restoration techniques based on variational approaches can solve this over-smoothing problem for the merits of their well-defined mathematical modeling of the restore procedure. The total variation (TV) of infrared image is introduced as a L1 regularization term added to the objective energy functional. It converts the restoration process to an optimization problem of functional involving a fidelity term to the image data plus a regularization term. Infrared image restoration technology with TV-L1 model exploits the remote sensing data obtained sufficiently and preserves information at edges caused by clouds. Numerical implementation algorithm is presented in detail. Analysis indicates that the structure of this algorithm can be easily implemented in parallelization. Therefore a parallel implementation of the TV-L1 filter based on multicore architecture with shared memory is proposed for infrared real-time remote sensing systems. Massive computation of image data is performed in parallel by cooperating threads running simultaneously on multiple cores. Several groups of synthetic infrared image data are used to validate the feasibility and effectiveness of the proposed parallel algorithm. Quantitative analysis of measuring the restored image quality compared to input image is presented. Experiment results show that the TV-L1 filter can restore the varying background image reasonably, and that its performance can achieve the requirement of real-time image processing.
NASA Astrophysics Data System (ADS)
Zhao, Liang; Huang, Shoudong; Dissanayake, Gamini
2018-07-01
This paper presents a novel hierarchical approach to solving structure-from-motion (SFM) problems. The algorithm begins with small local reconstructions based on nonlinear bundle adjustment (BA). These are then joined in a hierarchical manner using a strategy that requires solving a linear least squares optimization problem followed by a nonlinear transform. The algorithm can handle ordered monocular and stereo image sequences. Two stereo images or three monocular images are adequate for building each initial reconstruction. The bulk of the computation involves solving a linear least squares problem and, therefore, the proposed algorithm avoids three major issues associated with most of the nonlinear optimization algorithms currently used for SFM: the need for a reasonably accurate initial estimate, the need for iterations, and the possibility of being trapped in a local minimum. Also, by summarizing all the original observations into the small local reconstructions with associated information matrices, the proposed Linear SFM manages to preserve all the information contained in the observations. The paper also demonstrates that the proposed problem formulation results in a sparse structure that leads to an efficient numerical implementation. The experimental results using publicly available datasets show that the proposed algorithm yields solutions that are very close to those obtained using a global BA starting with an accurate initial estimate. The C/C++ source code of the proposed algorithm is publicly available at https://github.com/LiangZhaoPKUImperial/LinearSFM.
Biyikli, Emre; To, Albert C.
2015-01-01
A new topology optimization method called the Proportional Topology Optimization (PTO) is presented. As a non-sensitivity method, PTO is simple to understand, easy to implement, and is also efficient and accurate at the same time. It is implemented into two MATLAB programs to solve the stress constrained and minimum compliance problems. Descriptions of the algorithm and computer programs are provided in detail. The method is applied to solve three numerical examples for both types of problems. The method shows comparable efficiency and accuracy with an existing optimality criteria method which computes sensitivities. Also, the PTO stress constrained algorithm and minimum compliance algorithm are compared by feeding output from one algorithm to the other in an alternative manner, where the former yields lower maximum stress and volume fraction but higher compliance compared to the latter. Advantages and disadvantages of the proposed method and future works are discussed. The computer programs are self-contained and publicly shared in the website www.ptomethod.org. PMID:26678849
Escalated convergent artificial bee colony
NASA Astrophysics Data System (ADS)
Jadon, Shimpi Singh; Bansal, Jagdish Chand; Tiwari, Ritu
2016-03-01
Artificial bee colony (ABC) optimisation algorithm is a recent, fast and easy-to-implement population-based meta heuristic for optimisation. ABC has been proved a rival algorithm with some popular swarm intelligence-based algorithms such as particle swarm optimisation, firefly algorithm and ant colony optimisation. The solution search equation of ABC is influenced by a random quantity which helps its search process in exploration at the cost of exploitation. In order to find a fast convergent behaviour of ABC while exploitation capability is maintained, in this paper basic ABC is modified in two ways. First, to improve exploitation capability, two local search strategies, namely classical unidimensional local search and levy flight random walk-based local search are incorporated with ABC. Furthermore, a new solution search strategy, namely stochastic diffusion scout search is proposed and incorporated into the scout bee phase to provide more chance to abandon solution to improve itself. Efficiency of the proposed algorithm is tested on 20 benchmark test functions of different complexities and characteristics. Results are very promising and they prove it to be a competitive algorithm in the field of swarm intelligence-based algorithms.
A fast bilinear structure from motion algorithm using a video sequence and inertial sensors.
Ramachandran, Mahesh; Veeraraghavan, Ashok; Chellappa, Rama
2011-01-01
In this paper, we study the benefits of the availability of a specific form of additional information—the vertical direction (gravity) and the height of the camera, both of which can be conveniently measured using inertial sensors and a monocular video sequence for 3D urban modeling. We show that in the presence of this information, the SfM equations can be rewritten in a bilinear form. This allows us to derive a fast, robust, and scalable SfM algorithm for large scale applications. The SfM algorithm developed in this paper is experimentally demonstrated to have favorable properties compared to the sparse bundle adjustment algorithm. We provide experimental evidence indicating that the proposed algorithm converges in many cases to solutions with lower error than state-of-art implementations of bundle adjustment. We also demonstrate that for the case of large reconstruction problems, the proposed algorithm takes lesser time to reach its solution compared to bundle adjustment. We also present SfM results using our algorithm on the Google StreetView research data set.
Epidemic failure detection and consensus for extreme parallelism
Katti, Amogh; Di Fatta, Giuseppe; Naughton, Thomas; ...
2017-02-01
Future extreme-scale high-performance computing systems will be required to work under frequent component failures. The MPI Forum s User Level Failure Mitigation proposal has introduced an operation, MPI Comm shrink, to synchronize the alive processes on the list of failed processes, so that applications can continue to execute even in the presence of failures by adopting algorithm-based fault tolerance techniques. This MPI Comm shrink operation requires a failure detection and consensus algorithm. This paper presents three novel failure detection and consensus algorithms using Gossiping. The proposed algorithms were implemented and tested using the Extreme-scale Simulator. The results show that inmore » all algorithms the number of Gossip cycles to achieve global consensus scales logarithmically with system size. The second algorithm also shows better scalability in terms of memory and network bandwidth usage and a perfect synchronization in achieving global consensus. The third approach is a three-phase distributed failure detection and consensus algorithm and provides consistency guarantees even in very large and extreme-scale systems while at the same time being memory and bandwidth efficient.« less
Log-linear model based behavior selection method for artificial fish swarm algorithm.
Huang, Zhehuang; Chen, Yidong
2015-01-01
Artificial fish swarm algorithm (AFSA) is a population based optimization technique inspired by social behavior of fishes. In past several years, AFSA has been successfully applied in many research and application areas. The behavior of fishes has a crucial impact on the performance of AFSA, such as global exploration ability and convergence speed. How to construct and select behaviors of fishes are an important task. To solve these problems, an improved artificial fish swarm algorithm based on log-linear model is proposed and implemented in this paper. There are three main works. Firstly, we proposed a new behavior selection algorithm based on log-linear model which can enhance decision making ability of behavior selection. Secondly, adaptive movement behavior based on adaptive weight is presented, which can dynamically adjust according to the diversity of fishes. Finally, some new behaviors are defined and introduced into artificial fish swarm algorithm at the first time to improve global optimization capability. The experiments on high dimensional function optimization showed that the improved algorithm has more powerful global exploration ability and reasonable convergence speed compared with the standard artificial fish swarm algorithm.
FPGA implementation of sparse matrix algorithm for information retrieval
NASA Astrophysics Data System (ADS)
Bojanic, Slobodan; Jevtic, Ruzica; Nieto-Taladriz, Octavio
2005-06-01
Information text data retrieval requires a tremendous amount of processing time because of the size of the data and the complexity of information retrieval algorithms. In this paper the solution to this problem is proposed via hardware supported information retrieval algorithms. Reconfigurable computing may adopt frequent hardware modifications through its tailorable hardware and exploits parallelism for a given application through reconfigurable and flexible hardware units. The degree of the parallelism can be tuned for data. In this work we implemented standard BLAS (basic linear algebra subprogram) sparse matrix algorithm named Compressed Sparse Row (CSR) that is showed to be more efficient in terms of storage space requirement and query-processing timing over the other sparse matrix algorithms for information retrieval application. Although inverted index algorithm is treated as the de facto standard for information retrieval for years, an alternative approach to store the index of text collection in a sparse matrix structure gains more attention. This approach performs query processing using sparse matrix-vector multiplication and due to parallelization achieves a substantial efficiency over the sequential inverted index. The parallel implementations of information retrieval kernel are presented in this work targeting the Virtex II Field Programmable Gate Arrays (FPGAs) board from Xilinx. A recent development in scientific applications is the use of FPGA to achieve high performance results. Computational results are compared to implementations on other platforms. The design achieves a high level of parallelism for the overall function while retaining highly optimised hardware within processing unit.
RACER: Effective Race Detection Using AspectJ
NASA Technical Reports Server (NTRS)
Bodden, Eric; Havelund, Klaus
2008-01-01
The limits of coding with joint constraints on detected and undetected error rates Programming errors occur frequently in large software systems, and even more so if these systems are concurrent. In the past, researchers have developed specialized programs to aid programmers detecting concurrent programming errors such as deadlocks, livelocks, starvation and data races. In this work we propose a language extension to the aspect-oriented programming language AspectJ, in the form of three new built-in pointcuts, lock(), unlock() and may be Shared(), which allow programmers to monitor program events where locks are granted or handed back, and where values are accessed that may be shared amongst multiple Java threads. We decide thread-locality using a static thread-local objects analysis developed by others. Using the three new primitive pointcuts, researchers can directly implement efficient monitoring algorithms to detect concurrent programming errors online. As an example, we expose a new algorithm which we call RACER, an adoption of the well-known ERASER algorithm to the memory model of Java. We implemented the new pointcuts as an extension to the Aspect Bench Compiler, implemented the RACER algorithm using this language extension and then applied the algorithm to the NASA K9 Rover Executive. Our experiments proved our implementation very effective. In the Rover Executive RACER finds 70 data races. Only one of these races was previously known.We further applied the algorithm to two other multi-threaded programs written by Computer Science researchers, in which we found races as well.
Intelligent deflection routing in buffer-less networks.
Haeri, Soroush; Trajković, Ljiljana
2015-02-01
Deflection routing is employed to ameliorate packet loss caused by contention in buffer-less architectures such as optical burst-switched networks. The main goal of deflection routing is to successfully deflect a packet based only on a limited knowledge that network nodes possess about their environment. In this paper, we present a framework that introduces intelligence to deflection routing (iDef). iDef decouples the design of the signaling infrastructure from the underlying learning algorithm. It consists of a signaling and a decision-making module. Signaling module implements a feedback management protocol while the decision-making module implements a reinforcement learning algorithm. We also propose several learning-based deflection routing protocols, implement them in iDef using the ns-3 network simulator, and compare their performance.
Hardware friendly probabilistic spiking neural network with long-term and short-term plasticity.
Hsieh, Hung-Yi; Tang, Kea-Tiong
2013-12-01
This paper proposes a probabilistic spiking neural network (PSNN) with unimodal weight distribution, possessing long- and short-term plasticity. The proposed algorithm is derived by both the arithmetic gradient decent calculation and bioinspired algorithms. The algorithm is benchmarked by the Iris and Wisconsin breast cancer (WBC) data sets. The network features fast convergence speed and high accuracy. In the experiment, the PSNN took not more than 40 epochs for convergence. The average testing accuracy for Iris and WBC data is 96.7% and 97.2%, respectively. To test the usefulness of the PSNN to real world application, the PSNN was also tested with the odor data, which was collected by our self-developed electronic nose (e-nose). Compared with the algorithm (K-nearest neighbor) that has the highest classification accuracy in the e-nose for the same odor data, the classification accuracy of the PSNN is only 1.3% less but the memory requirement can be reduced at least 40%. All the experiments suggest that the PSNN is hardware friendly. First, it requires only nine-bits weight resolution for training and testing. Second, the PSNN can learn complex data sets with a little number of neurons that in turn reduce the cost of VLSI implementation. In addition, the algorithm is insensitive to synaptic noise and the parameter variation induced by the VLSI fabrication. Therefore, the algorithm can be implemented by either software or hardware, making it suitable for wider application.
Low complexity pixel-based halftone detection
NASA Astrophysics Data System (ADS)
Ok, Jiheon; Han, Seong Wook; Jarno, Mielikainen; Lee, Chulhee
2011-10-01
With the rapid advances of the internet and other multimedia technologies, the digital document market has been growing steadily. Since most digital images use halftone technologies, quality degradation occurs when one tries to scan and reprint them. Therefore, it is necessary to extract the halftone areas to produce high quality printing. In this paper, we propose a low complexity pixel-based halftone detection algorithm. For each pixel, we considered a surrounding block. If the block contained any flat background regions, text, thin lines, or continuous or non-homogeneous regions, the pixel was classified as a non-halftone pixel. After excluding those non-halftone pixels, the remaining pixels were considered to be halftone pixels. Finally, documents were classified as pictures or photo documents by calculating the halftone pixel ratio. The proposed algorithm proved to be memory-efficient and required low computation costs. The proposed algorithm was easily implemented using GPU.
Automatic identification of epileptic seizures from EEG signals using linear programming boosting.
Hassan, Ahnaf Rashik; Subasi, Abdulhamit
2016-11-01
Computerized epileptic seizure detection is essential for expediting epilepsy diagnosis and research and for assisting medical professionals. Moreover, the implementation of an epilepsy monitoring device that has low power and is portable requires a reliable and successful seizure detection scheme. In this work, the problem of automated epilepsy seizure detection using singe-channel EEG signals has been addressed. At first, segments of EEG signals are decomposed using a newly proposed signal processing scheme, namely complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN). Six spectral moments are extracted from the CEEMDAN mode functions and train and test matrices are formed afterward. These matrices are fed into the classifier to identify epileptic seizures from EEG signal segments. In this work, we implement an ensemble learning based machine learning algorithm, namely linear programming boosting (LPBoost) to perform classification. The efficacy of spectral features in the CEEMDAN domain is validated by graphical and statistical analyses. The performance of CEEMDAN is compared to those of its predecessors to further inspect its suitability. The effectiveness and the appropriateness of LPBoost are demonstrated as opposed to the commonly used classification models. Resubstitution and 10 fold cross-validation error analyses confirm the superior algorithm performance of the proposed scheme. The algorithmic performance of our epilepsy seizure identification scheme is also evaluated against state-of-the-art works in the literature. Experimental outcomes manifest that the proposed seizure detection scheme performs better than the existing works in terms of accuracy, sensitivity, specificity, and Cohen's Kappa coefficient. It can be anticipated that owing to its use of only one channel of EEG signal, the proposed method will be suitable for device implementation, eliminate the onus of clinicians for analyzing a large bulk of data manually, and expedite epilepsy diagnosis. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
A splay tree-based approach for efficient resource location in P2P networks.
Zhou, Wei; Tan, Zilong; Yao, Shaowen; Wang, Shipu
2014-01-01
Resource location in structured P2P system has a critical influence on the system performance. Existing analytical studies of Chord protocol have shown some potential improvements in performance. In this paper a splay tree-based new Chord structure called SChord is proposed to improve the efficiency of locating resources. We consider a novel implementation of the Chord finger table (routing table) based on the splay tree. This approach extends the Chord finger table with additional routing entries. Adaptive routing algorithm is proposed for implementation, and it can be shown that hop count is significantly minimized without introducing any other protocol overheads. We analyze the hop count of the adaptive routing algorithm, as compared to Chord variants, and demonstrate sharp upper and lower bounds for both worst-case and average case settings. In addition, we theoretically analyze the hop reducing in SChord and derive the fact that SChord can significantly reduce the routing hops as compared to Chord. Several simulations are presented to evaluate the performance of the algorithm and support our analytical findings. The simulation results show the efficiency of SChord.
Li, Hong; Liu, Mingyong; Liu, Kun; Zhang, Feihu
2017-12-25
By simulating the geomagnetic fields and analyzing thevariation of intensities, this paper presents a model for calculating the objective function ofan Autonomous Underwater Vehicle (AUV)geomagnetic navigation task. By investigating the biologically inspired strategies, the AUV successfullyreachesthe destination duringgeomagnetic navigation without using the priori geomagnetic map. Similar to the pattern of a flatworm, the proposed algorithm relies on a motion pattern to trigger a local searching strategy by detecting the real-time geomagnetic intensity. An adapted strategy is then implemented, which is biased on the specific target. The results show thereliabilityandeffectivenessofthe proposed algorithm.
The controlled growth method - A tool for structural optimization
NASA Technical Reports Server (NTRS)
Hajela, P.; Sobieszczanski-Sobieski, J.
1981-01-01
An adaptive design variable linking scheme in a NLP based optimization algorithm is proposed and evaluated for feasibility of application. The present scheme, based on an intuitive effectiveness measure for each variable, differs from existing methodology in that a single dominant variable controls the growth of all others in a prescribed optimization cycle. The proposed method is implemented for truss assemblies and a wing box structure for stress, displacement and frequency constraints. Substantial reduction in computational time, even more so for structures under multiple load conditions, coupled with a minimal accompanying loss in accuracy, vindicates the algorithm.
A Two-Stage Reconstruction Processor for Human Detection in Compressive Sensing CMOS Radar.
Tsao, Kuei-Chi; Lee, Ling; Chu, Ta-Shun; Huang, Yuan-Hao
2018-04-05
Complementary metal-oxide-semiconductor (CMOS) radar has recently gained much research attraction because small and low-power CMOS devices are very suitable for deploying sensing nodes in a low-power wireless sensing system. This study focuses on the signal processing of a wireless CMOS impulse radar system that can detect humans and objects in the home-care internet-of-things sensing system. The challenges of low-power CMOS radar systems are the weakness of human signals and the high computational complexity of the target detection algorithm. The compressive sensing-based detection algorithm can relax the computational costs by avoiding the utilization of matched filters and reducing the analog-to-digital converter bandwidth requirement. The orthogonal matching pursuit (OMP) is one of the popular signal reconstruction algorithms for compressive sensing radar; however, the complexity is still very high because the high resolution of human respiration leads to high-dimension signal reconstruction. Thus, this paper proposes a two-stage reconstruction algorithm for compressive sensing radar. The proposed algorithm not only has lower complexity than the OMP algorithm by 75% but also achieves better positioning performance than the OMP algorithm especially in noisy environments. This study also designed and implemented the algorithm by using Vertex-7 FPGA chip (Xilinx, San Jose, CA, USA). The proposed reconstruction processor can support the 256 × 13 real-time radar image display with a throughput of 28.2 frames per second.
NASA Astrophysics Data System (ADS)
Suja Priyadharsini, S.; Edward Rajan, S.; Femilin Sheniha, S.
2016-03-01
Electroencephalogram (EEG) is the recording of electrical activities of the brain. It is contaminated by other biological signals, such as cardiac signal (electrocardiogram), signals generated by eye movement/eye blinks (electrooculogram) and muscular artefact signal (electromyogram), called artefacts. Optimisation is an important tool for solving many real-world problems. In the proposed work, artefact removal, based on the adaptive neuro-fuzzy inference system (ANFIS) is employed, by optimising the parameters of ANFIS. Artificial Immune System (AIS) algorithm is used to optimise the parameters of ANFIS (ANFIS-AIS). Implementation results depict that ANFIS-AIS is effective in removing artefacts from EEG signal than ANFIS. Furthermore, in the proposed work, improved AIS (IAIS) is developed by including suitable selection processes in the AIS algorithm. The performance of the proposed method IAIS is compared with AIS and with genetic algorithm (GA). Measures such as signal-to-noise ratio, mean square error (MSE) value, correlation coefficient, power spectrum density plot and convergence time are used for analysing the performance of the proposed method. From the results, it is found that the IAIS algorithm converges faster than the AIS and performs better than the AIS and GA. Hence, IAIS tuned ANFIS (ANFIS-IAIS) is effective in removing artefacts from EEG signals.
Kramers-Kronig receiver operable without digital upsampling.
Bo, Tianwai; Kim, Hoon
2018-05-28
The Kramers-Kronig (KK) receiver is capable of retrieving the phase information of optical single-sideband (SSB) signal from the optical intensity when the optical signal satisfies the minimum phase condition. Thus, it is possible to direct-detect the optical SSB signal without suffering from the signal-signal beat interference and linear transmission impairments. However, due to the spectral broadening induced by nonlinear operations in the conventional KK algorithm, it is necessary to employ the digital upsampling at the beginning of the digital signal processing (DSP). The increased number of samples at the DSP would hinder the real-time implementation of this attractive receiver. Hence, we propose a new DSP algorithm for KK receiver operable at 2 samples per symbol. We adopt a couple of mathematical approximations to avoid the use of nonlinear operations such as logarithm and exponential functions. By using the proposed algorithm, we demonstrate the transmission of 112-Gb/s SSB orthogonal frequency-division-multiplexed signal over an 80-km fiber link. The results show that the proposed algorithm operating at 2 samples per symbol exhibits similar performance to the conventional KK one operating at 6 samples per symbol. We also present the error analysis of the proposed algorithm for KK receiver in comparison with the conventional one.
Using MaxCompiler for the high level synthesis of trigger algorithms
NASA Astrophysics Data System (ADS)
Summers, S.; Rose, A.; Sanders, P.
2017-02-01
Firmware for FPGA trigger applications at the CMS experiment is conventionally written using hardware description languages such as Verilog and VHDL. MaxCompiler is an alternative, Java based, tool for developing FPGA applications which uses a higher level of abstraction from the hardware than a hardware description language. An implementation of the jet and energy sum algorithms for the CMS Level-1 calorimeter trigger has been written using MaxCompiler to benchmark against the VHDL implementation in terms of accuracy, latency, resource usage, and code size. A Kalman Filter track fitting algorithm has been developed using MaxCompiler for a proposed CMS Level-1 track trigger for the High-Luminosity LHC upgrade. The design achieves a low resource usage, and has a latency of 187.5 ns per iteration.
Modular multiplication in GF(p) for public-key cryptography
NASA Astrophysics Data System (ADS)
Olszyna, Jakub
Modular multiplication forms the basis of modular exponentiation which is the core operation of the RSA cryptosystem. It is also present in many other cryptographic algorithms including those based on ECC and HECC. Hence, an efficient implementation of PKC relies on efficient implementation of modular multiplication. The paper presents a survey of most common algorithms for modular multiplication along with hardware architectures especially suitable for cryptographic applications in energy constrained environments. The motivation for studying low-power and areaefficient modular multiplication algorithms comes from enabling public-key security for ultra-low power devices that can perform under constrained environments like wireless sensor networks. Serial architectures for GF(p) are analyzed and presented. Finally proposed architectures are verified and compared according to the amount of power dissipated throughout the operation.
Layer-oriented multigrid wavefront reconstruction algorithms for multi-conjugate adaptive optics
NASA Astrophysics Data System (ADS)
Gilles, Luc; Ellerbroek, Brent L.; Vogel, Curtis R.
2003-02-01
Multi-conjugate adaptive optics (MCAO) systems with 104-105 degrees of freedom have been proposed for future giant telescopes. Using standard matrix methods to compute, optimize, and implement wavefront control algorithms for these systems is impractical, since the number of calculations required to compute and apply the reconstruction matrix scales respectively with the cube and the square of the number of AO degrees of freedom. In this paper, we develop an iterative sparse matrix implementation of minimum variance wavefront reconstruction for telescope diameters up to 32m with more than 104 actuators. The basic approach is the preconditioned conjugate gradient method, using a multigrid preconditioner incorporating a layer-oriented (block) symmetric Gauss-Seidel iterative smoothing operator. We present open-loop numerical simulation results to illustrate algorithm convergence.
A Parallel Saturation Algorithm on Shared Memory Architectures
NASA Technical Reports Server (NTRS)
Ezekiel, Jonathan; Siminiceanu
2007-01-01
Symbolic state-space generators are notoriously hard to parallelize. However, the Saturation algorithm implemented in the SMART verification tool differs from other sequential symbolic state-space generators in that it exploits the locality of ring events in asynchronous system models. This paper explores whether event locality can be utilized to efficiently parallelize Saturation on shared-memory architectures. Conceptually, we propose to parallelize the ring of events within a decision diagram node, which is technically realized via a thread pool. We discuss the challenges involved in our parallel design and conduct experimental studies on its prototypical implementation. On a dual-processor dual core PC, our studies show speed-ups for several example models, e.g., of up to 50% for a Kanban model, when compared to running our algorithm only on a single core.
Asquith, William H.
2014-01-01
The implementation characteristics of two method of L-moments (MLM) algorithms for parameter estimation of the 4-parameter Asymmetric Exponential Power (AEP4) distribution are studied using the R environment for statistical computing. The objective is to validate the algorithms for general application of the AEP4 using R. An algorithm was introduced in the original study of the L-moments for the AEP4. A second or alternative algorithm is shown to have a larger L-moment-parameter domain than the original. The alternative algorithm is shown to provide reliable parameter production and recovery of L-moments from fitted parameters. A proposal is made for AEP4 implementation in conjunction with the 4-parameter Kappa distribution to create a mixed-distribution framework encompassing the joint L-skew and L-kurtosis domains. The example application provides a demonstration of pertinent algorithms with L-moment statistics and two 4-parameter distributions (AEP4 and the Generalized Lambda) for MLM fitting to a modestly asymmetric and heavy-tailed dataset using R.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Boyer, M. D.; Andre, R.; Gates, D. A.
The high-performance operational goals of NSTX-U will require development of advanced feedback control algorithms, including control of ßN and the safety factor profile. In this work, a novel approach to simultaneously controlling ßN and the value of the safety factor on the magnetic axis, q0, through manipulation of the plasma boundary shape and total beam power, is proposed. Simulations of the proposed scheme show promising results and motivate future experimental implementation and eventual integration into a more complex current profile control scheme planned to include actuation of individual beam powers, density, and loop voltage. As part of this work, amore » flexible framework for closed loop simulations within the high-fidelity code TRANSP was developed. The framework, used here to identify control-design-oriented models and to tune and test the proposed controller, exploits many of the predictive capabilities of TRANSP and provides a means for performing control calculations based on user-supplied data (controller matrices, target waveforms, etc.). The flexible framework should enable high-fidelity testing of a variety of control algorithms, thereby reducing the amount of expensive experimental time needed to implement new control algorithms on NSTX-U and other devices.« less
NASA Astrophysics Data System (ADS)
Boyer, M. D.; Andre, R.; Gates, D. A.; Gerhardt, S.; Goumiri, I. R.; Menard, J.
2015-05-01
The high-performance operational goals of NSTX-U will require development of advanced feedback control algorithms, including control of βN and the safety factor profile. In this work, a novel approach to simultaneously controlling βN and the value of the safety factor on the magnetic axis, q0, through manipulation of the plasma boundary shape and total beam power, is proposed. Simulations of the proposed scheme show promising results and motivate future experimental implementation and eventual integration into a more complex current profile control scheme planned to include actuation of individual beam powers, density, and loop voltage. As part of this work, a flexible framework for closed loop simulations within the high-fidelity code TRANSP was developed. The framework, used here to identify control-design-oriented models and to tune and test the proposed controller, exploits many of the predictive capabilities of TRANSP and provides a means for performing control calculations based on user-supplied data (controller matrices, target waveforms, etc). The flexible framework should enable high-fidelity testing of a variety of control algorithms, thereby reducing the amount of expensive experimental time needed to implement new control algorithms on NSTX-U and other devices.
Design of high-performance parallelized gene predictors in MATLAB.
Rivard, Sylvain Robert; Mailloux, Jean-Gabriel; Beguenane, Rachid; Bui, Hung Tien
2012-04-10
This paper proposes a method of implementing parallel gene prediction algorithms in MATLAB. The proposed designs are based on either Goertzel's algorithm or on FFTs and have been implemented using varying amounts of parallelism on a central processing unit (CPU) and on a graphics processing unit (GPU). Results show that an implementation using a straightforward approach can require over 4.5 h to process 15 million base pairs (bps) whereas a properly designed one could perform the same task in less than five minutes. In the best case, a GPU implementation can yield these results in 57 s. The present work shows how parallelism can be used in MATLAB for gene prediction in very large DNA sequences to produce results that are over 270 times faster than a conventional approach. This is significant as MATLAB is typically overlooked due to its apparent slow processing time even though it offers a convenient environment for bioinformatics. From a practical standpoint, this work proposes two strategies for accelerating genome data processing which rely on different parallelization mechanisms. Using a CPU, the work shows that direct access to the MEX function increases execution speed and that the PARFOR construct should be used in order to take full advantage of the parallelizable Goertzel implementation. When the target is a GPU, the work shows that data needs to be segmented into manageable sizes within the GFOR construct before processing in order to minimize execution time.
Fast and Accurate Support Vector Machines on Large Scale Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vishnu, Abhinav; Narasimhan, Jayenthi; Holder, Larry
Support Vector Machines (SVM) is a supervised Machine Learning and Data Mining (MLDM) algorithm, which has become ubiquitous largely due to its high accuracy and obliviousness to dimensionality. The objective of SVM is to find an optimal boundary --- also known as hyperplane --- which separates the samples (examples in a dataset) of different classes by a maximum margin. Usually, very few samples contribute to the definition of the boundary. However, existing parallel algorithms use the entire dataset for finding the boundary, which is sub-optimal for performance reasons. In this paper, we propose a novel distributed memory algorithm to eliminatemore » the samples which do not contribute to the boundary definition in SVM. We propose several heuristics, which range from early (aggressive) to late (conservative) elimination of the samples, such that the overall time for generating the boundary is reduced considerably. In a few cases, a sample may be eliminated (shrunk) pre-emptively --- potentially resulting in an incorrect boundary. We propose a scalable approach to synchronize the necessary data structures such that the proposed algorithm maintains its accuracy. We consider the necessary trade-offs of single/multiple synchronization using in-depth time-space complexity analysis. We implement the proposed algorithm using MPI and compare it with libsvm--- de facto sequential SVM software --- which we enhance with OpenMP for multi-core/many-core parallelism. Our proposed approach shows excellent efficiency using up to 4096 processes on several large datasets such as UCI HIGGS Boson dataset and Offending URL dataset.« less
A Low-Complexity and High-Performance 2D Look-Up Table for LDPC Hardware Implementation
NASA Astrophysics Data System (ADS)
Chen, Jung-Chieh; Yang, Po-Hui; Lain, Jenn-Kaie; Chung, Tzu-Wen
In this paper, we propose a low-complexity, high-efficiency two-dimensional look-up table (2D LUT) for carrying out the sum-product algorithm in the decoding of low-density parity-check (LDPC) codes. Instead of employing adders for the core operation when updating check node messages, in the proposed scheme, the main term and correction factor of the core operation are successfully merged into a compact 2D LUT. Simulation results indicate that the proposed 2D LUT not only attains close-to-optimal bit error rate performance but also enjoys a low complexity advantage that is suitable for hardware implementation.
NASA Astrophysics Data System (ADS)
Yan, Liang; Zhang, Lu; Zhu, Bo; Zhang, Jingying; Jiao, Zongxia
2017-10-01
Permanent magnet spherical actuator (PMSA) is a multi-variable featured and inter-axis coupled nonlinear system, which unavoidably compromises its motion control implementation. Uncertainties such as external load and friction torque of ball bearing and manufacturing errors also influence motion performance significantly. Therefore, the objective of this paper is to propose a controller based on a single neural adaptive (SNA) algorithm and a neural network (NN) identifier optimized with a particle swarm optimization (PSO) algorithm to improve the motion stability of PMSA with three-dimensional magnet arrays. The dynamic model and computed torque model are formulated for the spherical actuator, and a dynamic decoupling control algorithm is developed. By utilizing the global-optimization property of the PSO algorithm, the NN identifier is trained to avoid locally optimal solution and achieve high-precision compensations to uncertainties. The employment of the SNA controller helps to reduce the effect of compensation errors and convert the system to a stable one, even if there is difference between the compensations and uncertainties due to external disturbances. A simulation model is established, and experiments are conducted on the research prototype to validate the proposed control algorithm. The amplitude of the parameter perturbation is set to 5%, 10%, and 15%, respectively. The strong robustness of the proposed hybrid algorithm is validated by the abundant simulation data. It shows that the proposed algorithm can effectively compensate the influence of uncertainties and eliminate the effect of inter-axis couplings of the spherical actuator.
Al-Mayouf, Yusor Rafid Bahar; Ismail, Mahamod; Abdullah, Nor Fadzilah; Wahab, Ainuddin Wahid Abdul; Mahdi, Omar Adil; Khan, Suleman; Choo, Kim-Kwang Raymond
2016-01-01
Vehicular ad hoc networks (VANETs) are considered an emerging technology in the industrial and educational fields. This technology is essential in the deployment of the intelligent transportation system, which is targeted to improve safety and efficiency of traffic. The implementation of VANETs can be effectively executed by transmitting data among vehicles with the use of multiple hops. However, the intrinsic characteristics of VANETs, such as its dynamic network topology and intermittent connectivity, limit data delivery. One particular challenge of this network is the possibility that the contributing node may only remain in the network for a limited time. Hence, to prevent data loss from that node, the information must reach the destination node via multi-hop routing techniques. An appropriate, efficient, and stable routing algorithm must be developed for various VANET applications to address the issues of dynamic topology and intermittent connectivity. Therefore, this paper proposes a novel routing algorithm called efficient and stable routing algorithm based on user mobility and node density (ESRA-MD). The proposed algorithm can adapt to significant changes that may occur in the urban vehicular environment. This algorithm works by selecting an optimal route on the basis of hop count and link duration for delivering data from source to destination, thereby satisfying various quality of service considerations. The validity of the proposed algorithm is investigated by its comparison with ARP-QD protocol, which works on the mechanism of optimal route finding in VANETs in urban environments. Simulation results reveal that the proposed ESRA-MD algorithm shows remarkable improvement in terms of delivery ratio, delivery delay, and communication overhead.
Floating-point scaling technique for sources separation automatic gain control
NASA Astrophysics Data System (ADS)
Fermas, A.; Belouchrani, A.; Ait-Mohamed, O.
2012-07-01
Based on the floating-point representation and taking advantage of scaling factor indetermination in blind source separation (BSS) processing, we propose a scaling technique applied to the separation matrix, to avoid the saturation or the weakness in the recovered source signals. This technique performs an automatic gain control in an on-line BSS environment. We demonstrate the effectiveness of this technique by using the implementation of a division-free BSS algorithm with two inputs, two outputs. The proposed technique is computationally cheaper and efficient for a hardware implementation compared to the Euclidean normalisation.
Brian Hears: Online Auditory Processing Using Vectorization Over Channels
Fontaine, Bertrand; Goodman, Dan F. M.; Benichoux, Victor; Brette, Romain
2011-01-01
The human cochlea includes about 3000 inner hair cells which filter sounds at frequencies between 20 Hz and 20 kHz. This massively parallel frequency analysis is reflected in models of auditory processing, which are often based on banks of filters. However, existing implementations do not exploit this parallelism. Here we propose algorithms to simulate these models by vectorizing computation over frequency channels, which are implemented in “Brian Hears,” a library for the spiking neural network simulator package “Brian.” This approach allows us to use high-level programming languages such as Python, because with vectorized operations, the computational cost of interpretation represents a small fraction of the total cost. This makes it possible to define and simulate complex models in a simple way, while all previous implementations were model-specific. In addition, we show that these algorithms can be naturally parallelized using graphics processing units, yielding substantial speed improvements. We demonstrate these algorithms with several state-of-the-art cochlear models, and show that they compare favorably with existing, less flexible, implementations. PMID:21811453
A joint precoding scheme for indoor downlink multi-user MIMO VLC systems
NASA Astrophysics Data System (ADS)
Zhao, Qiong; Fan, Yangyu; Kang, Bochao
2017-11-01
In this study, we aim to improve the system performance and reduce the implementation complexity of precoding scheme for visible light communication (VLC) systems. By incorporating the power-method algorithm and the block diagonalization (BD) algorithm, we propose a joint precoding scheme for indoor downlink multi-user multi-input-multi-output (MU-MIMO) VLC systems. In this scheme, we apply the BD algorithm to eliminate the co-channel interference (CCI) among users firstly. Secondly, the power-method algorithm is used to search the precoding weight for each user based on the optimal criterion of signal to interference plus noise ratio (SINR) maximization. Finally, the optical power restrictions of VLC systems are taken into account to constrain the precoding weight matrix. Comprehensive computer simulations in two scenarios indicate that the proposed scheme always has better bit error rate (BER) performance and lower computation complexity than that of the traditional scheme.
Bi, Sheng; Zeng, Xiao; Tang, Xin; Qin, Shujia; Lai, King Wai Chiu
2016-01-01
Compressive sensing (CS) theory has opened up new paths for the development of signal processing applications. Based on this theory, a novel single pixel camera architecture has been introduced to overcome the current limitations and challenges of traditional focal plane arrays. However, video quality based on this method is limited by existing acquisition and recovery methods, and the method also suffers from being time-consuming. In this paper, a multi-frame motion estimation algorithm is proposed in CS video to enhance the video quality. The proposed algorithm uses multiple frames to implement motion estimation. Experimental results show that using multi-frame motion estimation can improve the quality of recovered videos. To further reduce the motion estimation time, a block match algorithm is used to process motion estimation. Experiments demonstrate that using the block match algorithm can reduce motion estimation time by 30%. PMID:26950127
A Novel Speed Compensation Method for ISAR Imaging with Low SNR
Liu, Yongxiang; Zhang, Shuanghui; Zhu, Dekang; Li, Xiang
2015-01-01
In this paper, two novel speed compensation algorithms for ISAR imaging under a low signal-to-noise ratio (SNR) condition have been proposed, which are based on the cubic phase function (CPF) and the integrated cubic phase function (ICPF), respectively. These two algorithms can estimate the speed of the target from the wideband radar echo directly, which breaks the limitation of speed measuring in a radar system. With the utilization of non-coherent accumulation, the ICPF-based speed compensation algorithm is robust to noise and can meet the requirement of speed compensation for ISAR imaging under a low SNR condition. Moreover, a fast searching implementation strategy, which consists of coarse search and precise search, has been introduced to decrease the computational burden of speed compensation based on CPF and ICPF. Experimental results based on radar data validate the effectiveness of the proposed algorithms. PMID:26225980
NASA Astrophysics Data System (ADS)
Singh, R.; Verma, H. K.
2013-12-01
This paper presents a teaching-learning-based optimization (TLBO) algorithm to solve parameter identification problems in the designing of digital infinite impulse response (IIR) filter. TLBO based filter modelling is applied to calculate the parameters of unknown plant in simulations. Unlike other heuristic search algorithms, TLBO algorithm is an algorithm-specific parameter-less algorithm. In this paper big bang-big crunch (BB-BC) optimization and PSO algorithms are also applied to filter design for comparison. Unknown filter parameters are considered as a vector to be optimized by these algorithms. MATLAB programming is used for implementation of proposed algorithms. Experimental results show that the TLBO is more accurate to estimate the filter parameters than the BB-BC optimization algorithm and has faster convergence rate when compared to PSO algorithm. TLBO is used where accuracy is more essential than the convergence speed.
Learning Computer Programming: Implementing a Fractal in a Turing Machine
ERIC Educational Resources Information Center
Pereira, Hernane B. de B.; Zebende, Gilney F.; Moret, Marcelo A.
2010-01-01
It is common to start a course on computer programming logic by teaching the algorithm concept from the point of view of natural languages, but in a schematic way. In this sense we note that the students have difficulties in understanding and implementation of the problems proposed by the teacher. The main idea of this paper is to show that the…
Self calibrating monocular camera measurement of traffic parameters.
DOT National Transportation Integrated Search
2009-12-01
This proposed project will extend the work of previous projects that have developed algorithms and software : to measure traffic speed under adverse conditions using un-calibrated cameras. The present implementation : uses the WSDOT CCTV cameras moun...
Design and implementation of intelligent electronic warfare decision making algorithm
NASA Astrophysics Data System (ADS)
Peng, Hsin-Hsien; Chen, Chang-Kuo; Hsueh, Chi-Shun
2017-05-01
Electromagnetic signals and the requirements of timely response have been a rapid growth in modern electronic warfare. Although jammers are limited resources, it is possible to achieve the best electronic warfare efficiency by tactical decisions. This paper proposes the intelligent electronic warfare decision support system. In this work, we develop a novel hybrid algorithm, Digital Pheromone Particle Swarm Optimization, based on Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO) and Shuffled Frog Leaping Algorithm (SFLA). We use PSO to solve the problem and combine the concept of pheromones in ACO to accumulate more useful information in spatial solving process and speed up finding the optimal solution. The proposed algorithm finds the optimal solution in reasonable computation time by using the method of matrix conversion in SFLA. The results indicated that jammer allocation was more effective. The system based on the hybrid algorithm provides electronic warfare commanders with critical information to assist commanders in effectively managing the complex electromagnetic battlefield.
Ring system-based chemical graph generation for de novo molecular design
NASA Astrophysics Data System (ADS)
Miyao, Tomoyuki; Kaneko, Hiromasa; Funatsu, Kimito
2016-05-01
Generating chemical graphs in silico by combining building blocks is important and fundamental in virtual combinatorial chemistry. A premise in this area is that generated structures should be irredundant as well as exhaustive. In this study, we develop structure generation algorithms regarding combining ring systems as well as atom fragments. The proposed algorithms consist of three parts. First, chemical structures are generated through a canonical construction path. During structure generation, ring systems can be treated as reduced graphs having fewer vertices than those in the original ones. Second, diversified structures are generated by a simple rule-based generation algorithm. Third, the number of structures to be generated can be estimated with adequate accuracy without actual exhaustive generation. The proposed algorithms were implemented in structure generator Molgilla. As a practical application, Molgilla generated chemical structures mimicking rosiglitazone in terms of a two dimensional pharmacophore pattern. The strength of the algorithms lies in simplicity and flexibility. Therefore, they may be applied to various computer programs regarding structure generation by combining building blocks.
Enhancement of A5/1 encryption algorithm
NASA Astrophysics Data System (ADS)
Thomas, Ria Elin; Chandhiny, G.; Sharma, Katyayani; Santhi, H.; Gayathri, P.
2017-11-01
Mobiles have become an integral part of today’s world. Various standards have been proposed for the mobile communication, one of them being GSM. With the rising increase of mobile-based crimes, it is necessary to improve the security of the information passed in the form of voice or data. GSM uses A5/1 for its encryption. It is known that various attacks have been implemented, exploiting the vulnerabilities present within the A5/1 algorithm. Thus, in this paper, we proceed to look at what these vulnerabilities are, and propose the enhanced A5/1 (E-A5/1) where, we try to improve the security provided by the A5/1 algorithm by XORing the key stream generated with a pseudo random number, without increasing the time complexity. We need to study what the vulnerabilities of the base algorithm (A5/1) is, and try to improve upon its security. This will help in the future releases of the A5 family of algorithms.
PID controller tuning using metaheuristic optimization algorithms for benchmark problems
NASA Astrophysics Data System (ADS)
Gholap, Vishal; Naik Dessai, Chaitali; Bagyaveereswaran, V.
2017-11-01
This paper contributes to find the optimal PID controller parameters using particle swarm optimization (PSO), Genetic Algorithm (GA) and Simulated Annealing (SA) algorithm. The algorithms were developed through simulation of chemical process and electrical system and the PID controller is tuned. Here, two different fitness functions such as Integral Time Absolute Error and Time domain Specifications were chosen and applied on PSO, GA and SA while tuning the controller. The proposed Algorithms are implemented on two benchmark problems of coupled tank system and DC motor. Finally, comparative study has been done with different algorithms based on best cost, number of iterations and different objective functions. The closed loop process response for each set of tuned parameters is plotted for each system with each fitness function.
Digital micromirror device camera with per-pixel coded exposure for high dynamic range imaging.
Feng, Wei; Zhang, Fumin; Wang, Weijing; Xing, Wei; Qu, Xinghua
2017-05-01
In this paper, we overcome the limited dynamic range of the conventional digital camera, and propose a method of realizing high dynamic range imaging (HDRI) from a novel programmable imaging system called a digital micromirror device (DMD) camera. The unique feature of the proposed new method is that the spatial and temporal information of incident light in our DMD camera can be flexibly modulated, and it enables the camera pixels always to have reasonable exposure intensity by DMD pixel-level modulation. More importantly, it allows different light intensity control algorithms used in our programmable imaging system to achieve HDRI. We implement the optical system prototype, analyze the theory of per-pixel coded exposure for HDRI, and put forward an adaptive light intensity control algorithm to effectively modulate the different light intensity to recover high dynamic range images. Via experiments, we demonstrate the effectiveness of our method and implement the HDRI on different objects.
A Novel and Simple Spike Sorting Implementation.
Petrantonakis, Panagiotis C; Poirazi, Panayiota
2017-04-01
Monitoring the activity of multiple, individual neurons that fire spikes in the vicinity of an electrode, namely perform a Spike Sorting (SS) procedure, comprises one of the most important tools for contemporary neuroscience in order to reverse-engineer the brain. As recording electrodes' technology rabidly evolves by integrating thousands of electrodes in a confined spatial setting, the algorithms that are used to monitor individual neurons from recorded signals have to become even more reliable and computationally efficient. In this work, we propose a novel framework of the SS approach in which a single-step processing of the raw (unfiltered) extracellular signal is sufficient for both the detection and sorting of the activity of individual neurons. Despite its simplicity, the proposed approach exhibits comparable performance with state-of-the-art approaches, especially for spike detection in noisy signals, and paves the way for a new family of SS algorithms with the potential for multi-recording, fast, on-chip implementations.
A genetic algorithm-based job scheduling model for big data analytics.
Lu, Qinghua; Li, Shanshan; Zhang, Weishan; Zhang, Lei
Big data analytics (BDA) applications are a new category of software applications that process large amounts of data using scalable parallel processing infrastructure to obtain hidden value. Hadoop is the most mature open-source big data analytics framework, which implements the MapReduce programming model to process big data with MapReduce jobs. Big data analytics jobs are often continuous and not mutually separated. The existing work mainly focuses on executing jobs in sequence, which are often inefficient and consume high energy. In this paper, we propose a genetic algorithm-based job scheduling model for big data analytics applications to improve the efficiency of big data analytics. To implement the job scheduling model, we leverage an estimation module to predict the performance of clusters when executing analytics jobs. We have evaluated the proposed job scheduling model in terms of feasibility and accuracy.
ASIC implementation of recursive scaled discrete cosine transform algorithm
NASA Astrophysics Data System (ADS)
On, Bill N.; Narasimhan, Sam; Huang, Victor K.
1994-05-01
A program to implement the Recursive Scaled Discrete Cosine Transform (DCT) algorithm as proposed by H. S. Hou has been undertaken at the Institute of Microelectronics. Implementation of the design was done using top-down design methodology with VHDL (VHSIC Hardware Description Language) for chip modeling. When the VHDL simulation has been satisfactorily completed, the design is synthesized into gates using a synthesis tool. The architecture of the design consists of two processing units together with a memory module for data storage and transpose. Each processing unit is composed of four pipelined stages which allow the internal clock to run at one-eighth (1/8) the speed of the pixel clock. Each stage operates on eight pixels in parallel. As the data flows through each stage, there are various adders and multipliers to transform them into the desired coefficients. The Scaled IDCT was implemented in a similar fashion with the adders and multipliers rearranged to perform the inverse DCT algorithm. The chip has been verified using Field Programmable Gate Array devices. The design is operational. The combination of fewer multiplications required and pipelined architecture give Hou's Recursive Scaled DCT good potential of achieving high performance at a low cost in using Very Large Scale Integration implementation.
Proposals for Updating Tai Algorithm
1997-12-01
1997 meeting, the Comiti International des Poids et Mesures (CIPM) decided to change the name of the Comiti Consultatif pour la Difinition de la ...Report of the BIPM Time Section, 1988,1, D1-D22. [2] P. Tavella, C. Thomas, Comparative study of time scale algorithms, Metrologia , 1991, 28, 57...alternative choice for implementing an upper limit of clock weights, Metrologia , 1996, 33, 227-240. [5] C. Thomas, Impact of New Clock Technologies
Annealed Importance Sampling Reversible Jump MCMC algorithms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karagiannis, Georgios; Andrieu, Christophe
2013-03-20
It will soon be 20 years since reversible jump Markov chain Monte Carlo (RJ-MCMC) algorithms have been proposed. They have significantly extended the scope of Markov chain Monte Carlo simulation methods, offering the promise to be able to routinely tackle transdimensional sampling problems, as encountered in Bayesian model selection problems for example, in a principled and flexible fashion. Their practical efficient implementation, however, still remains a challenge. A particular difficulty encountered in practice is in the choice of the dimension matching variables (both their nature and their distribution) and the reversible transformations which allow one to define the one-to-one mappingsmore » underpinning the design of these algorithms. Indeed, even seemingly sensible choices can lead to algorithms with very poor performance. The focus of this paper is the development and performance evaluation of a method, annealed importance sampling RJ-MCMC (aisRJ), which addresses this problem by mitigating the sensitivity of RJ-MCMC algorithms to the aforementioned poor design. As we shall see the algorithm can be understood as being an “exact approximation” of an idealized MCMC algorithm that would sample from the model probabilities directly in a model selection set-up. Such an idealized algorithm may have good theoretical convergence properties, but typically cannot be implemented, and our algorithms can approximate the performance of such idealized algorithms to an arbitrary degree while not introducing any bias for any degree of approximation. Our approach combines the dimension matching ideas of RJ-MCMC with annealed importance sampling and its Markov chain Monte Carlo implementation. We illustrate the performance of the algorithm with numerical simulations which indicate that, although the approach may at first appear computationally involved, it is in fact competitive.« less
Packing Boxes into Multiple Containers Using Genetic Algorithm
NASA Astrophysics Data System (ADS)
Menghani, Deepak; Guha, Anirban
2016-07-01
Container loading problems have been studied extensively in the literature and various analytical, heuristic and metaheuristic methods have been proposed. This paper presents two different variants of a genetic algorithm framework for the three-dimensional container loading problem for optimally loading boxes into multiple containers with constraints. The algorithms are designed so that it is easy to incorporate various constraints found in real life problems. The algorithms are tested on data of standard test cases from literature and are found to compare well with the benchmark algorithms in terms of utilization of containers. This, along with the ability to easily incorporate a wide range of practical constraints, makes them attractive for implementation in real life scenarios.
NASA Astrophysics Data System (ADS)
Ayyad, Yassid; Mittig, Wolfgang; Bazin, Daniel; Beceiro-Novo, Saul; Cortesi, Marco
2018-02-01
The three-dimensional reconstruction of particle tracks in a time projection chamber is a challenging task that requires advanced classification and fitting algorithms. In this work, we have developed and implemented a novel algorithm based on the Random Sample Consensus Model (RANSAC). The RANSAC is used to classify tracks including pile-up, to remove uncorrelated noise hits, as well as to reconstruct the vertex of the reaction. The algorithm, developed within the Active Target Time Projection Chamber (AT-TPC) framework, was tested and validated by analyzing the 4He+4He reaction. Results, performance and quality of the proposed algorithm are presented and discussed in detail.
NASA Astrophysics Data System (ADS)
Choi, Jae Hyung; Kuk, Jung Gap; Kim, Young Il; Cho, Nam Ik
2012-01-01
This paper proposes an algorithm for the detection of pillars or posts in the video captured by a single camera implemented on the fore side of a room mirror in a car. The main purpose of this algorithm is to complement the weakness of current ultrasonic parking assist system, which does not well find the exact position of pillars or does not recognize narrow posts. The proposed algorithm is consisted of three steps: straight line detection, line tracking, and the estimation of 3D position of pillars. In the first step, the strong lines are found by the Hough transform. Second step is the combination of detection and tracking, and the third is the calculation of 3D position of the line by the analysis of trajectory of relative positions and the parameters of camera. Experiments on synthetic and real images show that the proposed method successfully locates and tracks the position of pillars, which helps the ultrasonic system to correctly locate the edges of pillars. It is believed that the proposed algorithm can also be employed as a basic element for vision based autonomous driving system.
Parallel conjugate gradient algorithms for manipulator dynamic simulation
NASA Technical Reports Server (NTRS)
Fijany, Amir; Scheld, Robert E.
1989-01-01
Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithms are guaranteed to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n sq) on a serial processor. A conjugate gradient algorithms is presented that provide greater efficiency using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which respectively use a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inversions in O(log sub 2 n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves the computational time of O(log sub 2 n) for each iteration. Simulation results for a seven degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
NASA Astrophysics Data System (ADS)
Babayan, Pavel; Smirnov, Sergey; Strotov, Valery
2017-10-01
This paper describes the aerial object recognition algorithm for on-board and stationary vision system. Suggested algorithm is intended to recognize the objects of a specific kind using the set of the reference objects defined by 3D models. The proposed algorithm based on the outer contour descriptor building. The algorithm consists of two stages: learning and recognition. Learning stage is devoted to the exploring of reference objects. Using 3D models we can build the database containing training images by rendering the 3D model from viewpoints evenly distributed on a sphere. Sphere points distribution is made by the geosphere principle. Gathered training image set is used for calculating descriptors, which will be used in the recognition stage of the algorithm. The recognition stage is focusing on estimating the similarity of the captured object and the reference objects by matching an observed image descriptor and the reference object descriptors. The experimental research was performed using a set of the models of the aircraft of the different types (airplanes, helicopters, UAVs). The proposed orientation estimation algorithm showed good accuracy in all case studies. The real-time performance of the algorithm in FPGA-based vision system was demonstrated.
Algorithmic Mechanism Design of Evolutionary Computation.
Pei, Yan
2015-01-01
We consider algorithmic design, enhancement, and improvement of evolutionary computation as a mechanism design problem. All individuals or several groups of individuals can be considered as self-interested agents. The individuals in evolutionary computation can manipulate parameter settings and operations by satisfying their own preferences, which are defined by an evolutionary computation algorithm designer, rather than by following a fixed algorithm rule. Evolutionary computation algorithm designers or self-adaptive methods should construct proper rules and mechanisms for all agents (individuals) to conduct their evolution behaviour correctly in order to definitely achieve the desired and preset objective(s). As a case study, we propose a formal framework on parameter setting, strategy selection, and algorithmic design of evolutionary computation by considering the Nash strategy equilibrium of a mechanism design in the search process. The evaluation results present the efficiency of the framework. This primary principle can be implemented in any evolutionary computation algorithm that needs to consider strategy selection issues in its optimization process. The final objective of our work is to solve evolutionary computation design as an algorithmic mechanism design problem and establish its fundamental aspect by taking this perspective. This paper is the first step towards achieving this objective by implementing a strategy equilibrium solution (such as Nash equilibrium) in evolutionary computation algorithm.
Algorithmic Mechanism Design of Evolutionary Computation
2015-01-01
We consider algorithmic design, enhancement, and improvement of evolutionary computation as a mechanism design problem. All individuals or several groups of individuals can be considered as self-interested agents. The individuals in evolutionary computation can manipulate parameter settings and operations by satisfying their own preferences, which are defined by an evolutionary computation algorithm designer, rather than by following a fixed algorithm rule. Evolutionary computation algorithm designers or self-adaptive methods should construct proper rules and mechanisms for all agents (individuals) to conduct their evolution behaviour correctly in order to definitely achieve the desired and preset objective(s). As a case study, we propose a formal framework on parameter setting, strategy selection, and algorithmic design of evolutionary computation by considering the Nash strategy equilibrium of a mechanism design in the search process. The evaluation results present the efficiency of the framework. This primary principle can be implemented in any evolutionary computation algorithm that needs to consider strategy selection issues in its optimization process. The final objective of our work is to solve evolutionary computation design as an algorithmic mechanism design problem and establish its fundamental aspect by taking this perspective. This paper is the first step towards achieving this objective by implementing a strategy equilibrium solution (such as Nash equilibrium) in evolutionary computation algorithm. PMID:26257777
NASA Astrophysics Data System (ADS)
Wei, Hai-Rui; Liu, Ji-Zhen
2017-02-01
It is very important to seek an efficient and robust quantum algorithm demanding less quantum resources. We propose one-photon three-qubit original and refined Deutsch-Jozsa algorithms with polarization and two linear momentums degrees of freedom (DOFs). Our schemes are constructed by solely using linear optics. Compared to the traditional ones with one DOF, our schemes are more economic and robust because the necessary photons are reduced from three to one. Our linear-optic schemes are working in a determinate way, and they are feasible with current experimental technology.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wei, Hai-Rui, E-mail: hrwei@ustb.edu.cn; Liu, Ji-Zhen
2017-02-15
It is very important to seek an efficient and robust quantum algorithm demanding less quantum resources. We propose one-photon three-qubit original and refined Deutsch–Jozsa algorithms with polarization and two linear momentums degrees of freedom (DOFs). Our schemes are constructed by solely using linear optics. Compared to the traditional ones with one DOF, our schemes are more economic and robust because the necessary photons are reduced from three to one. Our linear-optic schemes are working in a determinate way, and they are feasible with current experimental technology.
Wu, Zhenqin; Ramsundar, Bharath; Feinberg, Evan N.; Gomes, Joseph; Geniesse, Caleb; Pappu, Aneesh S.; Leswing, Karl
2017-01-01
Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm. PMID:29629118
DANoC: An Efficient Algorithm and Hardware Codesign of Deep Neural Networks on Chip.
Zhou, Xichuan; Li, Shengli; Tang, Fang; Hu, Shengdong; Lin, Zhi; Zhang, Lei
2017-07-18
Deep neural networks (NNs) are the state-of-the-art models for understanding the content of images and videos. However, implementing deep NNs in embedded systems is a challenging task, e.g., a typical deep belief network could exhaust gigabytes of memory and result in bandwidth and computational bottlenecks. To address this challenge, this paper presents an algorithm and hardware codesign for efficient deep neural computation. A hardware-oriented deep learning algorithm, named the deep adaptive network, is proposed to explore the sparsity of neural connections. By adaptively removing the majority of neural connections and robustly representing the reserved connections using binary integers, the proposed algorithm could save up to 99.9% memory utility and computational resources without undermining classification accuracy. An efficient sparse-mapping-memory-based hardware architecture is proposed to fully take advantage of the algorithmic optimization. Different from traditional Von Neumann architecture, the deep-adaptive network on chip (DANoC) brings communication and computation in close proximity to avoid power-hungry parameter transfers between on-board memory and on-chip computational units. Experiments over different image classification benchmarks show that the DANoC system achieves competitively high accuracy and efficiency comparing with the state-of-the-art approaches.
Agent Collaborative Target Localization and Classification in Wireless Sensor Networks
Wang, Xue; Bi, Dao-wei; Ding, Liang; Wang, Sheng
2007-01-01
Wireless sensor networks (WSNs) are autonomous networks that have been frequently deployed to collaboratively perform target localization and classification tasks. Their autonomous and collaborative features resemble the characteristics of agents. Such similarities inspire the development of heterogeneous agent architecture for WSN in this paper. The proposed agent architecture views WSN as multi-agent systems and mobile agents are employed to reduce in-network communication. According to the architecture, an energy based acoustic localization algorithm is proposed. In localization, estimate of target location is obtained by steepest descent search. The search algorithm adapts to measurement environments by dynamically adjusting its termination condition. With the agent architecture, target classification is accomplished by distributed support vector machine (SVM). Mobile agents are employed for feature extraction and distributed SVM learning to reduce communication load. Desirable learning performance is guaranteed by combining support vectors and convex hull vectors. Fusion algorithms are designed to merge SVM classification decisions made from various modalities. Real world experiments with MICAz sensor nodes are conducted for vehicle localization and classification. Experimental results show the proposed agent architecture remarkably facilitates WSN designs and algorithm implementation. The localization and classification algorithms also prove to be accurate and energy efficient.
Soft-output decoding algorithms in iterative decoding of turbo codes
NASA Technical Reports Server (NTRS)
Benedetto, S.; Montorsi, G.; Divsalar, D.; Pollara, F.
1996-01-01
In this article, we present two versions of a simplified maximum a posteriori decoding algorithm. The algorithms work in a sliding window form, like the Viterbi algorithm, and can thus be used to decode continuously transmitted sequences obtained by parallel concatenated codes, without requiring code trellis termination. A heuristic explanation is also given of how to embed the maximum a posteriori algorithms into the iterative decoding of parallel concatenated codes (turbo codes). The performances of the two algorithms are compared on the basis of a powerful rate 1/3 parallel concatenated code. Basic circuits to implement the simplified a posteriori decoding algorithm using lookup tables, and two further approximations (linear and threshold), with a very small penalty, to eliminate the need for lookup tables are proposed.
Parallelized Seeded Region Growing Using CUDA
Park, Seongjin; Lee, Hyunna; Seo, Jinwook; Lee, Kyoung Ho; Shin, Yeong-Gil; Kim, Bohyoung
2014-01-01
This paper presents a novel method for parallelizing the seeded region growing (SRG) algorithm using Compute Unified Device Architecture (CUDA) technology, with intention to overcome the theoretical weakness of SRG algorithm of its computation time being directly proportional to the size of a segmented region. The segmentation performance of the proposed CUDA-based SRG is compared with SRG implementations on single-core CPUs, quad-core CPUs, and shader language programming, using synthetic datasets and 20 body CT scans. Based on the experimental results, the CUDA-based SRG outperforms the other three implementations, advocating that it can substantially assist the segmentation during massive CT screening tests. PMID:25309619
Model-based clustering for RNA-seq data.
Si, Yaqing; Liu, Peng; Li, Pinghua; Brutnell, Thomas P
2014-01-15
RNA-seq technology has been widely adopted as an attractive alternative to microarray-based methods to study global gene expression. However, robust statistical tools to analyze these complex datasets are still lacking. By grouping genes with similar expression profiles across treatments, cluster analysis provides insight into gene functions and networks, and hence is an important technique for RNA-seq data analysis. In this manuscript, we derive clustering algorithms based on appropriate probability models for RNA-seq data. An expectation-maximization algorithm and another two stochastic versions of expectation-maximization algorithms are described. In addition, a strategy for initialization based on likelihood is proposed to improve the clustering algorithms. Moreover, we present a model-based hybrid-hierarchical clustering method to generate a tree structure that allows visualization of relationships among clusters as well as flexibility of choosing the number of clusters. Results from both simulation studies and analysis of a maize RNA-seq dataset show that our proposed methods provide better clustering results than alternative methods such as the K-means algorithm and hierarchical clustering methods that are not based on probability models. An R package, MBCluster.Seq, has been developed to implement our proposed algorithms. This R package provides fast computation and is publicly available at http://www.r-project.org
A fast and automatic fusion algorithm for unregistered multi-exposure image sequence
NASA Astrophysics Data System (ADS)
Liu, Yan; Yu, Feihong
2014-09-01
Human visual system (HVS) can visualize all the brightness levels of the scene through visual adaptation. However, the dynamic range of most commercial digital cameras and display devices are smaller than the dynamic range of human eye. This implies low dynamic range (LDR) images captured by normal digital camera may lose image details. We propose an efficient approach to high dynamic (HDR) image fusion that copes with image displacement and image blur degradation in a computationally efficient manner, which is suitable for implementation on mobile devices. The various image registration algorithms proposed in the previous literatures are unable to meet the efficiency and performance requirements in the application of mobile devices. In this paper, we selected Oriented Brief (ORB) detector to extract local image structures. The descriptor selected in multi-exposure image fusion algorithm has to be fast and robust to illumination variations and geometric deformations. ORB descriptor is the best candidate in our algorithm. Further, we perform an improved RANdom Sample Consensus (RANSAC) algorithm to reject incorrect matches. For the fusion of images, a new approach based on Stationary Wavelet Transform (SWT) is used. The experimental results demonstrate that the proposed algorithm generates high quality images at low computational cost. Comparisons with a number of other feature matching methods show that our method gets better performance.
Implementation and evaluation of various demons deformable image registration algorithms on a GPU.
Gu, Xuejun; Pan, Hubert; Liang, Yun; Castillo, Richard; Yang, Deshan; Choi, Dongju; Castillo, Edward; Majumdar, Amitava; Guerrero, Thomas; Jiang, Steve B
2010-01-07
Online adaptive radiation therapy (ART) promises the ability to deliver an optimal treatment in response to daily patient anatomic variation. A major technical barrier for the clinical implementation of online ART is the requirement of rapid image segmentation. Deformable image registration (DIR) has been used as an automated segmentation method to transfer tumor/organ contours from the planning image to daily images. However, the current computational time of DIR is insufficient for online ART. In this work, this issue is addressed by using computer graphics processing units (GPUs). A gray-scale-based DIR algorithm called demons and five of its variants were implemented on GPUs using the compute unified device architecture (CUDA) programming environment. The spatial accuracy of these algorithms was evaluated over five sets of pulmonary 4D CT images with an average size of 256 x 256 x 100 and more than 1100 expert-determined landmark point pairs each. For all the testing scenarios presented in this paper, the GPU-based DIR computation required around 7 to 11 s to yield an average 3D error ranging from 1.5 to 1.8 mm. It is interesting to find out that the original passive force demons algorithms outperform subsequently proposed variants based on the combination of accuracy, efficiency and ease of implementation.
Hrdá, Marcela; Kulich, Tomáš; Repiský, Michal; Noga, Jozef; Malkina, Olga L; Malkin, Vladimir G
2014-09-05
A recently developed Thouless-expansion-based diagonalization-free approach for improving the efficiency of self-consistent field (SCF) methods (Noga and Šimunek, J. Chem. Theory Comput. 2010, 6, 2706) has been adapted to the four-component relativistic scheme and implemented within the program package ReSpect. In addition to the implementation, the method has been thoroughly analyzed, particularly with respect to cases for which it is difficult or computationally expensive to find a good initial guess. Based on this analysis, several modifications of the original algorithm, refining its stability and efficiency, are proposed. To demonstrate the robustness and efficiency of the improved algorithm, we present the results of four-component diagonalization-free SCF calculations on several heavy-metal complexes, the largest of which contains more than 80 atoms (about 6000 4-spinor basis functions). The diagonalization-free procedure is about twice as fast as the corresponding diagonalization. Copyright © 2014 Wiley Periodicals, Inc.
NASA Astrophysics Data System (ADS)
Xiang, Min; Qu, Qinqin; Chen, Cheng; Tian, Li; Zeng, Lingkang
2017-11-01
To improve the reliability of communication service in smart distribution grid (SDG), an access selection algorithm based on dynamic network status and different service types for heterogeneous wireless networks was proposed. The network performance index values were obtained in real time by multimode terminal and the variation trend of index values was analyzed by the growth matrix. The index weights were calculated by entropy-weight and then modified by rough set to get the final weights. Combining the grey relational analysis to sort the candidate networks, and the optimum communication network is selected. Simulation results show that the proposed algorithm can implement dynamically access selection in heterogeneous wireless networks of SDG effectively and reduce the network blocking probability.
Error analysis of stochastic gradient descent ranking.
Chen, Hong; Tang, Yi; Li, Luoqing; Yuan, Yuan; Li, Xuelong; Tang, Yuanyan
2013-06-01
Ranking is always an important task in machine learning and information retrieval, e.g., collaborative filtering, recommender systems, drug discovery, etc. A kernel-based stochastic gradient descent algorithm with the least squares loss is proposed for ranking in this paper. The implementation of this algorithm is simple, and an expression of the solution is derived via a sampling operator and an integral operator. An explicit convergence rate for leaning a ranking function is given in terms of the suitable choices of the step size and the regularization parameter. The analysis technique used here is capacity independent and is novel in error analysis of ranking learning. Experimental results on real-world data have shown the effectiveness of the proposed algorithm in ranking tasks, which verifies the theoretical analysis in ranking error.
a Metadata Based Approach for Analyzing Uav Datasets for Photogrammetric Applications
NASA Astrophysics Data System (ADS)
Dhanda, A.; Remondino, F.; Santana Quintero, M.
2018-05-01
This paper proposes a methodology for pre-processing and analysing Unmanned Aerial Vehicle (UAV) datasets before photogrammetric processing. In cases where images are gathered without a detailed flight plan and at regular acquisition intervals the datasets can be quite large and be time consuming to process. This paper proposes a method to calculate the image overlap and filter out images to reduce large block sizes and speed up photogrammetric processing. The python-based algorithm that implements this methodology leverages the metadata in each image to determine the end and side overlap of grid-based UAV flights. Utilizing user input, the algorithm filters out images that are unneeded for photogrammetric processing. The result is an algorithm that can speed up photogrammetric processing and provide valuable information to the user about the flight path.
Komorkiewicz, Mateusz; Kryjak, Tomasz; Gorgon, Marek
2014-01-01
This article presents an efficient hardware implementation of the Horn-Schunck algorithm that can be used in an embedded optical flow sensor. An architecture is proposed, that realises the iterative Horn-Schunck algorithm in a pipelined manner. This modification allows to achieve data throughput of 175 MPixels/s and makes processing of Full HD video stream (1, 920 × 1, 080 @ 60 fps) possible. The structure of the optical flow module as well as pre- and post-filtering blocks and a flow reliability computation unit is described in details. Three versions of optical flow modules, with different numerical precision, working frequency and obtained results accuracy are proposed. The errors caused by switching from floating- to fixed-point computations are also evaluated. The described architecture was tested on popular sequences from an optical flow dataset of the Middlebury University. It achieves state-of-the-art results among hardware implementations of single scale methods. The designed fixed-point architecture achieves performance of 418 GOPS with power efficiency of 34 GOPS/W. The proposed floating-point module achieves 103 GFLOPS, with power efficiency of 24 GFLOPS/W. Moreover, a 100 times speedup compared to a modern CPU with SIMD support is reported. A complete, working vision system realized on Xilinx VC707 evaluation board is also presented. It is able to compute optical flow for Full HD video stream received from an HDMI camera in real-time. The obtained results prove that FPGA devices are an ideal platform for embedded vision systems. PMID:24526303
Komorkiewicz, Mateusz; Kryjak, Tomasz; Gorgon, Marek
2014-02-12
This article presents an efficient hardware implementation of the Horn-Schunck algorithm that can be used in an embedded optical flow sensor. An architecture is proposed, that realises the iterative Horn-Schunck algorithm in a pipelined manner. This modification allows to achieve data throughput of 175 MPixels/s and makes processing of Full HD video stream (1; 920 × 1; 080 @ 60 fps) possible. The structure of the optical flow module as well as pre- and post-filtering blocks and a flow reliability computation unit is described in details. Three versions of optical flow modules, with different numerical precision, working frequency and obtained results accuracy are proposed. The errors caused by switching from floating- to fixed-point computations are also evaluated. The described architecture was tested on popular sequences from an optical flow dataset of the Middlebury University. It achieves state-of-the-art results among hardware implementations of single scale methods. The designed fixed-point architecture achieves performance of 418 GOPS with power efficiency of 34 GOPS/W. The proposed floating-point module achieves 103 GFLOPS, with power efficiency of 24 GFLOPS/W. Moreover, a 100 times speedup compared to a modern CPU with SIMD support is reported. A complete, working vision system realized on Xilinx VC707 evaluation board is also presented. It is able to compute optical flow for Full HD video stream received from an HDMI camera in real-time. The obtained results prove that FPGA devices are an ideal platform for embedded vision systems.
Efficient Deterministic Finite Automata Minimization Based on Backward Depth Information.
Liu, Desheng; Huang, Zhiping; Zhang, Yimeng; Guo, Xiaojun; Su, Shaojing
2016-01-01
Obtaining a minimal automaton is a fundamental issue in the theory and practical implementation of deterministic finite automatons (DFAs). A minimization algorithm is presented in this paper that consists of two main phases. In the first phase, the backward depth information is built, and the state set of the DFA is partitioned into many blocks. In the second phase, the state set is refined using a hash table. The minimization algorithm has a lower time complexity O(n) than a naive comparison of transitions O(n2). Few states need to be refined by the hash table, because most states have been partitioned by the backward depth information in the coarse partition. This method achieves greater generality than previous methods because building the backward depth information is independent of the topological complexity of the DFA. The proposed algorithm can be applied not only to the minimization of acyclic automata or simple cyclic automata, but also to automata with high topological complexity. Overall, the proposal has three advantages: lower time complexity, greater generality, and scalability. A comparison to Hopcroft's algorithm demonstrates experimentally that the algorithm runs faster than traditional algorithms.
Multirate-based fast parallel algorithms for 2-D DHT-based real-valued discrete Gabor transform.
Tao, Liang; Kwan, Hon Keung
2012-07-01
Novel algorithms for the multirate and fast parallel implementation of the 2-D discrete Hartley transform (DHT)-based real-valued discrete Gabor transform (RDGT) and its inverse transform are presented in this paper. A 2-D multirate-based analysis convolver bank is designed for the 2-D RDGT, and a 2-D multirate-based synthesis convolver bank is designed for the 2-D inverse RDGT. The parallel channels in each of the two convolver banks have a unified structure and can apply the 2-D fast DHT algorithm to speed up their computations. The computational complexity of each parallel channel is low and is independent of the Gabor oversampling rate. All the 2-D RDGT coefficients of an image are computed in parallel during the analysis process and can be reconstructed in parallel during the synthesis process. The computational complexity and time of the proposed parallel algorithms are analyzed and compared with those of the existing fastest algorithms for 2-D discrete Gabor transforms. The results indicate that the proposed algorithms are the fastest, which make them attractive for real-time image processing.
A Degree Distribution Optimization Algorithm for Image Transmission
NASA Astrophysics Data System (ADS)
Jiang, Wei; Yang, Junjie
2016-09-01
Luby Transform (LT) code is the first practical implementation of digital fountain code. The coding behavior of LT code is mainly decided by the degree distribution which determines the relationship between source data and codewords. Two degree distributions are suggested by Luby. They work well in typical situations but not optimally in case of finite encoding symbols. In this work, the degree distribution optimization algorithm is proposed to explore the potential of LT code. Firstly selection scheme of sparse degrees for LT codes is introduced. Then probability distribution is optimized according to the selected degrees. In image transmission, bit stream is sensitive to the channel noise and even a single bit error may cause the loss of synchronization between the encoder and the decoder. Therefore the proposed algorithm is designed for image transmission situation. Moreover, optimal class partition is studied for image transmission with unequal error protection. The experimental results are quite promising. Compared with LT code with robust soliton distribution, the proposed algorithm improves the final quality of recovered images obviously with the same overhead.
NASA Technical Reports Server (NTRS)
Metcalfe, A. G.; Bodenheimer, R. E.
1976-01-01
A parallel algorithm for counting the number of logic-l elements in a binary array or image developed during preliminary investigation of the Tse concept is described. The counting algorithm is implemented using a basic combinational structure. Modifications which improve the efficiency of the basic structure are also presented. A programmable Tse computer structure is proposed, along with a hardware control unit, Tse instruction set, and software program for execution of the counting algorithm. Finally, a comparison is made between the different structures in terms of their more important characteristics.
The use of Lanczos's method to solve the large generalized symmetric definite eigenvalue problem
NASA Technical Reports Server (NTRS)
Jones, Mark T.; Patrick, Merrell L.
1989-01-01
The generalized eigenvalue problem, Kx = Lambda Mx, is of significant practical importance, especially in structural enginering where it arises as the vibration and buckling problem. A new algorithm, LANZ, based on Lanczos's method is developed. LANZ uses a technique called dynamic shifting to improve the efficiency and reliability of the Lanczos algorithm. A new algorithm for solving the tridiagonal matrices that arise when using Lanczos's method is described. A modification of Parlett and Scott's selective orthogonalization algorithm is proposed. Results from an implementation of LANZ on a Convex C-220 show it to be superior to a subspace iteration code.
A Novel Image Encryption Algorithm Based on DNA Subsequence Operation
Zhang, Qiang; Xue, Xianglian; Wei, Xiaopeng
2012-01-01
We present a novel image encryption algorithm based on DNA subsequence operation. Different from the traditional DNA encryption methods, our algorithm does not use complex biological operation but just uses the idea of DNA subsequence operations (such as elongation operation, truncation operation, deletion operation, etc.) combining with the logistic chaotic map to scramble the location and the value of pixel points from the image. The experimental results and security analysis show that the proposed algorithm is easy to be implemented, can get good encryption effect, has a wide secret key's space, strong sensitivity to secret key, and has the abilities of resisting exhaustive attack and statistic attack. PMID:23093912
Complexity-reduced implementations of complete and null-space-based linear discriminant analysis.
Lu, Gui-Fu; Zheng, Wenming
2013-10-01
Dimensionality reduction has become an important data preprocessing step in a lot of applications. Linear discriminant analysis (LDA) is one of the most well-known dimensionality reduction methods. However, the classical LDA cannot be used directly in the small sample size (SSS) problem where the within-class scatter matrix is singular. In the past, many generalized LDA methods has been reported to address the SSS problem. Among these methods, complete linear discriminant analysis (CLDA) and null-space-based LDA (NLDA) provide good performances. The existing implementations of CLDA are computationally expensive. In this paper, we propose a new and fast implementation of CLDA. Our proposed implementation of CLDA, which is the most efficient one, is equivalent to the existing implementations of CLDA in theory. Since CLDA is an extension of null-space-based LDA (NLDA), our implementation of CLDA also provides a fast implementation of NLDA. Experiments on some real-world data sets demonstrate the effectiveness of our proposed new CLDA and NLDA algorithms. Copyright © 2013 Elsevier Ltd. All rights reserved.
A low-complexity Reed-Solomon decoder using new key equation solver
NASA Astrophysics Data System (ADS)
Xie, Jun; Yuan, Songxin; Tu, Xiaodong; Zhang, Chongfu
2006-09-01
This paper presents a low-complexity parallel Reed-Solomon (RS) (255,239) decoder architecture using a novel pipelined variable stages recursive Modified Euclidean (ME) algorithm for optical communication. The pipelined four-parallel syndrome generator is proposed. The time multiplexing and resource sharing schemes are used in the novel recursive ME algorithm to reduce the logic gate count. The new key equation solver can be shared by two decoder macro. A new Chien search cell which doesn't need initialization is proposed in the paper. The proposed decoder can be used for 2.5Gb/s data rates device. The decoder is implemented in Altera' Stratixll device. The resource utilization is reduced about 40% comparing to the conventional method.
Face verification system for Android mobile devices using histogram based features
NASA Astrophysics Data System (ADS)
Sato, Sho; Kobayashi, Kazuhiro; Chen, Qiu
2016-07-01
This paper proposes a face verification system that runs on Android mobile devices. In this system, facial image is captured by a built-in camera on the Android device firstly, and then face detection is implemented using Haar-like features and AdaBoost learning algorithm. The proposed system verify the detected face using histogram based features, which are generated by binary Vector Quantization (VQ) histogram using DCT coefficients in low frequency domains, as well as Improved Local Binary Pattern (Improved LBP) histogram in spatial domain. Verification results with different type of histogram based features are first obtained separately and then combined by weighted averaging. We evaluate our proposed algorithm by using publicly available ORL database and facial images captured by an Android tablet.
Fast discrete cosine transform structure suitable for implementation with integer computation
NASA Astrophysics Data System (ADS)
Jeong, Yeonsik; Lee, Imgeun
2000-10-01
The discrete cosine transform (DCT) has wide applications in speech and image coding. We propose a fast DCT scheme with the property of reduced multiplication stages and fewer additions and multiplications. The proposed algorithm is structured so that most multiplications are performed at the final stage, which reduces the propagation error that could occur in the integer computation.
Evaluation metrics for bone segmentation in ultrasound
NASA Astrophysics Data System (ADS)
Lougheed, Matthew; Fichtinger, Gabor; Ungi, Tamas
2015-03-01
Tracked ultrasound is a safe alternative to X-ray for imaging bones. The interpretation of bony structures is challenging as ultrasound has no specific intensity characteristic of bones. Several image segmentation algorithms have been devised to identify bony structures. We propose an open-source framework that would aid in the development and comparison of such algorithms by quantitatively measuring segmentation performance in the ultrasound images. True-positive and false-negative metrics used in the framework quantify algorithm performance based on correctly segmented bone and correctly segmented boneless regions. Ground-truth for these metrics are defined manually and along with the corresponding automatically segmented image are used for the performance analysis. Manually created ground truth tests were generated to verify the accuracy of the analysis. Further evaluation metrics for determining average performance per slide and standard deviation are considered. The metrics provide a means of evaluating accuracy of frames along the length of a volume. This would aid in assessing the accuracy of the volume itself and the approach to image acquisition (positioning and frequency of frame). The framework was implemented as an open-source module of the 3D Slicer platform. The ground truth tests verified that the framework correctly calculates the implemented metrics. The developed framework provides a convenient way to evaluate bone segmentation algorithms. The implementation fits in a widely used application for segmentation algorithm prototyping. Future algorithm development will benefit by monitoring the effects of adjustments to an algorithm in a standard evaluation framework.
Complexity of the Quantum Adiabatic Algorithm
NASA Astrophysics Data System (ADS)
Hen, Itay
2013-03-01
The Quantum Adiabatic Algorithm (QAA) has been proposed as a mechanism for efficiently solving optimization problems on a quantum computer. Since adiabatic computation is analog in nature and does not require the design and use of quantum gates, it can be thought of as a simpler and perhaps more profound method for performing quantum computations that might also be easier to implement experimentally. While these features have generated substantial research in QAA, to date there is still a lack of solid evidence that the algorithm can outperform classical optimization algorihms. Here, we discuss several aspects of the quantum adiabatic algorithm: We analyze the efficiency of the algorithm on several ``hard'' (NP) computational problems. Studying the size dependence of the typical minimum energy gap of the Hamiltonians of these problems using quantum Monte Carlo methods, we find that while for most problems the minimum gap decreases exponentially with the size of the problem, indicating that the QAA is not more efficient than existing classical search algorithms, for other problems there is evidence to suggest that the gap may be polynomial near the phase transition. We also discuss applications of the QAA to ``real life'' problems and how they can be implemented on currently available (albeit prototypical) quantum hardware such as ``D-Wave One'', that impose serious restrictions as to which type of problems may be tested. Finally, we discuss different approaches to find improved implementations of the algorithm such as local adiabatic evolution, adaptive methods, local search in Hamiltonian space and others.
Speeding Up the Bilateral Filter: A Joint Acceleration Way.
Dai, Longquan; Yuan, Mengke; Zhang, Xiaopeng
2016-06-01
Computational complexity of the brute-force implementation of the bilateral filter (BF) depends on its filter kernel size. To achieve the constant-time BF whose complexity is irrelevant to the kernel size, many techniques have been proposed, such as 2D box filtering, dimension promotion, and shiftability property. Although each of the above techniques suffers from accuracy and efficiency problems, previous algorithm designers were used to take only one of them to assemble fast implementations due to the hardness of combining them together. Hence, no joint exploitation of these techniques has been proposed to construct a new cutting edge implementation that solves these problems. Jointly employing five techniques: kernel truncation, best N-term approximation as well as previous 2D box filtering, dimension promotion, and shiftability property, we propose a unified framework to transform BF with arbitrary spatial and range kernels into a set of 3D box filters that can be computed in linear time. To the best of our knowledge, our algorithm is the first method that can integrate all these acceleration techniques and, therefore, can draw upon one another's strong point to overcome deficiencies. The strength of our method has been corroborated by several carefully designed experiments. In particular, the filtering accuracy is significantly improved without sacrificing the efficiency at running time.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hui, C; Suh, Y; Robertson, D
Purpose: To develop a novel algorithm to generate internal respiratory signals for sorting of four-dimensional (4D) computed tomography (CT) images. Methods: The proposed algorithm extracted multiple time resolved features as potential respiratory signals. These features were taken from the 4D CT images and its Fourier transformed space. Several low-frequency locations in the Fourier space and selected anatomical features from the images were used as potential respiratory signals. A clustering algorithm was then used to search for the group of appropriate potential respiratory signals. The chosen signals were then normalized and averaged to form the final internal respiratory signal. Performance ofmore » the algorithm was tested in 50 4D CT data sets and results were compared with external signals from the real-time position management (RPM) system. Results: In almost all cases, the proposed algorithm generated internal respiratory signals that visibly matched the external respiratory signals from the RPM system. On average, the end inspiration times calculated by the proposed algorithm were within 0.1 s of those given by the RPM system. Less than 3% of the calculated end inspiration times were more than one time frame away from those given by the RPM system. In 3 out of the 50 cases, the proposed algorithm generated internal respiratory signals that were significantly smoother than the RPM signals. In these cases, images sorted using the internal respiratory signals showed fewer artifacts in locations corresponding to the discrepancy in the internal and external respiratory signals. Conclusion: We developed a robust algorithm that generates internal respiratory signals from 4D CT images. In some cases, it even showed the potential to outperform the RPM system. The proposed algorithm is completely automatic and generally takes less than 2 min to process. It can be easily implemented into the clinic and can potentially replace the use of external surrogates.« less
Implementation of dictionary pair learning algorithm for image quality improvement
NASA Astrophysics Data System (ADS)
Vimala, C.; Aruna Priya, P.
2018-04-01
This paper proposes an image denoising on dictionary pair learning algorithm. Visual information is transmitted in the form of digital images is becoming a major method of communication in the modern age, but the image obtained after transmissions is often corrupted with noise. The received image needs processing before it can be used in applications. Image denoising involves the manipulation of the image data to produce a visually high quality image.
A Two-Stage Reconstruction Processor for Human Detection in Compressive Sensing CMOS Radar
Tsao, Kuei-Chi; Lee, Ling; Chu, Ta-Shun
2018-01-01
Complementary metal-oxide-semiconductor (CMOS) radar has recently gained much research attraction because small and low-power CMOS devices are very suitable for deploying sensing nodes in a low-power wireless sensing system. This study focuses on the signal processing of a wireless CMOS impulse radar system that can detect humans and objects in the home-care internet-of-things sensing system. The challenges of low-power CMOS radar systems are the weakness of human signals and the high computational complexity of the target detection algorithm. The compressive sensing-based detection algorithm can relax the computational costs by avoiding the utilization of matched filters and reducing the analog-to-digital converter bandwidth requirement. The orthogonal matching pursuit (OMP) is one of the popular signal reconstruction algorithms for compressive sensing radar; however, the complexity is still very high because the high resolution of human respiration leads to high-dimension signal reconstruction. Thus, this paper proposes a two-stage reconstruction algorithm for compressive sensing radar. The proposed algorithm not only has lower complexity than the OMP algorithm by 75% but also achieves better positioning performance than the OMP algorithm especially in noisy environments. This study also designed and implemented the algorithm by using Vertex-7 FPGA chip (Xilinx, San Jose, CA, USA). The proposed reconstruction processor can support the 256×13 real-time radar image display with a throughput of 28.2 frames per second. PMID:29621170
Understanding and Optimizing Asynchronous Low-Precision Stochastic Gradient Descent
De Sa, Christopher; Feldman, Matthew; Ré, Christopher; Olukotun, Kunle
2018-01-01
Stochastic gradient descent (SGD) is one of the most popular numerical algorithms used in machine learning and other domains. Since this is likely to continue for the foreseeable future, it is important to study techniques that can make it run fast on parallel hardware. In this paper, we provide the first analysis of a technique called Buckwild! that uses both asynchronous execution and low-precision computation. We introduce the DMGC model, the first conceptualization of the parameter space that exists when implementing low-precision SGD, and show that it provides a way to both classify these algorithms and model their performance. We leverage this insight to propose and analyze techniques to improve the speed of low-precision SGD. First, we propose software optimizations that can increase throughput on existing CPUs by up to 11×. Second, we propose architectural changes, including a new cache technique we call an obstinate cache, that increase throughput beyond the limits of current-generation hardware. We also implement and analyze low-precision SGD on the FPGA, which is a promising alternative to the CPU for future SGD systems. PMID:29391770
A new methodology for automatic detection of reference points in 3D cephalometry: A pilot study.
Ed-Dhahraouy, Mohammed; Riri, Hicham; Ezzahmouly, Manal; Bourzgui, Farid; El Moutaoukkil, Abdelmajid
2018-04-05
The aim of this study was to develop a new method for an automatic detection of reference points in 3D cephalometry to overcome the limits of 2D cephalometric analyses. A specific application was designed using the C++ language for automatic and manual identification of 21 (reference) points on the craniofacial structures. Our algorithm is based on the implementation of an anatomical and geometrical network adapted to the craniofacial structure. This network was constructed based on the anatomical knowledge of the 3D cephalometric (reference) points. The proposed algorithm was tested on five CBCT images. The proposed approach for the automatic 3D cephalometric identification was able to detect 21 points with a mean error of 2.32mm. In this pilot study, we propose an automated methodology for the identification of the 3D cephalometric (reference) points. A larger sample will be implemented in the future to assess the method validity and reliability. Copyright © 2018 CEO. Published by Elsevier Masson SAS. All rights reserved.
Design and implementation of streaming media server cluster based on FFMpeg.
Zhao, Hong; Zhou, Chun-long; Jin, Bao-zhao
2015-01-01
Poor performance and network congestion are commonly observed in the streaming media single server system. This paper proposes a scheme to construct a streaming media server cluster system based on FFMpeg. In this scheme, different users are distributed to different servers according to their locations and the balance among servers is maintained by the dynamic load-balancing algorithm based on active feedback. Furthermore, a service redirection algorithm is proposed to improve the transmission efficiency of streaming media data. The experiment results show that the server cluster system has significantly alleviated the network congestion and improved the performance in comparison with the single server system.
Design and Implementation of Streaming Media Server Cluster Based on FFMpeg
Zhao, Hong; Zhou, Chun-long; Jin, Bao-zhao
2015-01-01
Poor performance and network congestion are commonly observed in the streaming media single server system. This paper proposes a scheme to construct a streaming media server cluster system based on FFMpeg. In this scheme, different users are distributed to different servers according to their locations and the balance among servers is maintained by the dynamic load-balancing algorithm based on active feedback. Furthermore, a service redirection algorithm is proposed to improve the transmission efficiency of streaming media data. The experiment results show that the server cluster system has significantly alleviated the network congestion and improved the performance in comparison with the single server system. PMID:25734187
Ship detection in panchromatic images: a new method and its DSP implementation
NASA Astrophysics Data System (ADS)
Yao, Yuan; Jiang, Zhiguo; Zhang, Haopeng; Wang, Mengfei; Meng, Gang
2016-03-01
In this paper, a new ship detection method is proposed after analyzing the characteristics of panchromatic remote sensing images and ship targets. Firstly, AdaBoost(Adaptive Boosting) classifiers trained by Haar features are utilized to make coarse detection of ship targets. Then LSD (Line Segment Detector) is adopted to extract the line features in target slices to make fine detection. Experimental results on a dataset of panchromatic remote sensing images with a spatial resolution of 2m show that the proposed algorithm can achieve high detection rate and low false alarm rate. Meanwhile, the algorithm can meet the needs of practical applications on DSP (Digital Signal Processor).
Zeb, Salman; Yousaf, Muhammad
2017-01-01
In this article, we present a QR updating procedure as a solution approach for linear least squares problem with equality constraints. We reduce the constrained problem to unconstrained linear least squares and partition it into a small subproblem. The QR factorization of the subproblem is calculated and then we apply updating techniques to its upper triangular factor R to obtain its solution. We carry out the error analysis of the proposed algorithm to show that it is backward stable. We also illustrate the implementation and accuracy of the proposed algorithm by providing some numerical experiments with particular emphasis on dense problems.
Opto-digital spectrum encryption by using Baker mapping and gyrator transform
NASA Astrophysics Data System (ADS)
Chen, Hang; Zhao, Jiguang; Liu, Zhengjun; Du, Xiaoping
2015-03-01
A concept of spectrum information hidden technology is proposed in this paper. We present an optical encryption algorithm for hiding both the spatial and spectrum information by using the Baker mapping in gyrator transform domains. The Baker mapping is introduced for scrambling the every single band of the hyperspectral image before adding the random phase functions. Subsequently, three thin cylinder lenses are controlled by PC for implementing the gyrator transform. The amplitude and phase information in the output plane can be regarded as the encrypted information and main key. Some numerical simulations are made to test the validity and capability of the proposed encryption algorithm.
A low-power and high-quality implementation of the discrete cosine transformation
NASA Astrophysics Data System (ADS)
Heyne, B.; Götze, J.
2007-06-01
In this paper a computationally efficient and high-quality preserving DCT architecture is presented. It is obtained by optimizing the Loeffler DCT based on the Cordic algorithm. The computational complexity is reduced from 11 multiply and 29 add operations (Loeffler DCT) to 38 add and 16 shift operations (which is similar to the complexity of the binDCT). The experimental results show that the proposed DCT algorithm not only reduces the computational complexity significantly, but also retains the good transformation quality of the Loeffler DCT. Therefore, the proposed Cordic based Loeffler DCT is especially suited for low-power and high-quality CODECs in battery-based systems.
Bit-Table Based Biclustering and Frequent Closed Itemset Mining in High-Dimensional Binary Data
Király, András; Abonyi, János
2014-01-01
During the last decade various algorithms have been developed and proposed for discovering overlapping clusters in high-dimensional data. The two most prominent application fields in this research, proposed independently, are frequent itemset mining (developed for market basket data) and biclustering (applied to gene expression data analysis). The common limitation of both methodologies is the limited applicability for very large binary data sets. In this paper we propose a novel and efficient method to find both frequent closed itemsets and biclusters in high-dimensional binary data. The method is based on simple but very powerful matrix and vector multiplication approaches that ensure that all patterns can be discovered in a fast manner. The proposed algorithm has been implemented in the commonly used MATLAB environment and freely available for researchers. PMID:24616651
Log-Linear Model Based Behavior Selection Method for Artificial Fish Swarm Algorithm
Huang, Zhehuang; Chen, Yidong
2015-01-01
Artificial fish swarm algorithm (AFSA) is a population based optimization technique inspired by social behavior of fishes. In past several years, AFSA has been successfully applied in many research and application areas. The behavior of fishes has a crucial impact on the performance of AFSA, such as global exploration ability and convergence speed. How to construct and select behaviors of fishes are an important task. To solve these problems, an improved artificial fish swarm algorithm based on log-linear model is proposed and implemented in this paper. There are three main works. Firstly, we proposed a new behavior selection algorithm based on log-linear model which can enhance decision making ability of behavior selection. Secondly, adaptive movement behavior based on adaptive weight is presented, which can dynamically adjust according to the diversity of fishes. Finally, some new behaviors are defined and introduced into artificial fish swarm algorithm at the first time to improve global optimization capability. The experiments on high dimensional function optimization showed that the improved algorithm has more powerful global exploration ability and reasonable convergence speed compared with the standard artificial fish swarm algorithm. PMID:25691895
NASA Astrophysics Data System (ADS)
Mazur, Krzysztof; Wrona, Stanislaw; Pawelczyk, Marek
2018-01-01
The paper presents the idea and discussion on implementation of multichannel global active noise control systems. As a test plant an active casing is used. It has been developed by the authors to reduce device noise directly at the source by controlling vibration of its casing. To provide global acoustic effect in the whole environment, where the device operates, it requires a number of secondary sources and sensors for each casing wall, thus making the whole active control structure complicated, i.e. with a large number of interacting channels. The paper discloses all details concerning hardware setup and efficient implementation of control algorithms for the multichannel case. A new formulation is presented to introduce the distributed version of the Switched-error Filtered-reference Least Mean Squares (FXLMS) algorithm together with adaptation rate enhancement. The convergence rate of the proposed algorithm is compared with original Multiple-error FXLMS. A number of hints followed from many years of authors' experience on microprocessor control systems design and signal processing algorithms optimization are presented. They can be used for various active control and signal processing applications, both for academic research and commercialization.
Implementation of Wi-Fi Signal Sampling on an Android Smartphone for Indoor Positioning Systems.
Liu, Hung-Huan; Liu, Chun
2017-12-21
Collecting and maintaining radio fingerprint for wireless indoor positioning systems involves considerable time and labor. We have proposed the quick radio fingerprint collection (QRFC) algorithm which employed the built-in accelerometer of Android smartphones to implement step detection in order to assist in collecting radio fingerprints. In the present study, we divided the algorithm into moving sampling (MS) and stepped MS (SMS), and describe the implementation of both algorithms and their comparison. Technical details and common errors concerning the use of Android smartphones to collect Wi-Fi radio beacons were surveyed and discussed. The results of signal sampling experiments performed in a hallway measuring 54 m in length showed that in terms of the amount of time required to complete collection of access point (AP) signals, static sampling (SS; a traditional procedure for collecting Wi-Fi signals) took at least 2 h, whereas MS and SMS took approximately 150 and 300 s, respectively. Notably, AP signals obtained through MS and SMS were comparable to those obtained through SS in terms of the distribution of received signal strength indicator (RSSI) and positioning accuracy. Therefore, MS and SMS are recommended instead of SS as signal sampling procedures for indoor positioning algorithms.
Implementation of Wi-Fi Signal Sampling on an Android Smartphone for Indoor Positioning Systems
Liu, Chun
2017-01-01
Collecting and maintaining radio fingerprint for wireless indoor positioning systems involves considerable time and labor. We have proposed the quick radio fingerprint collection (QRFC) algorithm which employed the built-in accelerometer of Android smartphones to implement step detection in order to assist in collecting radio fingerprints. In the present study, we divided the algorithm into moving sampling (MS) and stepped MS (SMS), and describe the implementation of both algorithms and their comparison. Technical details and common errors concerning the use of Android smartphones to collect Wi-Fi radio beacons were surveyed and discussed. The results of signal sampling experiments performed in a hallway measuring 54 m in length showed that in terms of the amount of time required to complete collection of access point (AP) signals, static sampling (SS; a traditional procedure for collecting Wi-Fi signals) took at least 2 h, whereas MS and SMS took approximately 150 and 300 s, respectively. Notably, AP signals obtained through MS and SMS were comparable to those obtained through SS in terms of the distribution of received signal strength indicator (RSSI) and positioning accuracy. Therefore, MS and SMS are recommended instead of SS as signal sampling procedures for indoor positioning algorithms. PMID:29267234
Huang, Hsuan-Ming; Hsiao, Ing-Tsung
2017-01-01
Over the past decade, image quality in low-dose computed tomography has been greatly improved by various compressive sensing- (CS-) based reconstruction methods. However, these methods have some disadvantages including high computational cost and slow convergence rate. Many different speed-up techniques for CS-based reconstruction algorithms have been developed. The purpose of this paper is to propose a fast reconstruction framework that combines a CS-based reconstruction algorithm with several speed-up techniques. First, total difference minimization (TDM) was implemented using the soft-threshold filtering (STF). Second, we combined TDM-STF with the ordered subsets transmission (OSTR) algorithm for accelerating the convergence. To further speed up the convergence of the proposed method, we applied the power factor and the fast iterative shrinkage thresholding algorithm to OSTR and TDM-STF, respectively. Results obtained from simulation and phantom studies showed that many speed-up techniques could be combined to greatly improve the convergence speed of a CS-based reconstruction algorithm. More importantly, the increased computation time (≤10%) was minor as compared to the acceleration provided by the proposed method. In this paper, we have presented a CS-based reconstruction framework that combines several acceleration techniques. Both simulation and phantom studies provide evidence that the proposed method has the potential to satisfy the requirement of fast image reconstruction in practical CT.
Gangadari, Bhoopal Rao; Ahamed, Shaik Rafi
2016-12-01
In this paper, we presented a novel approach of low energy consumption architecture of S-Box used in Advanced Encryption Standard (AES) algorithm using programmable second order reversible cellular automata (RCA 2 ). The architecture entails a low power implementation with minimal delay overhead and the performance of proposed RCA 2 based S-Box in terms of security is evaluated using the cryptographic properties such as nonlinearity, correlation immunity bias, strict avalanche criteria, entropy and also found that the proposed architecture is secure enough for cryptographic applications. Moreover, the proposed AES algorithm architecture simulation studies show that energy consumption of 68.726 nJ, power dissipation of 3.856 mW for 0.18- μm at 13.69 MHz and energy consumption of 29.408 nJ, power dissipation of 1.65 mW for 0.13- μm at 13.69 MHz. The proposed AES algorithm with RCA 2 based S-Box shows a reduction power consumption by 50 % and energy consumption by 5 % compared to best classical S-Box and composite field arithmetic based AES algorithm. Apart from that, it is also shown that RCA 2 based S-Boxes are dynamic in nature, invertible, low power dissipation compared to that of LUT based S-Box and hence suitable for Wireless Body Area Network (WBAN) applications.
Al-Mayouf, Yusor Rafid Bahar; Ismail, Mahamod; Abdullah, Nor Fadzilah; Wahab, Ainuddin Wahid Abdul; Mahdi, Omar Adil; Khan, Suleman; Choo, Kim-Kwang Raymond
2016-01-01
Vehicular ad hoc networks (VANETs) are considered an emerging technology in the industrial and educational fields. This technology is essential in the deployment of the intelligent transportation system, which is targeted to improve safety and efficiency of traffic. The implementation of VANETs can be effectively executed by transmitting data among vehicles with the use of multiple hops. However, the intrinsic characteristics of VANETs, such as its dynamic network topology and intermittent connectivity, limit data delivery. One particular challenge of this network is the possibility that the contributing node may only remain in the network for a limited time. Hence, to prevent data loss from that node, the information must reach the destination node via multi-hop routing techniques. An appropriate, efficient, and stable routing algorithm must be developed for various VANET applications to address the issues of dynamic topology and intermittent connectivity. Therefore, this paper proposes a novel routing algorithm called efficient and stable routing algorithm based on user mobility and node density (ESRA-MD). The proposed algorithm can adapt to significant changes that may occur in the urban vehicular environment. This algorithm works by selecting an optimal route on the basis of hop count and link duration for delivering data from source to destination, thereby satisfying various quality of service considerations. The validity of the proposed algorithm is investigated by its comparison with ARP-QD protocol, which works on the mechanism of optimal route finding in VANETs in urban environments. Simulation results reveal that the proposed ESRA-MD algorithm shows remarkable improvement in terms of delivery ratio, delivery delay, and communication overhead. PMID:27855165
Kim, Yeoun Jae; Seo, Jong Hyun; Kim, Hong Rae; Kim, Kwang Gi
2017-06-01
Clinicians who frequently perform ultrasound scanning procedures often suffer from musculoskeletal disorders, arthritis, and myalgias. To minimize their occurrence and to assist clinicians, ultrasound scanning robots have been developed worldwide. Although, to date, there is still no commercially available ultrasound scanning robot, many control methods have been suggested and researched. These control algorithms are either image based or force based. If the ultrasound scanning robot control algorithm was a combination of the two algorithms, it could benefit from the advantage of each one. However, there are no existing control methods for ultrasound scanning robots that combine force control and image analysis. Therefore, in this work, a control algorithm is developed for an ultrasound scanning robot using force feedback and ultrasound image analysis. A manipulator-type ultrasound scanning robot named 'NCCUSR' is developed and a control algorithm for this robot is suggested and verified. First, conventional hybrid position-force control is implemented for the robot and the hybrid position-force control algorithm is combined with ultrasound image analysis to fully control the robot. The control method is verified using a thyroid phantom. It was found that the proposed algorithm can be applied to control the ultrasound scanning robot and experimental outcomes suggest that the images acquired using the proposed control method can yield a rating score that is equivalent to images acquired directly by the clinicians. The proposed control method can be applied to control the ultrasound scanning robot. However, more work must be completed to verify the proposed control method in order to become clinically feasible. Copyright © 2016 John Wiley & Sons, Ltd. Copyright © 2016 John Wiley & Sons, Ltd.
Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™
Gomes, Jeremias M.; Teodoro, George; de Melo, Alba; Kong, Jun; Kurc, Tahsin; Saltz, Joel H.
2016-01-01
We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel® Xeon Phi™ co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP’s irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high performance. However, SIMD atomic instructions are not supported. Therefore, we propose a new IWPP algorithm that can take advantage of the supported SIMD instruction set. We also evaluate an alternate storage container (priority queue) to track active elements in the wavefront in an effort to improve the parallel algorithm efficiency. The new IWPP algorithm is evaluated with Morphological Reconstruction and Imfill operations as use cases. Our results show performance improvements of up to 5.63× on top of the original IWPP due to vectorization. Moreover, the new IWPP achieves speedups of 45.7× and 1.62×, respectively, as compared to efficient CPU and GPU implementations. PMID:27298591
Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™.
Gomes, Jeremias M; Teodoro, George; de Melo, Alba; Kong, Jun; Kurc, Tahsin; Saltz, Joel H
2015-10-01
We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel ® Xeon Phi ™ co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP's irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high performance. However, SIMD atomic instructions are not supported. Therefore, we propose a new IWPP algorithm that can take advantage of the supported SIMD instruction set. We also evaluate an alternate storage container (priority queue) to track active elements in the wavefront in an effort to improve the parallel algorithm efficiency. The new IWPP algorithm is evaluated with Morphological Reconstruction and Imfill operations as use cases. Our results show performance improvements of up to 5.63 × on top of the original IWPP due to vectorization. Moreover, the new IWPP achieves speedups of 45.7 × and 1.62 × , respectively, as compared to efficient CPU and GPU implementations.
Infrared small target detection technology based on OpenCV
NASA Astrophysics Data System (ADS)
Liu, Lei; Huang, Zhijian
2013-05-01
Accurate and fast detection of infrared (IR) dim target has very important meaning for infrared precise guidance, early warning, video surveillance, etc. In this paper, some basic principles and the implementing flow charts of a series of algorithms for target detection are described. These algorithms are traditional two-frame difference method, improved three-frame difference method, background estimate and frame difference fusion method, and building background with neighborhood mean method. On the foundation of above works, an infrared target detection software platform which is developed by OpenCV and MFC is introduced. Three kinds of tracking algorithms are integrated in this software. In order to explain the software clearly, the framework and the function are described in this paper. At last, the experiments are performed for some real-life IR images. The whole algorithm implementing processes and results are analyzed, and those algorithms for detection targets are evaluated from the two aspects of subjective and objective. The results prove that the proposed method has satisfying detection effectiveness and robustness. Meanwhile, it has high detection efficiency and can be used for real-time detection.
Infrared small target detection technology based on OpenCV
NASA Astrophysics Data System (ADS)
Liu, Lei; Huang, Zhijian
2013-09-01
Accurate and fast detection of infrared (IR) dim target has very important meaning for infrared precise guidance, early warning, video surveillance, etc. In this paper, some basic principles and the implementing flow charts of a series of algorithms for target detection are described. These algorithms are traditional two-frame difference method, improved three-frame difference method, background estimate and frame difference fusion method, and building background with neighborhood mean method. On the foundation of above works, an infrared target detection software platform which is developed by OpenCV and MFC is introduced. Three kinds of tracking algorithms are integrated in this software. In order to explain the software clearly, the framework and the function are described in this paper. At last, the experiments are performed for some real-life IR images. The whole algorithm implementing processes and results are analyzed, and those algorithms for detection targets are evaluated from the two aspects of subjective and objective. The results prove that the proposed method has satisfying detection effectiveness and robustness. Meanwhile, it has high detection efficiency and can be used for real-time detection.
Improvement and implementation for Canny edge detection algorithm
NASA Astrophysics Data System (ADS)
Yang, Tao; Qiu, Yue-hong
2015-07-01
Edge detection is necessary for image segmentation and pattern recognition. In this paper, an improved Canny edge detection approach is proposed due to the defect of traditional algorithm. A modified bilateral filter with a compensation function based on pixel intensity similarity judgment was used to smooth image instead of Gaussian filter, which could preserve edge feature and remove noise effectively. In order to solve the problems of sensitivity to the noise in gradient calculating, the algorithm used 4 directions gradient templates. Finally, Otsu algorithm adaptively obtain the dual-threshold. All of the algorithm simulated with OpenCV 2.4.0 library in the environments of vs2010, and through the experimental analysis, the improved algorithm has been proved to detect edge details more effectively and with more adaptability.
NASA Astrophysics Data System (ADS)
Job, Joshua; Wang, Zhihui; Rønnow, Troels; Troyer, Matthias; Lidar, Daniel
2014-03-01
We report on experimental work benchmarking the performance of the D-Wave Two programmable annealer on its native Ising problem, and a comparison to available classical algorithms. In this talk we will focus on the comparison with an algorithm originally proposed and implemented by Alex Selby. This algorithm uses dynamic programming to repeatedly optimize over randomly selected maximal induced trees of the problem graph starting from a random initial state. If one is looking for a quantum advantage over classical algorithms, one should compare to classical algorithms which are designed and optimized to maximally take advantage of the structure of the type of problem one is using for the comparison. In that light, this classical algorithm should serve as a good gauge for any potential quantum speedup for the D-Wave Two.
Using Deep Learning Algorithm to Enhance Image-review Software for Surveillance Cameras
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cui, Yonggang; Thomas, Maikael A.
We propose the development of proven deep learning algorithms to flag objects and events of interest in Next Generation Surveillance System (NGSS) surveillance to make IAEA image review more efficient. Video surveillance is one of the core monitoring technologies used by the IAEA Department of Safeguards when implementing safeguards at nuclear facilities worldwide. The current image review software GARS has limited automated functions, such as scene-change detection, black image detection and missing scene analysis, but struggles with highly cluttered backgrounds. A cutting-edge algorithm to be developed in this project will enable efficient and effective searches in images and video streamsmore » by identifying and tracking safeguards relevant objects and detect anomalies in their vicinity. In this project, we will develop the algorithm, test it with the IAEA surveillance cameras and data sets collected at simulated nuclear facilities at BNL and SNL, and implement it in a software program for potential integration into the IAEA’s IRAP (Integrated Review and Analysis Program).« less
Real-time image processing of TOF range images using a reconfigurable processor system
NASA Astrophysics Data System (ADS)
Hussmann, S.; Knoll, F.; Edeler, T.
2011-07-01
During the last years, Time-of-Flight sensors achieved a significant impact onto research fields in machine vision. In comparison to stereo vision system and laser range scanners they combine the advantages of active sensors providing accurate distance measurements and camera-based systems recording a 2D matrix at a high frame rate. Moreover low cost 3D imaging has the potential to open a wide field of additional applications and solutions in markets like consumer electronics, multimedia, digital photography, robotics and medical technologies. This paper focuses on the currently implemented 4-phase-shift algorithm in this type of sensors. The most time critical operation of the phase-shift algorithm is the arctangent function. In this paper a novel hardware implementation of the arctangent function using a reconfigurable processor system is presented and benchmarked against the state-of-the-art CORDIC arctangent algorithm. Experimental results show that the proposed algorithm is well suited for real-time processing of the range images of TOF cameras.
A novel digital pulse processing architecture for nuclear instrumentation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moline, Yoann; Thevenin, Mathieu; Corre, Gwenole
The field of nuclear instrumentation covers a wide range of applications, including counting, spectrometry, pulse shape discrimination and multi-channel coincidence. These applications are the topic of many researches, new algorithms and implementations are constantly proposed thanks to advances in digital signal processing. However, these improvements are not yet implemented in instrumentation devices. This is especially true for neutron-gamma discrimination applications which traditionally use charge comparison method while literature proposes other algorithms based on frequency domain or wavelet theory which show better performances. Another example is pileups which are generally rejected while pileup correction algorithms also exist. These processes are traditionallymore » performed offline due to two issues. The first is the Poissonian characteristic of the signal, composed of random arrival pulses which requires to current architectures to work in data flow. The second is the real-time requirement, which implies losing pulses when the pulse rate is too high. Despite the possibility of treating the pulses independently from each other, current architectures paralyze the acquisition of the signal during the processing of a pulse. This loss is called dead-time. These two issues have led current architectures to use dedicated solutions based on re-configurable components like Field Programmable Gate Arrays (FPGAs) to overcome the need of performance necessary to deal with dead-time. However, dedicated hardware algorithm implementations on re-configurable technologies are complex and time-consuming. For all these reasons, a programmable Digital pulse Processing (DPP) architecture in a high level language such as Cor C++ which can reduce dead-time would be worthwhile for nuclear instrumentation. This would reduce prototyping and test duration by reducing the level of hardware expertise to implement new algorithms. However, today's programmable solutions do not meet the need of performance to operate online and not allow scaling with the increase in the number of measurement channel. That is why an innovative DPP architecture is proposed in this paper. This architecture is able to overcome dead-time while being programmable and is flexible with the number of measurement channel. Proposed architecture is based on an innovative execution model for pulse processing applications which can be summarized as follow. The signal is not composed of pulses only, consequently, pulses processing does not have to operate on the entire signal. Therefore, the first step of our proposal is pulse extraction by the use of dedicated components named pulse extractors. The triggering step can be achieved after the analog-to-digital conversion without any signal shaping or filtering stages. Pileup detection and accurate pulse time stamping are done at this stage. Any application downstream this step can work on adaptive variable-sized array of samples simplifying pulse processing methods. Then, once the data flow is broken, it is possible to distribute pulses on Functional Units (FUs) which perform processing. As the date of each pulse is known, they can be processed individually out-of-order to provide the results. To manage the pulses distribution, a scheduler and an interconnection network are used. pulses are distributed on the first FU which is not busy without congesting the interconnection network. For this reason, the process duration does not result anymore in dead-time if there are enough FUs. FUs are designed to be standalone and to comprises at least a programmable general purpose processor (ARM, Microblaze) allowing the implementation of complex algorithms without any modification of the hardware. An acquisition chain is composed of a succession of algorithms which lead to organize our FUs as a software macro-pipeline, A simple approach consists in assigning one algorithm per FU. Consequently, the global latency becomes the worst latency of algorithms execution on FU. Moreover, as algorithms are executed locally - i.e. on a FU - this approach limits shared memory requirement. To handle multichannel, we propose FUs sharing, this approach maximize the chance to find a non-busy FU to process an incoming pulse. This is possible since each channel receive random event independently, the pulse extractors associated to them do not necessarily need to access simultaneously to all Computing resources at the same time to distribute their pulses. The major contribution of this paper is the proposition of an execution model and its associated hardware programmable architecture for digital pulse processing that can handle multiple acquisition channels while maintaining the scalability thanks to the use of shared resources. This execution model and associated architecture are validated by simulation of a cycle accurate architecture SystemC model. Proposed architecture shows promising results in terms of scalability while maintaining zero dead-time. This work also permit the sizing of hardware resources requirement required for a predefined set of applications. Future work will focus on the interconnection network and a scheduling policy that can exploit the variable-length of pulses. Then, the hardware implementation of this architecture will be performed and tested for a representative set of application.« less
An improved harmony search algorithm for emergency inspection scheduling
NASA Astrophysics Data System (ADS)
Kallioras, Nikos A.; Lagaros, Nikos D.; Karlaftis, Matthew G.
2014-11-01
The ability of nature-inspired search algorithms to efficiently handle combinatorial problems, and their successful implementation in many fields of engineering and applied sciences, have led to the development of new, improved algorithms. In this work, an improved harmony search (IHS) algorithm is presented, while a holistic approach for solving the problem of post-disaster infrastructure management is also proposed. The efficiency of IHS is compared with that of the algorithms of particle swarm optimization, differential evolution, basic harmony search and the pure random search procedure, when solving the districting problem that is the first part of post-disaster infrastructure management. The ant colony optimization algorithm is employed for solving the associated routing problem that constitutes the second part. The comparison is based on the quality of the results obtained, the computational demands and the sensitivity on the algorithmic parameters.
NASA Astrophysics Data System (ADS)
Puhan, Pratap Sekhar; Ray, Pravat Kumar; Panda, Gayadhar
2016-12-01
This paper presents the effectiveness of 5/5 Fuzzy rule implementation in Fuzzy Logic Controller conjunction with indirect control technique to enhance the power quality in single phase system, An indirect current controller in conjunction with Fuzzy Logic Controller is applied to the proposed shunt active power filter to estimate the peak reference current and capacitor voltage. Current Controller based pulse width modulation (CCPWM) is used to generate the switching signals of voltage source inverter. Various simulation results are presented to verify the good behaviour of the Shunt active Power Filter (SAPF) with proposed two levels Hysteresis Current Controller (HCC). For verification of Shunt Active Power Filter in real time, the proposed control algorithm has been implemented in laboratory developed setup in dSPACE platform.
NASA Astrophysics Data System (ADS)
Zhang, Xianxia; Wang, Jian; Qin, Tinggao
2003-09-01
Intelligent control algorithms are introduced into the control system of temperature and humidity. A multi-mode control algorithm of PI-Single Neuron is proposed for single loop control of temperature and humidity. In order to remove the coupling between temperature and humidity, a new decoupling method is presented, which is called fuzzy decoupling. The decoupling is achieved by using a fuzzy controller that dynamically modifies the static decoupling coefficient. Taking the control algorithm of PI-Single Neuron as the single loop control of temperature and humidity, the paper provides the simulated output response curves with no decoupling control, static decoupling control and fuzzy decoupling control. Those control algorithms are easily implemented in singlechip-based hardware systems.
Single-pass incremental force updates for adaptively restrained molecular dynamics.
Singh, Krishna Kant; Redon, Stephane
2018-03-30
Adaptively restrained molecular dynamics (ARMD) allows users to perform more integration steps in wall-clock time by switching on and off positional degrees of freedoms. This article presents new, single-pass incremental force updates algorithms to efficiently simulate a system using ARMD. We assessed different algorithms for speedup measurements and implemented them in the LAMMPS MD package. We validated the single-pass incremental force update algorithm on four different benchmarks using diverse pair potentials. The proposed algorithm allows us to perform simulation of a system faster than traditional MD in both NVE and NVT ensembles. Moreover, ARMD using the new single-pass algorithm speeds up the convergence of observables in wall-clock time. © 2017 Wiley Periodicals, Inc. © 2017 Wiley Periodicals, Inc.
Cellular neural networks, the Navier-Stokes equation, and microarray image reconstruction.
Zineddin, Bachar; Wang, Zidong; Liu, Xiaohui
2011-11-01
Although the last decade has witnessed a great deal of improvements achieved for the microarray technology, many major developments in all the main stages of this technology, including image processing, are still needed. Some hardware implementations of microarray image processing have been proposed in the literature and proved to be promising alternatives to the currently available software systems. However, the main drawback of those proposed approaches is the unsuitable addressing of the quantification of the gene spot in a realistic way without any assumption about the image surface. Our aim in this paper is to present a new image-reconstruction algorithm using the cellular neural network that solves the Navier-Stokes equation. This algorithm offers a robust method for estimating the background signal within the gene-spot region. The MATCNN toolbox for Matlab is used to test the proposed method. Quantitative comparisons are carried out, i.e., in terms of objective criteria, between our approach and some other available methods. It is shown that the proposed algorithm gives highly accurate and realistic measurements in a fully automated manner within a remarkably efficient time.
Search automation of the generalized method of device operational characteristics improvement
NASA Astrophysics Data System (ADS)
Petrova, I. Yu; Puchkova, A. A.; Zaripova, V. M.
2017-01-01
The article presents brief results of analysis of existing search methods of the closest patents, which can be applied to determine generalized methods of device operational characteristics improvement. There were observed the most widespread clustering algorithms and metrics for determining the proximity degree between two documents. The article proposes the technique of generalized methods determination; it has two implementation variants and consists of 7 steps. This technique has been implemented in the “Patents search” subsystem of the “Intellect” system. Also the article gives an example of the use of the proposed technique.
A spline-based approach for computing spatial impulse responses.
Ellis, Michael A; Guenther, Drake; Walker, William F
2007-05-01
Computer simulations are an essential tool for the design of phased-array ultrasonic imaging systems. FIELD II, which determines the two-way temporal response of a transducer at a point in space, is the current de facto standard for ultrasound simulation tools. However, the need often arises to obtain two-way spatial responses at a single point in time, a set of dimensions for which FIELD II is not well optimized. This paper describes an analytical approach for computing the two-way, far-field, spatial impulse response from rectangular transducer elements under arbitrary excitation. The described approach determines the response as the sum of polynomial functions, making computational implementation quite straightforward. The proposed algorithm, named DELFI, was implemented as a C routine under Matlab and results were compared to those obtained under similar conditions from the well-established FIELD II program. Under the specific conditions tested here, the proposed algorithm was approximately 142 times faster than FIELD II for computing spatial sensitivity functions with similar amounts of error. For temporal sensitivity functions with similar amounts of error, the proposed algorithm was about 1.7 times slower than FIELD II using rectangular elements and 19.2 times faster than FIELD II using triangular elements. DELFI is shown to be an attractive complement to FIELD II, especially when spatial responses are needed at a specific point in time.
Optimal Decentralized Protocol for Electric Vehicle Charging
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gan, LW; Topcu, U; Low, SH
We propose a decentralized algorithm to optimally schedule electric vehicle (EV) charging. The algorithm exploits the elasticity of electric vehicle loads to fill the valleys in electric load profiles. We first formulate the EV charging scheduling problem as an optimal control problem, whose objective is to impose a generalized notion of valley-filling, and study properties of optimal charging profiles. We then give a decentralized algorithm to iteratively solve the optimal control problem. In each iteration, EVs update their charging profiles according to the control signal broadcast by the utility company, and the utility company alters the control signal to guidemore » their updates. The algorithm converges to optimal charging profiles (that are as "flat" as they can possibly be) irrespective of the specifications (e.g., maximum charging rate and deadline) of EVs, even if EVs do not necessarily update their charging profiles in every iteration, and use potentially outdated control signal when they update. Moreover, the algorithm only requires each EV solving its local problem, hence its implementation requires low computation capability. We also extend the algorithm to track a given load profile and to real-time implementation.« less
NASA Astrophysics Data System (ADS)
Roche-Lima, Abiel; Thulasiram, Ruppa K.
2012-02-01
Finite automata, in which each transition is augmented with an output label in addition to the familiar input label, are considered finite-state transducers. Transducers have been used to analyze some fundamental issues in bioinformatics. Weighted finite-state transducers have been proposed to pairwise alignments of DNA and protein sequences; as well as to develop kernels for computational biology. Machine learning algorithms for conditional transducers have been implemented and used for DNA sequence analysis. Transducer learning algorithms are based on conditional probability computation. It is calculated by using techniques, such as pair-database creation, normalization (with Maximum-Likelihood normalization) and parameters optimization (with Expectation-Maximization - EM). These techniques are intrinsically costly for computation, even worse when are applied to bioinformatics, because the databases sizes are large. In this work, we describe a parallel implementation of an algorithm to learn conditional transducers using these techniques. The algorithm is oriented to bioinformatics applications, such as alignments, phylogenetic trees, and other genome evolution studies. Indeed, several experiences were developed using the parallel and sequential algorithm on Westgrid (specifically, on the Breeze cluster). As results, we obtain that our parallel algorithm is scalable, because execution times are reduced considerably when the data size parameter is increased. Another experience is developed by changing precision parameter. In this case, we obtain smaller execution times using the parallel algorithm. Finally, number of threads used to execute the parallel algorithm on the Breezy cluster is changed. In this last experience, we obtain as result that speedup is considerably increased when more threads are used; however there is a convergence for number of threads equal to or greater than 16.
Minimizing embedding impact in steganography using trellis-coded quantization
NASA Astrophysics Data System (ADS)
Filler, Tomáš; Judas, Jan; Fridrich, Jessica
2010-01-01
In this paper, we propose a practical approach to minimizing embedding impact in steganography based on syndrome coding and trellis-coded quantization and contrast its performance with bounds derived from appropriate rate-distortion bounds. We assume that each cover element can be assigned a positive scalar expressing the impact of making an embedding change at that element (single-letter distortion). The problem is to embed a given payload with minimal possible average embedding impact. This task, which can be viewed as a generalization of matrix embedding or writing on wet paper, has been approached using heuristic and suboptimal tools in the past. Here, we propose a fast and very versatile solution to this problem that can theoretically achieve performance arbitrarily close to the bound. It is based on syndrome coding using linear convolutional codes with the optimal binary quantizer implemented using the Viterbi algorithm run in the dual domain. The complexity and memory requirements of the embedding algorithm are linear w.r.t. the number of cover elements. For practitioners, we include detailed algorithms for finding good codes and their implementation. Finally, we report extensive experimental results for a large set of relative payloads and for different distortion profiles, including the wet paper channel.
Efficient Double Auction Mechanisms in the Energy Grid with Connected and Islanded Microgrids
NASA Astrophysics Data System (ADS)
Faqiry, Mohammad Nazif
The future energy grid is expected to operate in a decentralized fashion as a network of autonomous microgrids that are coordinated by a Distribution System Operator (DSO), which should allocate energy to them in an efficient manner. Each microgrid operating in either islanded or grid-connected mode may be considered to manage its own resources. This can take place through auctions with individual units of the microgrid as the agents. This research proposes efficient auction mechanisms for the energy grid, with is-landed and connected microgrids. The microgrid level auction is carried out by means of an intermediate agent called an aggregator. The individual consumer and producer units are modeled as selfish agents. With the microgrid in islanded mode, two aggregator-level auction classes are analyzed: (i) price-heterogeneous, and (ii) price homogeneous. Under the price heterogeneity paradigm, this research extends earlier work on the well-known, single-sided Kelly mechanism to double auctions. As in Kelly auctions, the proposed algorithm implements the bidding without using any agent level private infor-mation (i.e. generation capacity and utility functions). The proposed auction is shown to be an efficient mechanism that maximizes the social welfare, i.e. the sum of the utilities of all the agents. Furthermore, the research considers the situation where a subset of agents act as a coalition to redistribute the allocated energy and price using any other specific fairness criterion. The price homogeneous double auction algorithm proposed in this research ad-dresses the problem of price-anticipation, where each agent tries to influence the equilibri-um price of energy by placing strategic bids. As a result of this behavior, the auction's efficiency is lowered. This research proposes a novel approach that is implemented by the aggregator, called virtual bidding, where the efficiency can be asymptotically maximized, even in the presence of price anticipatory bidders. Next, an auction mechanism for the energy grid, with multiple connected mi-crogrids is considered. A globally efficient bi-level auction algorithm is proposed. At the upper-level, the algorithm takes into account physical grid constraints in allocating energy to the microgrids. It is implemented by the DSO as a linear objective quadratic constraint problem that allows price heterogeneity across the aggregators. In parallel, each aggrega-tor implements its own lower-level price homogeneous auction with virtual bidding. The research concludes with a preliminary study on extending the DSO level auc-tion to multi-period day-ahead scheduling. It takes into account storage units and conven-tional generators that are present in the grid by formulating the auction as a mixed inte-ger linear programming problem.
A new approach to optic disc detection in human retinal images using the firefly algorithm.
Rahebi, Javad; Hardalaç, Fırat
2016-03-01
There are various methods and algorithms to detect the optic discs in retinal images. In recent years, much attention has been given to the utilization of the intelligent algorithms. In this paper, we present a new automated method of optic disc detection in human retinal images using the firefly algorithm. The firefly intelligent algorithm is an emerging intelligent algorithm that was inspired by the social behavior of fireflies. The population in this algorithm includes the fireflies, each of which has a specific rate of lighting or fitness. In this method, the insects are compared two by two, and the less attractive insects can be observed to move toward the more attractive insects. Finally, one of the insects is selected as the most attractive, and this insect presents the optimum response to the problem in question. Here, we used the light intensity of the pixels of the retinal image pixels instead of firefly lightings. The movement of these insects due to local fluctuations produces different light intensity values in the images. Because the optic disc is the brightest area in the retinal images, all of the insects move toward brightest area and thus specify the location of the optic disc in the image. The results of implementation show that proposed algorithm could acquire an accuracy rate of 100 % in DRIVE dataset, 95 % in STARE dataset, and 94.38 % in DiaRetDB1 dataset. The results of implementation reveal high capability and accuracy of proposed algorithm in the detection of the optic disc from retinal images. Also, recorded required time for the detection of the optic disc in these images is 2.13 s for DRIVE dataset, 2.81 s for STARE dataset, and 3.52 s for DiaRetDB1 dataset accordingly. These time values are average value.
Alam, M S; Bognar, J G; Cain, S; Yasuda, B J
1998-03-10
During the process of microscanning a controlled vibrating mirror typically is used to produce subpixel shifts in a sequence of forward-looking infrared (FLIR) images. If the FLIR is mounted on a moving platform, such as an aircraft, uncontrolled random vibrations associated with the platform can be used to generate the shifts. Iterative techniques such as the expectation-maximization (EM) approach by means of the maximum-likelihood algorithm can be used to generate high-resolution images from multiple randomly shifted aliased frames. In the maximum-likelihood approach the data are considered to be Poisson random variables and an EM algorithm is developed that iteratively estimates an unaliased image that is compensated for known imager-system blur while it simultaneously estimates the translational shifts. Although this algorithm yields high-resolution images from a sequence of randomly shifted frames, it requires significant computation time and cannot be implemented for real-time applications that use the currently available high-performance processors. The new image shifts are iteratively calculated by evaluation of a cost function that compares the shifted and interlaced data frames with the corresponding values in the algorithm's latest estimate of the high-resolution image. We present a registration algorithm that estimates the shifts in one step. The shift parameters provided by the new algorithm are accurate enough to eliminate the need for iterative recalculation of translational shifts. Using this shift information, we apply a simplified version of the EM algorithm to estimate a high-resolution image from a given sequence of video frames. The proposed modified EM algorithm has been found to reduce significantly the computational burden when compared with the original EM algorithm, thus making it more attractive for practical implementation. Both simulation and experimental results are presented to verify the effectiveness of the proposed technique.
Semantic super networks: A case analysis of Wikipedia papers
NASA Astrophysics Data System (ADS)
Kostyuchenko, Evgeny; Lebedeva, Taisiya; Goritov, Alexander
2017-11-01
An algorithm for constructing super-large semantic networks has been developed in current work. Algorithm was tested using the "Cosmos" category of the Internet encyclopedia "Wikipedia" as an example. During the implementation, a parser for the syntax analysis of Wikipedia pages was developed. A graph based on list of articles and categories was formed. On the basis of the obtained graph analysis, algorithms for finding domains of high connectivity in a graph were proposed and tested. Algorithms for constructing a domain based on the number of links and the number of articles in the current subject area is considered. The shortcomings of these algorithms are shown and explained, an algorithm is developed on their joint use. The possibility of applying a combined algorithm for obtaining the final domain is shown. The problem of instability of the received domain was discovered when starting an algorithm from two neighboring vertices related to the domain.
A novel surface registration algorithm with biomedical modeling applications.
Huang, Heng; Shen, Li; Zhang, Rong; Makedon, Fillia; Saykin, Andrew; Pearlman, Justin
2007-07-01
In this paper, we propose a novel surface matching algorithm for arbitrarily shaped but simply connected 3-D objects. The spherical harmonic (SPHARM) method is used to describe these 3-D objects, and a novel surface registration approach is presented. The proposed technique is applied to various applications of medical image analysis. The results are compared with those using the traditional method, in which the first-order ellipsoid is used for establishing surface correspondence and aligning objects. In these applications, our surface alignment method is demonstrated to be more accurate and flexible than the traditional approach. This is due in large part to the fact that a new surface parameterization is generated by a shortcut that employs a useful rotational property of spherical harmonic basis functions for a fast implementation. In order to achieve a suitable computational speed for practical applications, we propose a fast alignment algorithm that improves computational complexity of the new surface registration method from O(n3) to O(n2).
Large Scale Frequent Pattern Mining using MPI One-Sided Model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vishnu, Abhinav; Agarwal, Khushbu
In this paper, we propose a work-stealing runtime --- Library for Work Stealing LibWS --- using MPI one-sided model for designing scalable FP-Growth --- {\\em de facto} frequent pattern mining algorithm --- on large scale systems. LibWS provides locality efficient and highly scalable work-stealing techniques for load balancing on a variety of data distributions. We also propose a novel communication algorithm for FP-growth data exchange phase, which reduces the communication complexity from state-of-the-art O(p) to O(f + p/f) for p processes and f frequent attributed-ids. FP-Growth is implemented using LibWS and evaluated on several work distributions and support counts. Anmore » experimental evaluation of the FP-Growth on LibWS using 4096 processes on an InfiniBand Cluster demonstrates excellent efficiency for several work distributions (87\\% efficiency for Power-law and 91% for Poisson). The proposed distributed FP-Tree merging algorithm provides 38x communication speedup on 4096 cores.« less
Generalized SMO algorithm for SVM-based multitask learning.
Cai, Feng; Cherkassky, Vladimir
2012-06-01
Exploiting additional information to improve traditional inductive learning is an active research area in machine learning. In many supervised-learning applications, training data can be naturally separated into several groups, and incorporating this group information into learning may improve generalization. Recently, Vapnik proposed a general approach to formalizing such problems, known as "learning with structured data" and its support vector machine (SVM) based optimization formulation called SVM+. Liang and Cherkassky showed the connection between SVM+ and multitask learning (MTL) approaches in machine learning, and proposed an SVM-based formulation for MTL called SVM+MTL for classification. Training the SVM+MTL classifier requires the solution of a large quadratic programming optimization problem which scales as O(n(3)) with sample size n. So there is a need to develop computationally efficient algorithms for implementing SVM+MTL. This brief generalizes Platt's sequential minimal optimization (SMO) algorithm to the SVM+MTL setting. Empirical results show that, for typical SVM+MTL problems, the proposed generalized SMO achieves over 100 times speed-up, in comparison with general-purpose optimization routines.
CUDA-based acceleration of collateral filtering in brain MR images
NASA Astrophysics Data System (ADS)
Li, Cheng-Yuan; Chang, Herng-Hua
2017-02-01
Image denoising is one of the fundamental and essential tasks within image processing. In medical imaging, finding an effective algorithm that can remove random noise in MR images is important. This paper proposes an effective noise reduction method for brain magnetic resonance (MR) images. Our approach is based on the collateral filter which is a more powerful method than the bilateral filter in many cases. However, the computation of the collateral filter algorithm is quite time-consuming. To solve this problem, we improved the collateral filter algorithm with parallel computing using GPU. We adopted CUDA, an application programming interface for GPU by NVIDIA, to accelerate the computation. Our experimental evaluation on an Intel Xeon CPU E5-2620 v3 2.40GHz with a NVIDIA Tesla K40c GPU indicated that the proposed implementation runs dramatically faster than the traditional collateral filter. We believe that the proposed framework has established a general blueprint for achieving fast and robust filtering in a wide variety of medical image denoising applications.
Development of a Guide-Dog Robot: Leading and Recognizing a Visually-Handicapped Person using a LRF
NASA Astrophysics Data System (ADS)
Saegusa, Shozo; Yasuda, Yuya; Uratani, Yoshitaka; Tanaka, Eiichirou; Makino, Toshiaki; Chang, Jen-Yuan (James
A conceptual Guide-Dog Robot prototype to lead and to recognize a visually-handicapped person is developed and discussed in this paper. Key design features of the robot include a movable platform, human-machine interface, and capability of avoiding obstacles. A novel algorithm enabling the robot to recognize its follower's locomotion as well to detect the center of corridor is proposed and implemented in the robot's human-machine interface. It is demonstrated that using the proposed novel leading and detecting algorithm along with a rapid scanning laser range finder (LRF) sensor, the robot is able to successfully and effectively lead a human walking in corridor without running into obstacles such as trash boxes or adjacent walking persons. Position and trajectory of the robot leading a human maneuvering in common corridor environment are measured by an independent LRF observer. The measured data suggest that the proposed algorithms are effective to enable the robot to detect center of the corridor and position of its follower correctly.
Convolution of large 3D images on GPU and its decomposition
NASA Astrophysics Data System (ADS)
Karas, Pavel; Svoboda, David
2011-12-01
In this article, we propose a method for computing convolution of large 3D images. The convolution is performed in a frequency domain using a convolution theorem. The algorithm is accelerated on a graphic card by means of the CUDA parallel computing model. Convolution is decomposed in a frequency domain using the decimation in frequency algorithm. We pay attention to keeping our approach efficient in terms of both time and memory consumption and also in terms of memory transfers between CPU and GPU which have a significant inuence on overall computational time. We also study the implementation on multiple GPUs and compare the results between the multi-GPU and multi-CPU implementations.
Time-Domain Fluorescence Lifetime Imaging Techniques Suitable for Solid-State Imaging Sensor Arrays
Li, David Day-Uei; Ameer-Beg, Simon; Arlt, Jochen; Tyndall, David; Walker, Richard; Matthews, Daniel R.; Visitkul, Viput; Richardson, Justin; Henderson, Robert K.
2012-01-01
We have successfully demonstrated video-rate CMOS single-photon avalanche diode (SPAD)-based cameras for fluorescence lifetime imaging microscopy (FLIM) by applying innovative FLIM algorithms. We also review and compare several time-domain techniques and solid-state FLIM systems, and adapt the proposed algorithms for massive CMOS SPAD-based arrays and hardware implementations. The theoretical error equations are derived and their performances are demonstrated on the data obtained from 0.13 μm CMOS SPAD arrays and the multiple-decay data obtained from scanning PMT systems. In vivo two photon fluorescence lifetime imaging data of FITC-albumin labeled vasculature of a P22 rat carcinosarcoma (BD9 rat window chamber) are used to test how different algorithms perform on bi-decay data. The proposed techniques are capable of producing lifetime images with enough contrast. PMID:22778606
Data-driven process decomposition and robust online distributed modelling for large-scale processes
NASA Astrophysics Data System (ADS)
Shu, Zhang; Lijuan, Li; Lijuan, Yao; Shipin, Yang; Tao, Zou
2018-02-01
With the increasing attention of networked control, system decomposition and distributed models show significant importance in the implementation of model-based control strategy. In this paper, a data-driven system decomposition and online distributed subsystem modelling algorithm was proposed for large-scale chemical processes. The key controlled variables are first partitioned by affinity propagation clustering algorithm into several clusters. Each cluster can be regarded as a subsystem. Then the inputs of each subsystem are selected by offline canonical correlation analysis between all process variables and its controlled variables. Process decomposition is then realised after the screening of input and output variables. When the system decomposition is finished, the online subsystem modelling can be carried out by recursively block-wise renewing the samples. The proposed algorithm was applied in the Tennessee Eastman process and the validity was verified.
Path Planning Algorithms for Autonomous Border Patrol Vehicles
NASA Astrophysics Data System (ADS)
Lau, George Tin Lam
This thesis presents an online path planning algorithm developed for unmanned vehicles in charge of autonomous border patrol. In this Pursuit-Evasion game, the unmanned vehicle is required to capture multiple trespassers on its own before any of them reach a target safe house where they are safe from capture. The problem formulation is based on Isaacs' Target Guarding problem, but extended to the case of multiple evaders. The proposed path planning method is based on Rapidly-exploring random trees (RRT) and is capable of producing trajectories within several seconds to capture 2 or 3 evaders. Simulations are carried out to demonstrate that the resulting trajectories approach the optimal solution produced by a nonlinear programming-based numerical optimal control solver. Experiments are also conducted on unmanned ground vehicles to show the feasibility of implementing the proposed online path planning algorithm on physical applications.
Multidimensional Optimization of Signal Space Distance Parameters in WLAN Positioning
Brković, Milenko; Simić, Mirjana
2014-01-01
Accurate indoor localization of mobile users is one of the challenging problems of the last decade. Besides delivering high speed Internet, Wireless Local Area Network (WLAN) can be used as an effective indoor positioning system, being competitive both in terms of accuracy and cost. Among the localization algorithms, nearest neighbor fingerprinting algorithms based on Received Signal Strength (RSS) parameter have been extensively studied as an inexpensive solution for delivering indoor Location Based Services (LBS). In this paper, we propose the optimization of the signal space distance parameters in order to improve precision of WLAN indoor positioning, based on nearest neighbor fingerprinting algorithms. Experiments in a real WLAN environment indicate that proposed optimization leads to substantial improvements of the localization accuracy. Our approach is conceptually simple, is easy to implement, and does not require any additional hardware. PMID:24757443
Speeding up local correlation methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kats, Daniel
2014-12-28
We present two techniques that can substantially speed up the local correlation methods. The first one allows one to avoid the expensive transformation of the electron-repulsion integrals from atomic orbitals to virtual space. The second one introduces an algorithm for the residual equations in the local perturbative treatment that, in contrast to the standard scheme, does not require holding the amplitudes or residuals in memory. It is shown that even an interpreter-based implementation of the proposed algorithm in the context of local MP2 method is faster and requires less memory than the highly optimized variants of conventional algorithms.
A gradient based algorithm to solve inverse plane bimodular problems of identification
NASA Astrophysics Data System (ADS)
Ran, Chunjiang; Yang, Haitian; Zhang, Guoqing
2018-02-01
This paper presents a gradient based algorithm to solve inverse plane bimodular problems of identifying constitutive parameters, including tensile/compressive moduli and tensile/compressive Poisson's ratios. For the forward bimodular problem, a FE tangent stiffness matrix is derived facilitating the implementation of gradient based algorithms, for the inverse bimodular problem of identification, a two-level sensitivity analysis based strategy is proposed. Numerical verification in term of accuracy and efficiency is provided, and the impacts of initial guess, number of measurement points, regional inhomogeneity, and noisy data on the identification are taken into accounts.
Comparison of multihardware parallel implementations for a phase unwrapping algorithm
NASA Astrophysics Data System (ADS)
Hernandez-Lopez, Francisco Javier; Rivera, Mariano; Salazar-Garibay, Adan; Legarda-Sáenz, Ricardo
2018-04-01
Phase unwrapping is an important problem in the areas of optical metrology, synthetic aperture radar (SAR) image analysis, and magnetic resonance imaging (MRI) analysis. These images are becoming larger in size and, particularly, the availability and need for processing of SAR and MRI data have increased significantly with the acquisition of remote sensing data and the popularization of magnetic resonators in clinical diagnosis. Therefore, it is important to develop faster and accurate phase unwrapping algorithms. We propose a parallel multigrid algorithm of a phase unwrapping method named accumulation of residual maps, which builds on a serial algorithm that consists of the minimization of a cost function; minimization achieved by means of a serial Gauss-Seidel kind algorithm. Our algorithm also optimizes the original cost function, but unlike the original work, our algorithm is a parallel Jacobi class with alternated minimizations. This strategy is known as the chessboard type, where red pixels can be updated in parallel at same iteration since they are independent. Similarly, black pixels can be updated in parallel in an alternating iteration. We present parallel implementations of our algorithm for different parallel multicore architecture such as CPU-multicore, Xeon Phi coprocessor, and Nvidia graphics processing unit. In all the cases, we obtain a superior performance of our parallel algorithm when compared with the original serial version. In addition, we present a detailed comparative performance of the developed parallel versions.
A hardware-oriented concurrent TZ search algorithm for High-Efficiency Video Coding
NASA Astrophysics Data System (ADS)
Doan, Nghia; Kim, Tae Sung; Rhee, Chae Eun; Lee, Hyuk-Jae
2017-12-01
High-Efficiency Video Coding (HEVC) is the latest video coding standard, in which the compression performance is double that of its predecessor, the H.264/AVC standard, while the video quality remains unchanged. In HEVC, the test zone (TZ) search algorithm is widely used for integer motion estimation because it effectively searches the good-quality motion vector with a relatively small amount of computation. However, the complex computation structure of the TZ search algorithm makes it difficult to implement it in the hardware. This paper proposes a new integer motion estimation algorithm which is designed for hardware execution by modifying the conventional TZ search to allow parallel motion estimations of all prediction unit (PU) partitions. The algorithm consists of the three phases of zonal, raster, and refinement searches. At the beginning of each phase, the algorithm obtains the search points required by the original TZ search for all PU partitions in a coding unit (CU). Then, all redundant search points are removed prior to the estimation of the motion costs, and the best search points are then selected for all PUs. Compared to the conventional TZ search algorithm, experimental results show that the proposed algorithm significantly decreases the Bjøntegaard Delta bitrate (BD-BR) by 0.84%, and it also reduces the computational complexity by 54.54%.
Image contrast enhancement using adjacent-blocks-based modification for local histogram equalization
NASA Astrophysics Data System (ADS)
Wang, Yang; Pan, Zhibin
2017-11-01
Infrared images usually have some non-ideal characteristics such as weak target-to-background contrast and strong noise. Because of these characteristics, it is necessary to apply the contrast enhancement algorithm to improve the visual quality of infrared images. Histogram equalization (HE) algorithm is a widely used contrast enhancement algorithm due to its effectiveness and simple implementation. But a drawback of HE algorithm is that the local contrast of an image cannot be equally enhanced. Local histogram equalization algorithms are proved to be the effective techniques for local image contrast enhancement. However, over-enhancement of noise and artifacts can be easily found in the local histogram equalization enhanced images. In this paper, a new contrast enhancement technique based on local histogram equalization algorithm is proposed to overcome the drawbacks mentioned above. The input images are segmented into three kinds of overlapped sub-blocks using the gradients of them. To overcome the over-enhancement effect, the histograms of these sub-blocks are then modified by adjacent sub-blocks. We pay more attention to improve the contrast of detail information while the brightness of the flat region in these sub-blocks is well preserved. It will be shown that the proposed algorithm outperforms other related algorithms by enhancing the local contrast without introducing over-enhancement effects and additional noise.
A Parallel Compact Multi-Dimensional Numerical Algorithm with Aeroacoustics Applications
NASA Technical Reports Server (NTRS)
Povitsky, Alex; Morris, Philip J.
1999-01-01
In this study we propose a novel method to parallelize high-order compact numerical algorithms for the solution of three-dimensional PDEs (Partial Differential Equations) in a space-time domain. For this numerical integration most of the computer time is spent in computation of spatial derivatives at each stage of the Runge-Kutta temporal update. The most efficient direct method to compute spatial derivatives on a serial computer is a version of Gaussian elimination for narrow linear banded systems known as the Thomas algorithm. In a straightforward pipelined implementation of the Thomas algorithm processors are idle due to the forward and backward recurrences of the Thomas algorithm. To utilize processors during this time, we propose to use them for either non-local data independent computations, solving lines in the next spatial direction, or local data-dependent computations by the Runge-Kutta method. To achieve this goal, control of processor communication and computations by a static schedule is adopted. Thus, our parallel code is driven by a communication and computation schedule instead of the usual "creative, programming" approach. The obtained parallelization speed-up of the novel algorithm is about twice as much as that for the standard pipelined algorithm and close to that for the explicit DRP algorithm.
Dynamic Appliances Scheduling in Collaborative MicroGrids System
Bilil, Hasnae; Aniba, Ghassane; Gharavi, Hamid
2017-01-01
In this paper a new approach which is based on a collaborative system of MicroGrids (MG’s), is proposed to enable household appliance scheduling. To achieve this, appliances are categorized into flexible and non-flexible Deferrable Loads (DL’s), according to their electrical components. We propose a dynamic scheduling algorithm where users can systematically manage the operation of their electric appliances. The main challenge is to develop a flattening function calculus (reshaping) for both flexible and non-flexible DL’s. In addition, implementation of the proposed algorithm would require dynamically analyzing two successive multi-objective optimization (MOO) problems. The first targets the activation schedule of non-flexible DL’s and the second deals with the power profiles of flexible DL’s. The MOO problems are resolved by using a fast and elitist multi-objective genetic algorithm (NSGA-II). Finally, in order to show the efficiency of the proposed approach, a case study of a collaborative system that consists of 40 MG’s registered in the load curve for the flattening program has been developed. The results verify that the load curve can indeed become very flat by applying the proposed scheduling approach. PMID:28824226
Efficient implementation of the many-body Reactive Bond Order (REBO) potential on GPU
NASA Astrophysics Data System (ADS)
Trędak, Przemysław; Rudnicki, Witold R.; Majewski, Jacek A.
2016-09-01
The second generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range hydrocarbon materials. It is also extensible to other atom types and interactions. REBO potential assumes complex multi-body interaction model, that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of molecular dynamics algorithm using REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronizations issues that arise in computations of multi-body potential. Techniques developed for this problem may be also used to achieve efficient solutions of different problems. The performance of proposed algorithm is assessed using a range of model systems. It is compared to highly optimized CPU implementation (both single core and OpenMP) available in LAMMPS package. These experiments show up to 6x improvement in forces computation time using single processor of the NVIDIA Tesla K80 compared to high end 16-core Intel Xeon processor.
Real-time estimation of ionospheric delay using GPS measurements
NASA Astrophysics Data System (ADS)
Lin, Lao-Sheng
1997-12-01
When radio waves such as the GPS signals propagate through the ionosphere, they experience an extra time delay. The ionospheric delay can be eliminated (to the first order) through a linear combination of L1 and L2 observations from dual-frequency GPS receivers. Taking advantage of this dispersive principle, one or more dual- frequency GPS receivers can be used to determine a model of the ionospheric delay across a region of interest and, if implemented in real-time, can support single-frequency GPS positioning and navigation applications. The research objectives of this thesis were: (1) to develop algorithms to obtain accurate absolute Total Electron Content (TEC) estimates from dual-frequency GPS observables, and (2) to develop an algorithm to improve the accuracy of real-time ionosphere modelling. In order to fulfil these objectives, four algorithms have been proposed in this thesis. A 'multi-day multipath template technique' is proposed to mitigate the pseudo-range multipath effects at static GPS reference stations. This technique is based on the assumption that the multipath disturbance at a static station will be constant if the physical environment remains unchanged from day to day. The multipath template, either single-day or multi-day, can be generated from the previous days' GPS data. A 'real-time failure detection and repair algorithm' is proposed to detect and repair the GPS carrier phase 'failures', such as the occurrence of cycle slips. The proposed algorithm uses two procedures: (1) application of a statistical test on the state difference estimated from robust and conventional Kalman filters in order to detect and identify the carrier phase failure, and (2) application of a Kalman filter algorithm to repair the 'identified carrier phase failure'. A 'L1/L2 differential delay estimation algorithm' is proposed to estimate GPS satellite transmitter and receiver L1/L2 differential delays. This algorithm, based on the single-site modelling technique, is able to estimate the sum of the satellite and receiver L1/L2 differential delay for each tracked GPS satellite. A 'UNSW grid-based algorithm' is proposed to improve the accuracy of real-time ionosphere modelling. The proposed algorithm is similar to the conventional grid-based algorithm. However, two modifications were made to the algorithm: (1) an 'exponential function' is adopted as the weighting function, and (2) the 'grid-based ionosphere model' estimated from the previous day is used to predict the ionospheric delay ratios between the grid point and reference points. (Abstract shortened by UMI.)
Online signature recognition using principal component analysis and artificial neural network
NASA Astrophysics Data System (ADS)
Hwang, Seung-Jun; Park, Seung-Je; Baek, Joong-Hwan
2016-12-01
In this paper, we propose an algorithm for on-line signature recognition using fingertip point in the air from the depth image acquired by Kinect. We extract 10 statistical features from X, Y, Z axis, which are invariant to changes in shifting and scaling of the signature trajectories in three-dimensional space. Artificial neural network is adopted to solve the complex signature classification problem. 30 dimensional features are converted into 10 principal components using principal component analysis, which is 99.02% of total variances. We implement the proposed algorithm and test to actual on-line signatures. In experiment, we verify the proposed method is successful to classify 15 different on-line signatures. Experimental result shows 98.47% of recognition rate when using only 10 feature vectors.
BPP: a sequence-based algorithm for branch point prediction.
Zhang, Qing; Fan, Xiaodan; Wang, Yejun; Sun, Ming-An; Shao, Jianlin; Guo, Dianjing
2017-10-15
Although high-throughput sequencing methods have been proposed to identify splicing branch points in the human genome, these methods can only detect a small fraction of the branch points subject to the sequencing depth, experimental cost and the expression level of the mRNA. An accurate computational model for branch point prediction is therefore an ongoing objective in human genome research. We here propose a novel branch point prediction algorithm that utilizes information on the branch point sequence and the polypyrimidine tract. Using experimentally validated data, we demonstrate that our proposed method outperforms existing methods. Availability and implementation: https://github.com/zhqingit/BPP. djguo@cuhk.edu.hk. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
Power Control and Optimization of Photovoltaic and Wind Energy Conversion Systems
NASA Astrophysics Data System (ADS)
Ghaffari, Azad
Power map and Maximum Power Point (MPP) of Photovoltaic (PV) and Wind Energy Conversion Systems (WECS) highly depend on system dynamics and environmental parameters, e.g., solar irradiance, temperature, and wind speed. Power optimization algorithms for PV systems and WECS are collectively known as Maximum Power Point Tracking (MPPT) algorithm. Gradient-based Extremum Seeking (ES), as a non-model-based MPPT algorithm, governs the system to its peak point on the steepest descent curve regardless of changes of the system dynamics and variations of the environmental parameters. Since the power map shape defines the gradient vector, then a close estimate of the power map shape is needed to create user assignable transients in the MPPT algorithm. The Hessian gives a precise estimate of the power map in a neighborhood around the MPP. The estimate of the inverse of the Hessian in combination with the estimate of the gradient vector are the key parts to implement the Newton-based ES algorithm. Hence, we generate an estimate of the Hessian using our proposed perturbation matrix. Also, we introduce a dynamic estimator to calculate the inverse of the Hessian which is an essential part of our algorithm. We present various simulations and experiments on the micro-converter PV systems to verify the validity of our proposed algorithm. The ES scheme can also be used in combination with other control algorithms to achieve desired closed-loop performance. The WECS dynamics is slow which causes even slower response time for the MPPT based on the ES. Hence, we present a control scheme, extended from Field-Oriented Control (FOC), in combination with feedback linearization to reduce the convergence time of the closed-loop system. Furthermore, the nonlinear control prevents magnetic saturation of the stator of the Induction Generator (IG). The proposed control algorithm in combination with the ES guarantees the closed-loop system robustness with respect to high level parameter uncertainty in the IG dynamics. The simulation results verify the effectiveness of the proposed algorithm.
A synergistic method for vibration suppression of an elevator mechatronic system
NASA Astrophysics Data System (ADS)
Knezevic, Bojan Z.; Blanusa, Branko; Marcetic, Darko P.
2017-10-01
Modern elevators are complex mechatronic systems which have to satisfy high performance in precision, safety and ride comfort. Each elevator mechatronic system (EMS) contains a mechanical subsystem which is characterized by its resonant frequency. In order to achieve high performance of the whole system, the control part of the EMS inevitably excites resonant circuits causing the occurrence of vibration. This paper proposes a synergistic solution based on the jerk control and the upgrade of the speed controller with a band-stop filter to restore lost ride comfort and speed control caused by vibration. The band-stop filter eliminates the resonant component from the speed controller spectra and jerk control provides operating of the speed controller in a linear mode as well as increased ride comfort. The original method for band-stop filter tuning based on Goertzel algorithm and Kiefer search algorithm is proposed in this paper. In order to generate the speed reference trajectory which can be defined by different shapes and amplitudes of jerk, a unique generalized model is proposed. The proposed algorithm is integrated in the power drive control algorithm and implemented on the digital signal processor. Through experimental verifications on a scale down prototype of the EMS it has been verified that only synergistic effect of controlling jerk and filtrating the reference torque can completely eliminate vibrations.
A Fine-Grained Pipelined Implementation for Large-Scale Matrix Inversion on FPGA
NASA Astrophysics Data System (ADS)
Zhou, Jie; Dou, Yong; Zhao, Jianxun; Xia, Fei; Lei, Yuanwu; Tang, Yuxing
Large-scale matrix inversion play an important role in many applications. However to the best of our knowledge, there is no FPGA-based implementation. In this paper, we explore the possibility of accelerating large-scale matrix inversion on FPGA. To exploit the computational potential of FPGA, we introduce a fine-grained parallel algorithm for matrix inversion. A scalable linear array processing elements (PEs), which is the core component of the FPGA accelerator, is proposed to implement this algorithm. A total of 12 PEs can be integrated into an Altera StratixII EP2S130F1020C5 FPGA on our self-designed board. Experimental results show that a factor of 2.6 speedup and the maximum power-performance of 41 can be achieved compare to Pentium Dual CPU with double SSE threads.
Experimental quantum computing to solve systems of linear equations.
Cai, X-D; Weedbrook, C; Su, Z-E; Chen, M-C; Gu, Mile; Zhu, M-J; Li, Li; Liu, Nai-Le; Lu, Chao-Yang; Pan, Jian-Wei
2013-06-07
Solving linear systems of equations is ubiquitous in all areas of science and engineering. With rapidly growing data sets, such a task can be intractable for classical computers, as the best known classical algorithms require a time proportional to the number of variables N. A recently proposed quantum algorithm shows that quantum computers could solve linear systems in a time scale of order log(N), giving an exponential speedup over classical computers. Here we realize the simplest instance of this algorithm, solving 2×2 linear equations for various input vectors on a quantum computer. We use four quantum bits and four controlled logic gates to implement every subroutine required, demonstrating the working principle of this algorithm.