A comparison of select image-compression algorithms for an electronic still camera
NASA Technical Reports Server (NTRS)
Nerheim, Rosalee
1989-01-01
This effort is a study of image-compression algorithms for an electronic still camera. An electronic still camera can record and transmit high-quality images without the use of film, because images are stored digitally in computer memory. However, high-resolution images contain an enormous amount of information, and will strain the camera's data-storage system. Image compression will allow more images to be stored in the camera's memory. For the electronic still camera, a compression algorithm that produces a reconstructed image of high fidelity is most important. Efficiency of the algorithm is the second priority. High fidelity and efficiency are more important than a high compression ratio. Several algorithms were chosen for this study and judged on fidelity, efficiency and compression ratio. The transform method appears to be the best choice. At present, the method is compressing images to a ratio of 5.3:1 and producing high-fidelity reconstructed images.
Tactical Synthesis Of Efficient Global Search Algorithms
NASA Technical Reports Server (NTRS)
Nedunuri, Srinivas; Smith, Douglas R.; Cook, William R.
2009-01-01
Algorithm synthesis transforms a formal specification into an efficient algorithm to solve a problem. Algorithm synthesis in Specware combines the formal specification of a problem with a high-level algorithm strategy. To derive an efficient algorithm, a developer must define operators that refine the algorithm by combining the generic operators in the algorithm with the details of the problem specification. This derivation requires skill and a deep understanding of the problem and the algorithmic strategy. In this paper we introduce two tactics to ease this process. The tactics serve a similar purpose to tactics used for determining indefinite integrals in calculus, that is suggesting possible ways to attack the problem.
A Simulated Annealing Algorithm for the Optimization of Multistage Depressed Collector Efficiency
NASA Technical Reports Server (NTRS)
Vaden, Karl R.; Wilson, Jeffrey D.; Bulson, Brian A.
2002-01-01
The microwave traveling wave tube amplifier (TWTA) is widely used as a high-power transmitting source for space and airborne communications. One critical factor in designing a TWTA is the overall efficiency. However, overall efficiency is highly dependent upon collector efficiency; so collector design is critical to the performance of a TWTA. Therefore, NASA Glenn Research Center has developed an optimization algorithm based on Simulated Annealing to quickly design highly efficient multi-stage depressed collectors (MDC).
NASA Astrophysics Data System (ADS)
Doha, E. H.; Abd-Elhameed, W. M.; Bassuony, M. A.
2013-03-01
This paper is concerned with spectral Galerkin algorithms for solving high even-order two point boundary value problems in one dimension subject to homogeneous and nonhomogeneous boundary conditions. The proposed algorithms are extended to solve two-dimensional high even-order differential equations. The key to the efficiency of these algorithms is to construct compact combinations of Chebyshev polynomials of the third and fourth kinds as basis functions. The algorithms lead to linear systems with specially structured matrices that can be efficiently inverted. Numerical examples are included to demonstrate the validity and applicability of the proposed algorithms, and some comparisons with some other methods are made.
A High Fuel Consumption Efficiency Management Scheme for PHEVs Using an Adaptive Genetic Algorithm
Lee, Wah Ching; Tsang, Kim Fung; Chi, Hao Ran; Hung, Faan Hei; Wu, Chung Kit; Chui, Kwok Tai; Lau, Wing Hong; Leung, Yat Wah
2015-01-01
A high fuel efficiency management scheme for plug-in hybrid electric vehicles (PHEVs) has been developed. In order to achieve fuel consumption reduction, an adaptive genetic algorithm scheme has been designed to adaptively manage the energy resource usage. The objective function of the genetic algorithm is implemented by designing a fuzzy logic controller which closely monitors and resembles the driving conditions and environment of PHEVs, thus trading off between petrol versus electricity for optimal driving efficiency. Comparison between calculated results and publicized data shows that the achieved efficiency of the fuzzified genetic algorithm is better by 10% than existing schemes. The developed scheme, if fully adopted, would help reduce over 600 tons of CO2 emissions worldwide every day. PMID:25587974
High-efficiency Gaussian key reconciliation in continuous variable quantum key distribution
NASA Astrophysics Data System (ADS)
Bai, ZengLiang; Wang, XuYang; Yang, ShenShen; Li, YongMin
2016-01-01
Efficient reconciliation is a crucial step in continuous variable quantum key distribution. The progressive-edge-growth (PEG) algorithm is an efficient method to construct relatively short block length low-density parity-check (LDPC) codes. The qua-sicyclic construction method can extend short block length codes and further eliminate the shortest cycle. In this paper, by combining the PEG algorithm and qua-si-cyclic construction method, we design long block length irregular LDPC codes with high error-correcting capacity. Based on these LDPC codes, we achieve high-efficiency Gaussian key reconciliation with slice recon-ciliation based on multilevel coding/multistage decoding with an efficiency of 93.7%.
Comparison of l₁-Norm SVR and Sparse Coding Algorithms for Linear Regression.
Zhang, Qingtian; Hu, Xiaolin; Zhang, Bo
2015-08-01
Support vector regression (SVR) is a popular function estimation technique based on Vapnik's concept of support vector machine. Among many variants, the l1-norm SVR is known to be good at selecting useful features when the features are redundant. Sparse coding (SC) is a technique widely used in many areas and a number of efficient algorithms are available. Both l1-norm SVR and SC can be used for linear regression. In this brief, the close connection between the l1-norm SVR and SC is revealed and some typical algorithms are compared for linear regression. The results show that the SC algorithms outperform the Newton linear programming algorithm, an efficient l1-norm SVR algorithm, in efficiency. The algorithms are then used to design the radial basis function (RBF) neural networks. Experiments on some benchmark data sets demonstrate the high efficiency of the SC algorithms. In particular, one of the SC algorithms, the orthogonal matching pursuit is two orders of magnitude faster than a well-known RBF network designing algorithm, the orthogonal least squares algorithm.
Vectorized algorithms for spiking neural network simulation.
Brette, Romain; Goodman, Dan F M
2011-06-01
High-level languages (Matlab, Python) are popular in neuroscience because they are flexible and accelerate development. However, for simulating spiking neural networks, the cost of interpretation is a bottleneck. We describe a set of algorithms to simulate large spiking neural networks efficiently with high-level languages using vector-based operations. These algorithms constitute the core of Brian, a spiking neural network simulator written in the Python language. Vectorized simulation makes it possible to combine the flexibility of high-level languages with the computational efficiency usually associated with compiled languages.
An Improved Perturb and Observe Algorithm for Photovoltaic Motion Carriers
NASA Astrophysics Data System (ADS)
Peng, Lele; Xu, Wei; Li, Liming; Zheng, Shubin
2018-03-01
An improved perturbation and observation algorithm for photovoltaic motion carriers is proposed in this paper. The model of the proposed algorithm is given by using Lambert W function and tangent error method. Moreover, by using matlab and experiment of photovoltaic system, the tracking performance of the proposed algorithm is tested. And the results demonstrate that the improved algorithm has fast tracking speed and high efficiency. Furthermore, the energy conversion efficiency by the improved method has increased by nearly 8.2%.
Accelerated simulation of stochastic particle removal processes in particle-resolved aerosol models
DOE Office of Scientific and Technical Information (OSTI.GOV)
Curtis, J.H.; Michelotti, M.D.; Riemer, N.
2016-10-01
Stochastic particle-resolved methods have proven useful for simulating multi-dimensional systems such as composition-resolved aerosol size distributions. While particle-resolved methods have substantial benefits for highly detailed simulations, these techniques suffer from high computational cost, motivating efforts to improve their algorithmic efficiency. Here we formulate an algorithm for accelerating particle removal processes by aggregating particles of similar size into bins. We present the Binned Algorithm for particle removal processes and analyze its performance with application to the atmospherically relevant process of aerosol dry deposition. We show that the Binned Algorithm can dramatically improve the efficiency of particle removals, particularly for low removalmore » rates, and that computational cost is reduced without introducing additional error. In simulations of aerosol particle removal by dry deposition in atmospherically relevant conditions, we demonstrate about 50-times increase in algorithm efficiency.« less
NASA Astrophysics Data System (ADS)
He, Qiang; Schultz, Richard R.; Chu, Chee-Hung Henry
2008-04-01
The concept surrounding super-resolution image reconstruction is to recover a highly-resolved image from a series of low-resolution images via between-frame subpixel image registration. In this paper, we propose a novel and efficient super-resolution algorithm, and then apply it to the reconstruction of real video data captured by a small Unmanned Aircraft System (UAS). Small UAS aircraft generally have a wingspan of less than four meters, so that these vehicles and their payloads can be buffeted by even light winds, resulting in potentially unstable video. This algorithm is based on a coarse-to-fine strategy, in which a coarsely super-resolved image sequence is first built from the original video data by image registration and bi-cubic interpolation between a fixed reference frame and every additional frame. It is well known that the median filter is robust to outliers. If we calculate pixel-wise medians in the coarsely super-resolved image sequence, we can restore a refined super-resolved image. The primary advantage is that this is a noniterative algorithm, unlike traditional approaches based on highly-computational iterative algorithms. Experimental results show that our coarse-to-fine super-resolution algorithm is not only robust, but also very efficient. In comparison with five well-known super-resolution algorithms, namely the robust super-resolution algorithm, bi-cubic interpolation, projection onto convex sets (POCS), the Papoulis-Gerchberg algorithm, and the iterated back projection algorithm, our proposed algorithm gives both strong efficiency and robustness, as well as good visual performance. This is particularly useful for the application of super-resolution to UAS surveillance video, where real-time processing is highly desired.
Algorithm for evaluating the effectiveness of a high-rise development project based on current yield
NASA Astrophysics Data System (ADS)
Soboleva, Elena
2018-03-01
The article is aimed at the issues of operational evaluation of development project efficiency in high-rise construction under the current economic conditions in Russia. The author touches the following issues: problems of implementing development projects, the influence of the operational evaluation quality of high-rise construction projects on general efficiency, assessing the influence of the project's external environment on the effectiveness of project activities under crisis conditions and the quality of project management. The article proposes the algorithm and the methodological approach to the quality management of the developer project efficiency based on operational evaluation of the current yield efficiency. The methodology for calculating the current efficiency of a development project for high-rise construction has been updated.
Vibration extraction based on fast NCC algorithm and high-speed camera.
Lei, Xiujun; Jin, Yi; Guo, Jie; Zhu, Chang'an
2015-09-20
In this study, a high-speed camera system is developed to complete the vibration measurement in real time and to overcome the mass introduced by conventional contact measurements. The proposed system consists of a notebook computer and a high-speed camera which can capture the images as many as 1000 frames per second. In order to process the captured images in the computer, the normalized cross-correlation (NCC) template tracking algorithm with subpixel accuracy is introduced. Additionally, a modified local search algorithm based on the NCC is proposed to reduce the computation time and to increase efficiency significantly. The modified algorithm can rapidly accomplish one displacement extraction 10 times faster than the traditional template matching without installing any target panel onto the structures. Two experiments were carried out under laboratory and outdoor conditions to validate the accuracy and efficiency of the system performance in practice. The results demonstrated the high accuracy and efficiency of the camera system in extracting vibrating signals.
Power and Efficiency Optimized in Traveling-Wave Tubes Over a Broad Frequency Bandwidth
NASA Technical Reports Server (NTRS)
Wilson, Jeffrey D.
2001-01-01
A traveling-wave tube (TWT) is an electron beam device that is used to amplify electromagnetic communication waves at radio and microwave frequencies. TWT's are critical components in deep space probes, communication satellites, and high-power radar systems. Power conversion efficiency is of paramount importance for TWT's employed in deep space probes and communication satellites. A previous effort was very successful in increasing efficiency and power at a single frequency (ref. 1). Such an algorithm is sufficient for narrow bandwidth designs, but for optimal designs in applications that require high radiofrequency power over a wide bandwidth, such as high-density communications or high-resolution radar, the variation of the circuit response with respect to frequency must be considered. This work at the NASA Glenn Research Center is the first to develop techniques for optimizing TWT efficiency and output power over a broad frequency bandwidth (ref. 2). The techniques are based on simulated annealing, which has the advantage over conventional optimization techniques in that it enables the best possible solution to be obtained (ref. 3). Two new broadband simulated annealing algorithms were developed that optimize (1) minimum saturated power efficiency over a frequency bandwidth and (2) simultaneous bandwidth and minimum power efficiency over the frequency band with constant input power. The algorithms were incorporated into the NASA coupled-cavity TWT computer model (ref. 4) and used to design optimal phase velocity tapers using the 59- to 64-GHz Hughes 961HA coupled-cavity TWT as a baseline model. In comparison to the baseline design, the computational results of the first broad-band design algorithm show an improvement of 73.9 percent in minimum saturated efficiency (see the top graph). The second broadband design algorithm (see the bottom graph) improves minimum radiofrequency efficiency with constant input power drive by a factor of 2.7 at the high band edge (64 GHz) and increases simultaneous bandwidth by 500 MHz.
A novel hybrid algorithm for the design of the phase diffractive optical elements for beam shaping
NASA Astrophysics Data System (ADS)
Jiang, Wenbo; Wang, Jun; Dong, Xiucheng
2013-02-01
In this paper, a novel hybrid algorithm for the design of a phase diffractive optical elements (PDOE) is proposed. It combines the genetic algorithm (GA) with the transformable scale BFGS (Broyden, Fletcher, Goldfarb, Shanno) algorithm, the penalty function was used in the cost function definition. The novel hybrid algorithm has the global merits of the genetic algorithm as well as the local improvement capabilities of the transformable scale BFGS algorithm. We designed the PDOE using the conventional simulated annealing algorithm and the novel hybrid algorithm. To compare the performance of two algorithms, three indexes of the diffractive efficiency, uniformity error and the signal-to-noise ratio are considered in numerical simulation. The results show that the novel hybrid algorithm has good convergence property and good stability. As an application example, the PDOE was used for the Gaussian beam shaping; high diffractive efficiency, low uniformity error and high signal-to-noise were obtained. The PDOE can be used for high quality beam shaping such as inertial confinement fusion (ICF), excimer laser lithography, fiber coupling laser diode array, laser welding, etc. It shows wide application value.
PS-FW: A Hybrid Algorithm Based on Particle Swarm and Fireworks for Global Optimization
Chen, Shuangqing; Wei, Lixin; Guan, Bing
2018-01-01
Particle swarm optimization (PSO) and fireworks algorithm (FWA) are two recently developed optimization methods which have been applied in various areas due to their simplicity and efficiency. However, when being applied to high-dimensional optimization problems, PSO algorithm may be trapped in the local optima owing to the lack of powerful global exploration capability, and fireworks algorithm is difficult to converge in some cases because of its relatively low local exploitation efficiency for noncore fireworks. In this paper, a hybrid algorithm called PS-FW is presented, in which the modified operators of FWA are embedded into the solving process of PSO. In the iteration process, the abandonment and supplement mechanism is adopted to balance the exploration and exploitation ability of PS-FW, and the modified explosion operator and the novel mutation operator are proposed to speed up the global convergence and to avoid prematurity. To verify the performance of the proposed PS-FW algorithm, 22 high-dimensional benchmark functions have been employed, and it is compared with PSO, FWA, stdPSO, CPSO, CLPSO, FIPS, Frankenstein, and ALWPSO algorithms. Results show that the PS-FW algorithm is an efficient, robust, and fast converging optimization method for solving global optimization problems. PMID:29675036
Multi-fidelity stochastic collocation method for computation of statistical moments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhu, Xueyu, E-mail: xueyu-zhu@uiowa.edu; Linebarger, Erin M., E-mail: aerinline@sci.utah.edu; Xiu, Dongbin, E-mail: xiu.16@osu.edu
We present an efficient numerical algorithm to approximate the statistical moments of stochastic problems, in the presence of models with different fidelities. The method extends the multi-fidelity approximation method developed in . By combining the efficiency of low-fidelity models and the accuracy of high-fidelity models, our method exhibits fast convergence with a limited number of high-fidelity simulations. We establish an error bound of the method and present several numerical examples to demonstrate the efficiency and applicability of the multi-fidelity algorithm.
NASA Astrophysics Data System (ADS)
Sanchez, Gustavo; Marcon, César; Agostini, Luciano Volcan
2018-01-01
The 3D-high efficiency video coding has introduced tools to obtain higher efficiency in 3-D video coding, and most of them are related to the depth maps coding. Among these tools, the depth modeling mode-1 (DMM-1) focuses on better encoding edges regions of depth maps. The large memory required for storing all wedgelet patterns is one of the bottlenecks in the DMM-1 hardware design of both encoder and decoder since many patterns must be stored. Three algorithms to reduce the DMM-1 memory requirements and a hardware design targeting the most efficient among these algorithms are presented. Experimental results demonstrate that the proposed solutions surpass related works reducing up to 78.8% of the wedgelet memory, without degrading the encoding efficiency. Synthesis results demonstrate that the proposed algorithm reduces almost 75% of the power dissipation when compared to the standard approach.
Efficient Parallel Kernel Solvers for Computational Fluid Dynamics Applications
NASA Technical Reports Server (NTRS)
Sun, Xian-He
1997-01-01
Distributed-memory parallel computers dominate today's parallel computing arena. These machines, such as Intel Paragon, IBM SP2, and Cray Origin2OO, have successfully delivered high performance computing power for solving some of the so-called "grand-challenge" problems. Despite initial success, parallel machines have not been widely accepted in production engineering environments due to the complexity of parallel programming. On a parallel computing system, a task has to be partitioned and distributed appropriately among processors to reduce communication cost and to attain load balance. More importantly, even with careful partitioning and mapping, the performance of an algorithm may still be unsatisfactory, since conventional sequential algorithms may be serial in nature and may not be implemented efficiently on parallel machines. In many cases, new algorithms have to be introduced to increase parallel performance. In order to achieve optimal performance, in addition to partitioning and mapping, a careful performance study should be conducted for a given application to find a good algorithm-machine combination. This process, however, is usually painful and elusive. The goal of this project is to design and develop efficient parallel algorithms for highly accurate Computational Fluid Dynamics (CFD) simulations and other engineering applications. The work plan is 1) developing highly accurate parallel numerical algorithms, 2) conduct preliminary testing to verify the effectiveness and potential of these algorithms, 3) incorporate newly developed algorithms into actual simulation packages. The work plan has well achieved. Two highly accurate, efficient Poisson solvers have been developed and tested based on two different approaches: (1) Adopting a mathematical geometry which has a better capacity to describe the fluid, (2) Using compact scheme to gain high order accuracy in numerical discretization. The previously developed Parallel Diagonal Dominant (PDD) algorithm and Reduced Parallel Diagonal Dominant (RPDD) algorithm have been carefully studied on different parallel platforms for different applications, and a NASA simulation code developed by Man M. Rai and his colleagues has been parallelized and implemented based on data dependency analysis. These achievements are addressed in detail in the paper.
Shi, Junwei; Zhang, Bin; Liu, Fei; Luo, Jianwen; Bai, Jing
2013-09-15
For the ill-posed fluorescent molecular tomography (FMT) inverse problem, the L1 regularization can protect the high-frequency information like edges while effectively reduce the image noise. However, the state-of-the-art L1 regularization-based algorithms for FMT reconstruction are expensive in memory, especially for large-scale problems. An efficient L1 regularization-based reconstruction algorithm based on nonlinear conjugate gradient with restarted strategy is proposed to increase the computational speed with low memory consumption. The reconstruction results from phantom experiments demonstrate that the proposed algorithm can obtain high spatial resolution and high signal-to-noise ratio, as well as high localization accuracy for fluorescence targets.
Using learning automata to determine proper subset size in high-dimensional spaces
NASA Astrophysics Data System (ADS)
Seyyedi, Seyyed Hossein; Minaei-Bidgoli, Behrouz
2017-03-01
In this paper, we offer a new method called FSLA (Finding the best candidate Subset using Learning Automata), which combines the filter and wrapper approaches for feature selection in high-dimensional spaces. Considering the difficulties of dimension reduction in high-dimensional spaces, FSLA's multi-objective functionality is to determine, in an efficient manner, a feature subset that leads to an appropriate tradeoff between the learning algorithm's accuracy and efficiency. First, using an existing weighting function, the feature list is sorted and selected subsets of the list of different sizes are considered. Then, a learning automaton verifies the performance of each subset when it is used as the input space of the learning algorithm and estimates its fitness upon the algorithm's accuracy and the subset size, which determines the algorithm's efficiency. Finally, FSLA introduces the fittest subset as the best choice. We tested FSLA in the framework of text classification. The results confirm its promising performance of attaining the identified goal.
Oyana, Tonny J; Achenie, Luke E K; Heo, Joon
2012-01-01
The objective of this paper is to introduce an efficient algorithm, namely, the mathematically improved learning-self organizing map (MIL-SOM) algorithm, which speeds up the self-organizing map (SOM) training process. In the proposed MIL-SOM algorithm, the weights of Kohonen's SOM are based on the proportional-integral-derivative (PID) controller. Thus, in a typical SOM learning setting, this improvement translates to faster convergence. The basic idea is primarily motivated by the urgent need to develop algorithms with the competence to converge faster and more efficiently than conventional techniques. The MIL-SOM algorithm is tested on four training geographic datasets representing biomedical and disease informatics application domains. Experimental results show that the MIL-SOM algorithm provides a competitive, better updating procedure and performance, good robustness, and it runs faster than Kohonen's SOM.
Oyana, Tonny J.; Achenie, Luke E. K.; Heo, Joon
2012-01-01
The objective of this paper is to introduce an efficient algorithm, namely, the mathematically improved learning-self organizing map (MIL-SOM) algorithm, which speeds up the self-organizing map (SOM) training process. In the proposed MIL-SOM algorithm, the weights of Kohonen's SOM are based on the proportional-integral-derivative (PID) controller. Thus, in a typical SOM learning setting, this improvement translates to faster convergence. The basic idea is primarily motivated by the urgent need to develop algorithms with the competence to converge faster and more efficiently than conventional techniques. The MIL-SOM algorithm is tested on four training geographic datasets representing biomedical and disease informatics application domains. Experimental results show that the MIL-SOM algorithm provides a competitive, better updating procedure and performance, good robustness, and it runs faster than Kohonen's SOM. PMID:22481977
Progress on a Taylor weak statement finite element algorithm for high-speed aerodynamic flows
NASA Technical Reports Server (NTRS)
Baker, A. J.; Freels, J. D.
1989-01-01
A new finite element numerical Computational Fluid Dynamics (CFD) algorithm has matured to the point of efficiently solving two-dimensional high speed real-gas compressible flow problems in generalized coordinates on modern vector computer systems. The algorithm employs a Taylor Weak Statement classical Galerkin formulation, a variably implicit Newton iteration, and a tensor matrix product factorization of the linear algebra Jacobian under a generalized coordinate transformation. Allowing for a general two-dimensional conservation law system, the algorithm has been exercised on the Euler and laminar forms of the Navier-Stokes equations. Real-gas fluid properties are admitted, and numerical results verify solution accuracy, efficiency, and stability over a range of test problem parameters.
Hwang, I-Shyan
2017-01-01
The K-coverage configuration that guarantees coverage of each location by at least K sensors is highly popular and is extensively used to monitor diversified applications in wireless sensor networks. Long network lifetime and high detection quality are the essentials of such K-covered sleep-scheduling algorithms. However, the existing sleep-scheduling algorithms either cause high cost or cannot preserve the detection quality effectively. In this paper, the Pre-Scheduling-based K-coverage Group Scheduling (PSKGS) and Self-Organized K-coverage Scheduling (SKS) algorithms are proposed to settle the problems in the existing sleep-scheduling algorithms. Simulation results show that our pre-scheduled-based KGS approach enhances the detection quality and network lifetime, whereas the self-organized-based SKS algorithm minimizes the computation and communication cost of the nodes and thereby is energy efficient. Besides, SKS outperforms PSKGS in terms of network lifetime and detection quality as it is self-organized. PMID:29257078
Audiovisual focus of attention and its application to Ultra High Definition video compression
NASA Astrophysics Data System (ADS)
Rerabek, Martin; Nemoto, Hiromi; Lee, Jong-Seok; Ebrahimi, Touradj
2014-02-01
Using Focus of Attention (FoA) as a perceptual process in image and video compression belongs to well-known approaches to increase coding efficiency. It has been shown that foveated coding, when compression quality varies across the image according to region of interest, is more efficient than the alternative coding, when all region are compressed in a similar way. However, widespread use of such foveated compression has been prevented due to two main conflicting causes, namely, the complexity and the efficiency of algorithms for FoA detection. One way around these is to use as much information as possible from the scene. Since most video sequences have an associated audio, and moreover, in many cases there is a correlation between the audio and the visual content, audiovisual FoA can improve efficiency of the detection algorithm while remaining of low complexity. This paper discusses a simple yet efficient audiovisual FoA algorithm based on correlation of dynamics between audio and video signal components. Results of audiovisual FoA detection algorithm are subsequently taken into account for foveated coding and compression. This approach is implemented into H.265/HEVC encoder producing a bitstream which is fully compliant to any H.265/HEVC decoder. The influence of audiovisual FoA in the perceived quality of high and ultra-high definition audiovisual sequences is explored and the amount of gain in compression efficiency is analyzed.
Efficient Parallel Algorithm For Direct Numerical Simulation of Turbulent Flows
NASA Technical Reports Server (NTRS)
Moitra, Stuti; Gatski, Thomas B.
1997-01-01
A distributed algorithm for a high-order-accurate finite-difference approach to the direct numerical simulation (DNS) of transition and turbulence in compressible flows is described. This work has two major objectives. The first objective is to demonstrate that parallel and distributed-memory machines can be successfully and efficiently used to solve computationally intensive and input/output intensive algorithms of the DNS class. The second objective is to show that the computational complexity involved in solving the tridiagonal systems inherent in the DNS algorithm can be reduced by algorithm innovations that obviate the need to use a parallelized tridiagonal solver.
Shah, Sohil Atul
2017-01-01
Clustering is a fundamental procedure in the analysis of scientific data. It is used ubiquitously across the sciences. Despite decades of research, existing clustering algorithms have limited effectiveness in high dimensions and often require tuning parameters for different domains and datasets. We present a clustering algorithm that achieves high accuracy across multiple domains and scales efficiently to high dimensions and large datasets. The presented algorithm optimizes a smooth continuous objective, which is based on robust statistics and allows heavily mixed clusters to be untangled. The continuous nature of the objective also allows clustering to be integrated as a module in end-to-end feature learning pipelines. We demonstrate this by extending the algorithm to perform joint clustering and dimensionality reduction by efficiently optimizing a continuous global objective. The presented approach is evaluated on large datasets of faces, hand-written digits, objects, newswire articles, sensor readings from the Space Shuttle, and protein expression levels. Our method achieves high accuracy across all datasets, outperforming the best prior algorithm by a factor of 3 in average rank. PMID:28851838
Efficient Retrieval of Massive Ocean Remote Sensing Images via a Cloud-Based Mean-Shift Algorithm.
Yang, Mengzhao; Song, Wei; Mei, Haibin
2017-07-23
The rapid development of remote sensing (RS) technology has resulted in the proliferation of high-resolution images. There are challenges involved in not only storing large volumes of RS images but also in rapidly retrieving the images for ocean disaster analysis such as for storm surges and typhoon warnings. In this paper, we present an efficient retrieval of massive ocean RS images via a Cloud-based mean-shift algorithm. Distributed construction method via the pyramid model is proposed based on the maximum hierarchical layer algorithm and used to realize efficient storage structure of RS images on the Cloud platform. We achieve high-performance processing of massive RS images in the Hadoop system. Based on the pyramid Hadoop distributed file system (HDFS) storage method, an improved mean-shift algorithm for RS image retrieval is presented by fusion with the canopy algorithm via Hadoop MapReduce programming. The results show that the new method can achieve better performance for data storage than HDFS alone and WebGIS-based HDFS. Speedup and scaleup are very close to linear changes with an increase of RS images, which proves that image retrieval using our method is efficient.
Efficient Retrieval of Massive Ocean Remote Sensing Images via a Cloud-Based Mean-Shift Algorithm
Song, Wei; Mei, Haibin
2017-01-01
The rapid development of remote sensing (RS) technology has resulted in the proliferation of high-resolution images. There are challenges involved in not only storing large volumes of RS images but also in rapidly retrieving the images for ocean disaster analysis such as for storm surges and typhoon warnings. In this paper, we present an efficient retrieval of massive ocean RS images via a Cloud-based mean-shift algorithm. Distributed construction method via the pyramid model is proposed based on the maximum hierarchical layer algorithm and used to realize efficient storage structure of RS images on the Cloud platform. We achieve high-performance processing of massive RS images in the Hadoop system. Based on the pyramid Hadoop distributed file system (HDFS) storage method, an improved mean-shift algorithm for RS image retrieval is presented by fusion with the canopy algorithm via Hadoop MapReduce programming. The results show that the new method can achieve better performance for data storage than HDFS alone and WebGIS-based HDFS. Speedup and scaleup are very close to linear changes with an increase of RS images, which proves that image retrieval using our method is efficient. PMID:28737699
A high-performance spatial database based approach for pathology imaging algorithm evaluation
Wang, Fusheng; Kong, Jun; Gao, Jingjing; Cooper, Lee A.D.; Kurc, Tahsin; Zhou, Zhengwen; Adler, David; Vergara-Niedermayr, Cristobal; Katigbak, Bryan; Brat, Daniel J.; Saltz, Joel H.
2013-01-01
Background: Algorithm evaluation provides a means to characterize variability across image analysis algorithms, validate algorithms by comparison with human annotations, combine results from multiple algorithms for performance improvement, and facilitate algorithm sensitivity studies. The sizes of images and image analysis results in pathology image analysis pose significant challenges in algorithm evaluation. We present an efficient parallel spatial database approach to model, normalize, manage, and query large volumes of analytical image result data. This provides an efficient platform for algorithm evaluation. Our experiments with a set of brain tumor images demonstrate the application, scalability, and effectiveness of the platform. Context: The paper describes an approach and platform for evaluation of pathology image analysis algorithms. The platform facilitates algorithm evaluation through a high-performance database built on the Pathology Analytic Imaging Standards (PAIS) data model. Aims: (1) Develop a framework to support algorithm evaluation by modeling and managing analytical results and human annotations from pathology images; (2) Create a robust data normalization tool for converting, validating, and fixing spatial data from algorithm or human annotations; (3) Develop a set of queries to support data sampling and result comparisons; (4) Achieve high performance computation capacity via a parallel data management infrastructure, parallel data loading and spatial indexing optimizations in this infrastructure. Materials and Methods: We have considered two scenarios for algorithm evaluation: (1) algorithm comparison where multiple result sets from different methods are compared and consolidated; and (2) algorithm validation where algorithm results are compared with human annotations. We have developed a spatial normalization toolkit to validate and normalize spatial boundaries produced by image analysis algorithms or human annotations. The validated data were formatted based on the PAIS data model and loaded into a spatial database. To support efficient data loading, we have implemented a parallel data loading tool that takes advantage of multi-core CPUs to accelerate data injection. The spatial database manages both geometric shapes and image features or classifications, and enables spatial sampling, result comparison, and result aggregation through expressive structured query language (SQL) queries with spatial extensions. To provide scalable and efficient query support, we have employed a shared nothing parallel database architecture, which distributes data homogenously across multiple database partitions to take advantage of parallel computation power and implements spatial indexing to achieve high I/O throughput. Results: Our work proposes a high performance, parallel spatial database platform for algorithm validation and comparison. This platform was evaluated by storing, managing, and comparing analysis results from a set of brain tumor whole slide images. The tools we develop are open source and available to download. Conclusions: Pathology image algorithm validation and comparison are essential to iterative algorithm development and refinement. One critical component is the support for queries involving spatial predicates and comparisons. In our work, we develop an efficient data model and parallel database approach to model, normalize, manage and query large volumes of analytical image result data. Our experiments demonstrate that the data partitioning strategy and the grid-based indexing result in good data distribution across database nodes and reduce I/O overhead in spatial join queries through parallel retrieval of relevant data and quick subsetting of datasets. The set of tools in the framework provide a full pipeline to normalize, load, manage and query analytical results for algorithm evaluation. PMID:23599905
Chen, Nan; Majda, Andrew J
2017-12-05
Solving the Fokker-Planck equation for high-dimensional complex dynamical systems is an important issue. Recently, the authors developed efficient statistically accurate algorithms for solving the Fokker-Planck equations associated with high-dimensional nonlinear turbulent dynamical systems with conditional Gaussian structures, which contain many strong non-Gaussian features such as intermittency and fat-tailed probability density functions (PDFs). The algorithms involve a hybrid strategy with a small number of samples [Formula: see text], where a conditional Gaussian mixture in a high-dimensional subspace via an extremely efficient parametric method is combined with a judicious Gaussian kernel density estimation in the remaining low-dimensional subspace. In this article, two effective strategies are developed and incorporated into these algorithms. The first strategy involves a judicious block decomposition of the conditional covariance matrix such that the evolutions of different blocks have no interactions, which allows an extremely efficient parallel computation due to the small size of each individual block. The second strategy exploits statistical symmetry for a further reduction of [Formula: see text] The resulting algorithms can efficiently solve the Fokker-Planck equation with strongly non-Gaussian PDFs in much higher dimensions even with orders in the millions and thus beat the curse of dimension. The algorithms are applied to a [Formula: see text]-dimensional stochastic coupled FitzHugh-Nagumo model for excitable media. An accurate recovery of both the transient and equilibrium non-Gaussian PDFs requires only [Formula: see text] samples! In addition, the block decomposition facilitates the algorithms to efficiently capture the distinct non-Gaussian features at different locations in a [Formula: see text]-dimensional two-layer inhomogeneous Lorenz 96 model, using only [Formula: see text] samples. Copyright © 2017 the Author(s). Published by PNAS.
NASA Technical Reports Server (NTRS)
Goodrich, John W.
1991-01-01
An algorithm is presented for unsteady two-dimensional incompressible Navier-Stokes calculations. This algorithm is based on the fourth order partial differential equation for incompressible fluid flow which uses the streamfunction as the only dependent variable. The algorithm is second order accurate in both time and space. It uses a multigrid solver at each time step. It is extremely efficient with respect to the use of both CPU time and physical memory. It is extremely robust with respect to Reynolds number.
The Speech multi features fusion perceptual hash algorithm based on tensor decomposition
NASA Astrophysics Data System (ADS)
Huang, Y. B.; Fan, M. H.; Zhang, Q. Y.
2018-03-01
With constant progress in modern speech communication technologies, the speech data is prone to be attacked by the noise or maliciously tampered. In order to make the speech perception hash algorithm has strong robustness and high efficiency, this paper put forward a speech perception hash algorithm based on the tensor decomposition and multi features is proposed. This algorithm analyses the speech perception feature acquires each speech component wavelet packet decomposition. LPCC, LSP and ISP feature of each speech component are extracted to constitute the speech feature tensor. Speech authentication is done by generating the hash values through feature matrix quantification which use mid-value. Experimental results showing that the proposed algorithm is robust for content to maintain operations compared with similar algorithms. It is able to resist the attack of the common background noise. Also, the algorithm is highly efficiency in terms of arithmetic, and is able to meet the real-time requirements of speech communication and complete the speech authentication quickly.
Visual saliency-based fast intracoding algorithm for high efficiency video coding
NASA Astrophysics Data System (ADS)
Zhou, Xin; Shi, Guangming; Zhou, Wei; Duan, Zhemin
2017-01-01
Intraprediction has been significantly improved in high efficiency video coding over H.264/AVC with quad-tree-based coding unit (CU) structure from size 64×64 to 8×8 and more prediction modes. However, these techniques cause a dramatic increase in computational complexity. An intracoding algorithm is proposed that consists of perceptual fast CU size decision algorithm and fast intraprediction mode decision algorithm. First, based on the visual saliency detection, an adaptive and fast CU size decision method is proposed to alleviate intraencoding complexity. Furthermore, a fast intraprediction mode decision algorithm with step halving rough mode decision method and early modes pruning algorithm is presented to selectively check the potential modes and effectively reduce the complexity of computation. Experimental results show that our proposed fast method reduces the computational complexity of the current HM to about 57% in encoding time with only 0.37% increases in BD rate. Meanwhile, the proposed fast algorithm has reasonable peak signal-to-noise ratio losses and nearly the same subjective perceptual quality.
Super-Nyquist shaping and processing technologies for high-spectral-efficiency optical systems
NASA Astrophysics Data System (ADS)
Jia, Zhensheng; Chien, Hung-Chang; Zhang, Junwen; Dong, Ze; Cai, Yi; Yu, Jianjun
2013-12-01
The implementations of super-Nyquist pulse generation, both in a digital field using a digital-to-analog converter (DAC) or an optical filter at transmitter side, are introduced. Three corresponding signal processing algorithms at receiver are presented and compared for high spectral-efficiency (SE) optical systems employing the spectral prefiltering. Those algorithms are designed for the mitigation towards inter-symbol-interference (ISI) and inter-channel-interference (ICI) impairments by the bandwidth constraint, including 1-tap constant modulus algorithm (CMA) and 3-tap maximum likelihood sequence estimation (MLSE), regular CMA and digital filter with 2-tap MLSE, and constant multi-modulus algorithm (CMMA) with 2-tap MLSE. The principles and prefiltering tolerance are given through numerical and experimental results.
Optimization of fiber grating couplers on SOI using advanced search algorithms.
Wohlfeil, Benjamin; Zimmermann, Lars; Petermann, Klaus
2014-06-01
A one-dimensional fiber grating coupler is derived from a waveguide with random etches using implementations of particle swarm and genetic algorithms. The resulting gratings yield a theoretical coupling efficiency of up to 1.1 dB and prompt clear design rules for the layout of highly efficient fiber grating couplers.
Efficient Bit-to-Symbol Likelihood Mappings
NASA Technical Reports Server (NTRS)
Moision, Bruce E.; Nakashima, Michael A.
2010-01-01
This innovation is an efficient algorithm designed to perform bit-to-symbol and symbol-to-bit likelihood mappings that represent a significant portion of the complexity of an error-correction code decoder for high-order constellations. Recent implementation of the algorithm in hardware has yielded an 8- percent reduction in overall area relative to the prior design.
NASA Astrophysics Data System (ADS)
Vinoth, S.; Kanimozhi, G.; Kumar, Harish; Srinadhu, E. S.; Satyanarayana, N.
2017-12-01
In the present investigation, the recently developed, simple, robust, and powerful metaheuristic symbiotic organism search (SOS) algorithm was used for simulation of J- V characteristics and optimizing the internal parameters of the dye-sensitized solar cells (DSSCs) fabricated using electrospun 1-D mesoporous TiO2 nanofibers as photoanode. The efficiency ( η = 5.80 %) of the DSSC made up of TiO2 nanofibers as photoanode is found to be ˜ 21.59% higher compared to the efficiency ( η = 4.77 %) of the DSSC made up of TiO2 nanoparticles as photoanode. The observed high efficiency can be attributed to high dye loading as well as high electron transport in the mesoporous 1-D TiO2 nanofibers. Further, the validity and advantage of SOS algorithm are verified by simulating J- V characteristics of DSSC with Lambert-W function.
An Object-Oriented Collection of Minimum Degree Algorithms: Design, Implementation, and Experiences
NASA Technical Reports Server (NTRS)
Kumfert, Gary; Pothen, Alex
1999-01-01
The multiple minimum degree (MMD) algorithm and its variants have enjoyed 20+ years of research and progress in generating fill-reducing orderings for sparse, symmetric positive definite matrices. Although conceptually simple, efficient implementations of these algorithms are deceptively complex and highly specialized. In this case study, we present an object-oriented library that implements several recent minimum degree-like algorithms. We discuss how object-oriented design forces us to decompose these algorithms in a different manner than earlier codes and demonstrate how this impacts the flexibility and efficiency of our C++ implementation. We compare the performance of our code against other implementations in C or Fortran.
Signal Partitioning Algorithm for Highly Efficient Gaussian Mixture Modeling in Mass Spectrometry
Polanski, Andrzej; Marczyk, Michal; Pietrowska, Monika; Widlak, Piotr; Polanska, Joanna
2015-01-01
Mixture - modeling of mass spectra is an approach with many potential applications including peak detection and quantification, smoothing, de-noising, feature extraction and spectral signal compression. However, existing algorithms do not allow for automated analyses of whole spectra. Therefore, despite highlighting potential advantages of mixture modeling of mass spectra of peptide/protein mixtures and some preliminary results presented in several papers, the mixture modeling approach was so far not developed to the stage enabling systematic comparisons with existing software packages for proteomic mass spectra analyses. In this paper we present an efficient algorithm for Gaussian mixture modeling of proteomic mass spectra of different types (e.g., MALDI-ToF profiling, MALDI-IMS). The main idea is automated partitioning of protein mass spectral signal into fragments. The obtained fragments are separately decomposed into Gaussian mixture models. The parameters of the mixture models of fragments are then aggregated to form the mixture model of the whole spectrum. We compare the elaborated algorithm to existing algorithms for peak detection and we demonstrate improvements of peak detection efficiency obtained by using Gaussian mixture modeling. We also show applications of the elaborated algorithm to real proteomic datasets of low and high resolution. PMID:26230717
Liu, Fei; Zhang, Xi; Jia, Yan
2015-01-01
In this paper, we propose a computer information processing algorithm that can be used for biomedical image processing and disease prediction. A biomedical image is considered a data object in a multi-dimensional space. Each dimension is a feature that can be used for disease diagnosis. We introduce a new concept of the top (k1,k2) outlier. It can be used to detect abnormal data objects in the multi-dimensional space. This technique focuses on uncertain space, where each data object has several possible instances with distinct probabilities. We design an efficient sampling algorithm for the top (k1,k2) outlier in uncertain space. Some improvement techniques are used for acceleration. Experiments show our methods' high accuracy and high efficiency.
Traveling-Wave Tube Efficiency Enhancement
NASA Technical Reports Server (NTRS)
Dayton, James A., Jr.
2011-01-01
Traveling-wave tubes (TWT's) are used to amplify microwave communication signals on virtually all NASA and commercial spacecraft. Because TWT's are a primary power user, increasing their power efficiency is important for reducing spacecraft weight and cost. NASA Glenn Research Center has played a major role in increasing TWT efficiency over the last thirty years. In particular, two types of efficiency optimization algorithms have been developed for coupled-cavity TWT's. The first is the phase-adjusted taper which was used to increase the RF power from 420 to 1000 watts and the RF efficiency from 9.6% to 22.6% for a Ka-band (29.5 GHz) TWT. This was a record efficiency at this frequency level. The second is an optimization algorithm based on simulated annealing. This improved algorithm is more general and can be used to optimize efficiency over a frequency bandwidth and to provide a robust design for very high frequency TWT's in which dimensional tolerance variations are significant.
Efficient iterative image reconstruction algorithm for dedicated breast CT
NASA Astrophysics Data System (ADS)
Antropova, Natalia; Sanchez, Adrian; Reiser, Ingrid S.; Sidky, Emil Y.; Boone, John; Pan, Xiaochuan
2016-03-01
Dedicated breast computed tomography (bCT) is currently being studied as a potential screening method for breast cancer. The X-ray exposure is set low to achieve an average glandular dose comparable to that of mammography, yielding projection data that contains high levels of noise. Iterative image reconstruction (IIR) algorithms may be well-suited for the system since they potentially reduce the effects of noise in the reconstructed images. However, IIR outcomes can be difficult to control since the algorithm parameters do not directly correspond to the image properties. Also, IIR algorithms are computationally demanding and have optimal parameter settings that depend on the size and shape of the breast and positioning of the patient. In this work, we design an efficient IIR algorithm with meaningful parameter specifications and that can be used on a large, diverse sample of bCT cases. The flexibility and efficiency of this method comes from having the final image produced by a linear combination of two separately reconstructed images - one containing gray level information and the other with enhanced high frequency components. Both of the images result from few iterations of separate IIR algorithms. The proposed algorithm depends on two parameters both of which have a well-defined impact on image quality. The algorithm is applied to numerous bCT cases from a dedicated bCT prototype system developed at University of California, Davis.
Hierarchical layered and semantic-based image segmentation using ergodicity map
NASA Astrophysics Data System (ADS)
Yadegar, Jacob; Liu, Xiaoqing
2010-04-01
Image segmentation plays a foundational role in image understanding and computer vision. Although great strides have been made and progress achieved on automatic/semi-automatic image segmentation algorithms, designing a generic, robust, and efficient image segmentation algorithm is still challenging. Human vision is still far superior compared to computer vision, especially in interpreting semantic meanings/objects in images. We present a hierarchical/layered semantic image segmentation algorithm that can automatically and efficiently segment images into hierarchical layered/multi-scaled semantic regions/objects with contextual topological relationships. The proposed algorithm bridges the gap between high-level semantics and low-level visual features/cues (such as color, intensity, edge, etc.) through utilizing a layered/hierarchical ergodicity map, where ergodicity is computed based on a space filling fractal concept and used as a region dissimilarity measurement. The algorithm applies a highly scalable, efficient, and adaptive Peano- Cesaro triangulation/tiling technique to decompose the given image into a set of similar/homogenous regions based on low-level visual cues in a top-down manner. The layered/hierarchical ergodicity map is built through a bottom-up region dissimilarity analysis. The recursive fractal sweep associated with the Peano-Cesaro triangulation provides efficient local multi-resolution refinement to any level of detail. The generated binary decomposition tree also provides efficient neighbor retrieval mechanisms for contextual topological object/region relationship generation. Experiments have been conducted within the maritime image environment where the segmented layered semantic objects include the basic level objects (i.e. sky/land/water) and deeper level objects in the sky/land/water surfaces. Experimental results demonstrate the proposed algorithm has the capability to robustly and efficiently segment images into layered semantic objects/regions with contextual topological relationships.
Fast algorithm of adaptive Fourier series
NASA Astrophysics Data System (ADS)
Gao, You; Ku, Min; Qian, Tao
2018-05-01
Adaptive Fourier decomposition (AFD, precisely 1-D AFD or Core-AFD) was originated for the goal of positive frequency representations of signals. It achieved the goal and at the same time offered fast decompositions of signals. There then arose several types of AFDs. AFD merged with the greedy algorithm idea, and in particular, motivated the so-called pre-orthogonal greedy algorithm (Pre-OGA) that was proven to be the most efficient greedy algorithm. The cost of the advantages of the AFD type decompositions is, however, the high computational complexity due to the involvement of maximal selections of the dictionary parameters. The present paper offers one formulation of the 1-D AFD algorithm by building the FFT algorithm into it. Accordingly, the algorithm complexity is reduced, from the original $\\mathcal{O}(M N^2)$ to $\\mathcal{O}(M N\\log_2 N)$, where $N$ denotes the number of the discretization points on the unit circle and $M$ denotes the number of points in $[0,1)$. This greatly enhances the applicability of AFD. Experiments are carried out to show the high efficiency of the proposed algorithm.
Andrianov, Alexey; Szabo, Aron; Sergeev, Alexander; Kim, Arkady; Chvykov, Vladimir; Kalashnikov, Mikhail
2016-11-14
We developed an improved approach to calculate the Fourier transform of signals with arbitrary large quadratic phase which can be efficiently implemented in numerical simulations utilizing Fast Fourier transform. The proposed algorithm significantly reduces the computational cost of Fourier transform of a highly chirped and stretched pulse by splitting it into two separate transforms of almost transform limited pulses, thereby reducing the required grid size roughly by a factor of the pulse stretching. The application of our improved Fourier transform algorithm in the split-step method for numerical modeling of CPA and OPCPA shows excellent agreement with standard algorithms.
High-efficiency induction motor drives using type-2 fuzzy logic
NASA Astrophysics Data System (ADS)
Khemis, A.; Benlaloui, I.; Drid, S.; Chrifi-Alaoui, L.; Khamari, D.; Menacer, A.
2018-03-01
In this work we propose to develop an algorithm for improving the efficiency of an induction motor using type-2 fuzzy logic. Vector control is used to control this motor due to the high performances of this strategy. The type-2 fuzzy logic regulators are developed to obtain the optimal rotor flux for each torque load by minimizing the copper losses. We have compared the performances of our fuzzy type-2 algorithm with the type-1 fuzzy one proposed in the literature. The proposed algorithm is tested with success on the dSPACE DS1104 system even if there is parameters variance.
High-speed cell recognition algorithm for ultrafast flow cytometer imaging system.
Zhao, Wanyue; Wang, Chao; Chen, Hongwei; Chen, Minghua; Yang, Sigang
2018-04-01
An optical time-stretch flow imaging system enables high-throughput examination of cells/particles with unprecedented high speed and resolution. A significant amount of raw image data is produced. A high-speed cell recognition algorithm is, therefore, highly demanded to analyze large amounts of data efficiently. A high-speed cell recognition algorithm consisting of two-stage cascaded detection and Gaussian mixture model (GMM) classification is proposed. The first stage of detection extracts cell regions. The second stage integrates distance transform and the watershed algorithm to separate clustered cells. Finally, the cells detected are classified by GMM. We compared the performance of our algorithm with support vector machine. Results show that our algorithm increases the running speed by over 150% without sacrificing the recognition accuracy. This algorithm provides a promising solution for high-throughput and automated cell imaging and classification in the ultrafast flow cytometer imaging platform. (2018) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE).
High-speed cell recognition algorithm for ultrafast flow cytometer imaging system
NASA Astrophysics Data System (ADS)
Zhao, Wanyue; Wang, Chao; Chen, Hongwei; Chen, Minghua; Yang, Sigang
2018-04-01
An optical time-stretch flow imaging system enables high-throughput examination of cells/particles with unprecedented high speed and resolution. A significant amount of raw image data is produced. A high-speed cell recognition algorithm is, therefore, highly demanded to analyze large amounts of data efficiently. A high-speed cell recognition algorithm consisting of two-stage cascaded detection and Gaussian mixture model (GMM) classification is proposed. The first stage of detection extracts cell regions. The second stage integrates distance transform and the watershed algorithm to separate clustered cells. Finally, the cells detected are classified by GMM. We compared the performance of our algorithm with support vector machine. Results show that our algorithm increases the running speed by over 150% without sacrificing the recognition accuracy. This algorithm provides a promising solution for high-throughput and automated cell imaging and classification in the ultrafast flow cytometer imaging platform.
NASA Technical Reports Server (NTRS)
Goodrich, John W.
1995-01-01
Two methods for developing high order single step explicit algorithms on symmetric stencils with data on only one time level are presented. Examples are given for the convection and linearized Euler equations with up to the eighth order accuracy in both space and time in one space dimension, and up to the sixth in two space dimensions. The method of characteristics is generalized to nondiagonalizable hyperbolic systems by using exact local polynominal solutions of the system, and the resulting exact propagator methods automatically incorporate the correct multidimensional wave propagation dynamics. Multivariate Taylor or Cauchy-Kowaleskaya expansions are also used to develop algorithms. Both of these methods can be applied to obtain algorithms of arbitrarily high order for hyperbolic systems in multiple space dimensions. Cross derivatives are included in the local approximations used to develop the algorithms in this paper in order to obtain high order accuracy, and improved isotropy and stability. Efficiency in meeting global error bounds is an important criterion for evaluating algorithms, and the higher order algorithms are shown to be up to several orders of magnitude more efficient even though they are more complex. Stable high order boundary conditions for the linearized Euler equations are developed in one space dimension, and demonstrated in two space dimensions.
Guan, Fada; Johns, Jesse M; Vasudevan, Latha; Zhang, Guoqing; Tang, Xiaobin; Poston, John W; Braby, Leslie A
2015-06-01
Coincident counts can be observed in experimental radiation spectroscopy. Accurate quantification of the radiation source requires the detection efficiency of the spectrometer, which is often experimentally determined. However, Monte Carlo analysis can be used to supplement experimental approaches to determine the detection efficiency a priori. The traditional Monte Carlo method overestimates the detection efficiency as a result of omitting coincident counts caused mainly by multiple cascade source particles. In this study, a novel "multi-primary coincident counting" algorithm was developed using the Geant4 Monte Carlo simulation toolkit. A high-purity Germanium detector for ⁶⁰Co gamma-ray spectroscopy problems was accurately modeled to validate the developed algorithm. The simulated pulse height spectrum agreed well qualitatively with the measured spectrum obtained using the high-purity Germanium detector. The developed algorithm can be extended to other applications, with a particular emphasis on challenging radiation fields, such as counting multiple types of coincident radiations released from nuclear fission or used nuclear fuel.
Utilization of high-frequency Rayleigh waves in near-surface geophysics
Xia, J.; Miller, R.D.; Park, C.B.; Ivanov, J.; Tian, G.; Chen, C.
2004-01-01
Shear-wave velocities can be derived from inverting the dispersive phase velocity of the surface. The multichannel analysis of surface waves (MASW) is one technique for inverting high-frequency Rayleigh waves. The process includes acquisition of high-frequency broad-band Rayleigh waves, efficient and accurate algorithms designed to extract Rayleigh-wave dispersion curves from Rayleigh waves, and stable and efficient inversion algorithms to obtain near-surface S-wave velocity profiles. MASW estimates S-wave velocity from multichannel vertical compoent data and consists of data acquisition, dispersion-curve picking, and inversion.
Traffic off-balancing algorithm for energy efficient networks
NASA Astrophysics Data System (ADS)
Kim, Junhyuk; Lee, Chankyun; Rhee, June-Koo Kevin
2011-12-01
Physical layer of high-end network system uses multiple interface arrays. Under the load-balancing perspective, light load can be distributed to multiple interfaces. However, it can cause energy inefficiency in terms of the number of poor utilization interfaces. To tackle this energy inefficiency, traffic off-balancing algorithm for traffic adaptive interface sleep/awake is investigated. As a reference model, 40G/100G Ethernet is investigated. We report that suggested algorithm can achieve energy efficiency while satisfying traffic transmission requirement.
Efficient implementation of the many-body Reactive Bond Order (REBO) potential on GPU
NASA Astrophysics Data System (ADS)
Trędak, Przemysław; Rudnicki, Witold R.; Majewski, Jacek A.
2016-09-01
The second generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range hydrocarbon materials. It is also extensible to other atom types and interactions. REBO potential assumes complex multi-body interaction model, that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of molecular dynamics algorithm using REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPU to solve difficult synchronizations issues that arise in computations of multi-body potential. Techniques developed for this problem may be also used to achieve efficient solutions of different problems. The performance of proposed algorithm is assessed using a range of model systems. It is compared to highly optimized CPU implementation (both single core and OpenMP) available in LAMMPS package. These experiments show up to 6x improvement in forces computation time using single processor of the NVIDIA Tesla K80 compared to high end 16-core Intel Xeon processor.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Trędak, Przemysław, E-mail: przemyslaw.tredak@fuw.edu.pl; Rudnicki, Witold R.; Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, ul. Pawińskiego 5a, 02-106 Warsaw
The second generation Reactive Bond Order (REBO) empirical potential is commonly used to accurately model a wide range hydrocarbon materials. It is also extensible to other atom types and interactions. REBO potential assumes complex multi-body interaction model, that is difficult to represent efficiently in the SIMD or SIMT programming model. Hence, despite its importance, no efficient GPGPU implementation has been developed for this potential. Here we present a detailed description of a highly efficient GPGPU implementation of molecular dynamics algorithm using REBO potential. The presented algorithm takes advantage of rarely used properties of the SIMT architecture of a modern GPUmore » to solve difficult synchronizations issues that arise in computations of multi-body potential. Techniques developed for this problem may be also used to achieve efficient solutions of different problems. The performance of proposed algorithm is assessed using a range of model systems. It is compared to highly optimized CPU implementation (both single core and OpenMP) available in LAMMPS package. These experiments show up to 6x improvement in forces computation time using single processor of the NVIDIA Tesla K80 compared to high end 16-core Intel Xeon processor.« less
Improving Design Efficiency for Large-Scale Heterogeneous Circuits
NASA Astrophysics Data System (ADS)
Gregerson, Anthony
Despite increases in logic density, many Big Data applications must still be partitioned across multiple computing devices in order to meet their strict performance requirements. Among the most demanding of these applications is high-energy physics (HEP), which uses complex computing systems consisting of thousands of FPGAs and ASICs to process the sensor data created by experiments at particles accelerators such as the Large Hadron Collider (LHC). Designing such computing systems is challenging due to the scale of the systems, the exceptionally high-throughput and low-latency performance constraints that necessitate application-specific hardware implementations, the requirement that algorithms are efficiently partitioned across many devices, and the possible need to update the implemented algorithms during the lifetime of the system. In this work, we describe our research to develop flexible architectures for implementing such large-scale circuits on FPGAs. In particular, this work is motivated by (but not limited in scope to) high-energy physics algorithms for the Compact Muon Solenoid (CMS) experiment at the LHC. To make efficient use of logic resources in multi-FPGA systems, we introduce Multi-Personality Partitioning, a novel form of the graph partitioning problem, and present partitioning algorithms that can significantly improve resource utilization on heterogeneous devices while also reducing inter-chip connections. To reduce the high communication costs of Big Data applications, we also introduce Information-Aware Partitioning, a partitioning method that analyzes the data content of application-specific circuits, characterizes their entropy, and selects circuit partitions that enable efficient compression of data between chips. We employ our information-aware partitioning method to improve the performance of the hardware validation platform for evaluating new algorithms for the CMS experiment. Together, these research efforts help to improve the efficiency and decrease the cost of the developing large-scale, heterogeneous circuits needed to enable large-scale application in high-energy physics and other important areas.
GROMACS 4: Algorithms for Highly Efficient, Load-Balanced, and Scalable Molecular Simulation.
Hess, Berk; Kutzner, Carsten; van der Spoel, David; Lindahl, Erik
2008-03-01
Molecular simulation is an extremely useful, but computationally very expensive tool for studies of chemical and biomolecular systems. Here, we present a new implementation of our molecular simulation toolkit GROMACS which now both achieves extremely high performance on single processors from algorithmic optimizations and hand-coded routines and simultaneously scales very well on parallel machines. The code encompasses a minimal-communication domain decomposition algorithm, full dynamic load balancing, a state-of-the-art parallel constraint solver, and efficient virtual site algorithms that allow removal of hydrogen atom degrees of freedom to enable integration time steps up to 5 fs for atomistic simulations also in parallel. To improve the scaling properties of the common particle mesh Ewald electrostatics algorithms, we have in addition used a Multiple-Program, Multiple-Data approach, with separate node domains responsible for direct and reciprocal space interactions. Not only does this combination of algorithms enable extremely long simulations of large systems but also it provides that simulation performance on quite modest numbers of standard cluster nodes.
Advanced time integration algorithms for dislocation dynamics simulations of work hardening
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sills, Ryan B.; Aghaei, Amin; Cai, Wei
Efficient time integration is a necessity for dislocation dynamics simulations of work hardening to achieve experimentally relevant strains. In this work, an efficient time integration scheme using a high order explicit method with time step subcycling and a newly-developed collision detection algorithm are evaluated. First, time integrator performance is examined for an annihilating Frank–Read source, showing the effects of dislocation line collision. The integrator with subcycling is found to significantly out-perform other integration schemes. The performance of the time integration and collision detection algorithms is then tested in a work hardening simulation. The new algorithms show a 100-fold speed-up relativemore » to traditional schemes. As a result, subcycling is shown to improve efficiency significantly while maintaining an accurate solution, and the new collision algorithm allows an arbitrarily large time step size without missing collisions.« less
Advanced time integration algorithms for dislocation dynamics simulations of work hardening
Sills, Ryan B.; Aghaei, Amin; Cai, Wei
2016-04-25
Efficient time integration is a necessity for dislocation dynamics simulations of work hardening to achieve experimentally relevant strains. In this work, an efficient time integration scheme using a high order explicit method with time step subcycling and a newly-developed collision detection algorithm are evaluated. First, time integrator performance is examined for an annihilating Frank–Read source, showing the effects of dislocation line collision. The integrator with subcycling is found to significantly out-perform other integration schemes. The performance of the time integration and collision detection algorithms is then tested in a work hardening simulation. The new algorithms show a 100-fold speed-up relativemore » to traditional schemes. As a result, subcycling is shown to improve efficiency significantly while maintaining an accurate solution, and the new collision algorithm allows an arbitrarily large time step size without missing collisions.« less
Yu, Yi; Wu, Yonggang; Hu, Binqi; Liu, Xinglong
2018-01-01
The dispatching of hydro-thermal system is a nonlinear programming problem with multiple constraints and high dimensions and the solution techniques of the model have been a hotspot in research. Based on the advantage of that the artificial bee colony algorithm (ABC) can efficiently solve the high-dimensional problem, an improved artificial bee colony algorithm has been proposed to solve DHTS problem in this paper. The improvements of the proposed algorithm include two aspects. On one hand, local search can be guided in efficiency by the information of the global optimal solution and its gradient in each generation. The global optimal solution improves the search efficiency of the algorithm but loses diversity, while the gradient can weaken the loss of diversity caused by the global optimal solution. On the other hand, inspired by genetic algorithm, the nectar resource which has not been updated in limit generation is transformed to a new one by using selection, crossover and mutation, which can ensure individual diversity and make full use of prior information for improving the global search ability of the algorithm. The two improvements of ABC algorithm are proved to be effective via a classical numeral example at last. Among which the genetic operator for the promotion of the ABC algorithm's performance is significant. The results are also compared with those of other state-of-the-art algorithms, the enhanced ABC algorithm has general advantages in minimum cost, average cost and maximum cost which shows its usability and effectiveness. The achievements in this paper provide a new method for solving the DHTS problems, and also offer a novel reference for the improvement of mechanism and the application of algorithms.
Tang, Wenming; Liu, Guixiong; Li, Yuzhong; Tan, Daji
2017-01-01
High data transmission efficiency is a key requirement for an ultrasonic phased array with multi-group ultrasonic sensors. Here, a novel FIFOs scheduling algorithm was proposed and the data transmission efficiency with hardware technology was improved. This algorithm includes FIFOs as caches for the ultrasonic scanning data obtained from the sensors with the output data in a bandwidth-sharing way, on the basis of which an optimal length ratio of all the FIFOs is achieved, allowing the reading operations to be switched among all the FIFOs without time slot waiting. Therefore, this algorithm enhances the utilization ratio of the reading bandwidth resources so as to obtain higher efficiency than the traditional scheduling algorithms. The reliability and validity of the algorithm are substantiated after its implementation in the field programmable gate array (FPGA) technology, and the bandwidth utilization ratio and the real-time performance of the ultrasonic phased array are enhanced. PMID:29035345
Geometry correction Algorithm for UAV Remote Sensing Image Based on Improved Neural Network
NASA Astrophysics Data System (ADS)
Liu, Ruian; Liu, Nan; Zeng, Beibei; Chen, Tingting; Yin, Ninghao
2018-03-01
Aiming at the disadvantage of current geometry correction algorithm for UAV remote sensing image, a new algorithm is proposed. Adaptive genetic algorithm (AGA) and RBF neural network are introduced into this algorithm. And combined with the geometry correction principle for UAV remote sensing image, the algorithm and solving steps of AGA-RBF are presented in order to realize geometry correction for UAV remote sensing. The correction accuracy and operational efficiency is improved through optimizing the structure and connection weight of RBF neural network separately with AGA and LMS algorithm. Finally, experiments show that AGA-RBF algorithm has the advantages of high correction accuracy, high running rate and strong generalization ability.
A fast and automatic fusion algorithm for unregistered multi-exposure image sequence
NASA Astrophysics Data System (ADS)
Liu, Yan; Yu, Feihong
2014-09-01
Human visual system (HVS) can visualize all the brightness levels of the scene through visual adaptation. However, the dynamic range of most commercial digital cameras and display devices are smaller than the dynamic range of human eye. This implies low dynamic range (LDR) images captured by normal digital camera may lose image details. We propose an efficient approach to high dynamic (HDR) image fusion that copes with image displacement and image blur degradation in a computationally efficient manner, which is suitable for implementation on mobile devices. The various image registration algorithms proposed in the previous literatures are unable to meet the efficiency and performance requirements in the application of mobile devices. In this paper, we selected Oriented Brief (ORB) detector to extract local image structures. The descriptor selected in multi-exposure image fusion algorithm has to be fast and robust to illumination variations and geometric deformations. ORB descriptor is the best candidate in our algorithm. Further, we perform an improved RANdom Sample Consensus (RANSAC) algorithm to reject incorrect matches. For the fusion of images, a new approach based on Stationary Wavelet Transform (SWT) is used. The experimental results demonstrate that the proposed algorithm generates high quality images at low computational cost. Comparisons with a number of other feature matching methods show that our method gets better performance.
Efficient statistically accurate algorithms for the Fokker-Planck equation in large dimensions
NASA Astrophysics Data System (ADS)
Chen, Nan; Majda, Andrew J.
2018-02-01
Solving the Fokker-Planck equation for high-dimensional complex turbulent dynamical systems is an important and practical issue. However, most traditional methods suffer from the curse of dimensionality and have difficulties in capturing the fat tailed highly intermittent probability density functions (PDFs) of complex systems in turbulence, neuroscience and excitable media. In this article, efficient statistically accurate algorithms are developed for solving both the transient and the equilibrium solutions of Fokker-Planck equations associated with high-dimensional nonlinear turbulent dynamical systems with conditional Gaussian structures. The algorithms involve a hybrid strategy that requires only a small number of ensembles. Here, a conditional Gaussian mixture in a high-dimensional subspace via an extremely efficient parametric method is combined with a judicious non-parametric Gaussian kernel density estimation in the remaining low-dimensional subspace. Particularly, the parametric method provides closed analytical formulae for determining the conditional Gaussian distributions in the high-dimensional subspace and is therefore computationally efficient and accurate. The full non-Gaussian PDF of the system is then given by a Gaussian mixture. Different from traditional particle methods, each conditional Gaussian distribution here covers a significant portion of the high-dimensional PDF. Therefore a small number of ensembles is sufficient to recover the full PDF, which overcomes the curse of dimensionality. Notably, the mixture distribution has significant skill in capturing the transient behavior with fat tails of the high-dimensional non-Gaussian PDFs, and this facilitates the algorithms in accurately describing the intermittency and extreme events in complex turbulent systems. It is shown in a stringent set of test problems that the method only requires an order of O (100) ensembles to successfully recover the highly non-Gaussian transient PDFs in up to 6 dimensions with only small errors.
NASA Astrophysics Data System (ADS)
Abdellah, Skoudarli; Mokhtar, Nibouche; Amina, Serir
2015-11-01
The H.264/AVC video coding standard is used in a wide range of applications from video conferencing to high-definition television according to its high compression efficiency. This efficiency is mainly acquired from the newly allowed prediction schemes including variable block modes. However, these schemes require a high complexity to select the optimal mode. Consequently, complexity reduction in the H.264/AVC encoder has recently become a very challenging task in the video compression domain, especially when implementing the encoder in real-time applications. Fast mode decision algorithms play an important role in reducing the overall complexity of the encoder. In this paper, we propose an adaptive fast intermode algorithm based on motion activity, temporal stationarity, and spatial homogeneity. This algorithm predicts the motion activity of the current macroblock from its neighboring blocks and identifies temporal stationary regions and spatially homogeneous regions using adaptive threshold values based on content video features. Extensive experimental work has been done in high profile, and results show that the proposed source-coding algorithm effectively reduces the computational complexity by 53.18% on average compared with the reference software encoder, while maintaining the high-coding efficiency of H.264/AVC by incurring only 0.097 dB in total peak signal-to-noise ratio and 0.228% increment on the total bit rate.
Yu, Yi; Hu, Binqi; Liu, Xinglong
2018-01-01
The dispatching of hydro-thermal system is a nonlinear programming problem with multiple constraints and high dimensions and the solution techniques of the model have been a hotspot in research. Based on the advantage of that the artificial bee colony algorithm (ABC) can efficiently solve the high-dimensional problem, an improved artificial bee colony algorithm has been proposed to solve DHTS problem in this paper. The improvements of the proposed algorithm include two aspects. On one hand, local search can be guided in efficiency by the information of the global optimal solution and its gradient in each generation. The global optimal solution improves the search efficiency of the algorithm but loses diversity, while the gradient can weaken the loss of diversity caused by the global optimal solution. On the other hand, inspired by genetic algorithm, the nectar resource which has not been updated in limit generation is transformed to a new one by using selection, crossover and mutation, which can ensure individual diversity and make full use of prior information for improving the global search ability of the algorithm. The two improvements of ABC algorithm are proved to be effective via a classical numeral example at last. Among which the genetic operator for the promotion of the ABC algorithm’s performance is significant. The results are also compared with those of other state-of-the-art algorithms, the enhanced ABC algorithm has general advantages in minimum cost, average cost and maximum cost which shows its usability and effectiveness. The achievements in this paper provide a new method for solving the DHTS problems, and also offer a novel reference for the improvement of mechanism and the application of algorithms. PMID:29324743
Trajectory Segmentation Map-Matching Approach for Large-Scale, High-Resolution GPS Data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhu, Lei; Holden, Jacob R.; Gonder, Jeffrey D.
With the development of smartphones and portable GPS devices, large-scale, high-resolution GPS data can be collected. Map matching is a critical step in studying vehicle driving activity and recognizing network traffic conditions from the data. A new trajectory segmentation map-matching algorithm is proposed to deal accurately and efficiently with large-scale, high-resolution GPS trajectory data. The new algorithm separated the GPS trajectory into segments. It found the shortest path for each segment in a scientific manner and ultimately generated a best-matched path for the entire trajectory. The similarity of a trajectory segment and its matched path is described by a similaritymore » score system based on the longest common subsequence. The numerical experiment indicated that the proposed map-matching algorithm was very promising in relation to accuracy and computational efficiency. Large-scale data set applications verified that the proposed method is robust and capable of dealing with real-world, large-scale GPS data in a computationally efficient and accurate manner.« less
Trajectory Segmentation Map-Matching Approach for Large-Scale, High-Resolution GPS Data
Zhu, Lei; Holden, Jacob R.; Gonder, Jeffrey D.
2017-01-01
With the development of smartphones and portable GPS devices, large-scale, high-resolution GPS data can be collected. Map matching is a critical step in studying vehicle driving activity and recognizing network traffic conditions from the data. A new trajectory segmentation map-matching algorithm is proposed to deal accurately and efficiently with large-scale, high-resolution GPS trajectory data. The new algorithm separated the GPS trajectory into segments. It found the shortest path for each segment in a scientific manner and ultimately generated a best-matched path for the entire trajectory. The similarity of a trajectory segment and its matched path is described by a similaritymore » score system based on the longest common subsequence. The numerical experiment indicated that the proposed map-matching algorithm was very promising in relation to accuracy and computational efficiency. Large-scale data set applications verified that the proposed method is robust and capable of dealing with real-world, large-scale GPS data in a computationally efficient and accurate manner.« less
A high throughput architecture for a low complexity soft-output demapping algorithm
NASA Astrophysics Data System (ADS)
Ali, I.; Wasenmüller, U.; Wehn, N.
2015-11-01
Iterative channel decoders such as Turbo-Code and LDPC decoders show exceptional performance and therefore they are a part of many wireless communication receivers nowadays. These decoders require a soft input, i.e., the logarithmic likelihood ratio (LLR) of the received bits with a typical quantization of 4 to 6 bits. For computing the LLR values from a received complex symbol, a soft demapper is employed in the receiver. The implementation cost of traditional soft-output demapping methods is relatively large in high order modulation systems, and therefore low complexity demapping algorithms are indispensable in low power receivers. In the presence of multiple wireless communication standards where each standard defines multiple modulation schemes, there is a need to have an efficient demapper architecture covering all the flexibility requirements of these standards. Another challenge associated with hardware implementation of the demapper is to achieve a very high throughput in double iterative systems, for instance, MIMO and Code-Aided Synchronization. In this paper, we present a comprehensive communication and hardware performance evaluation of low complexity soft-output demapping algorithms to select the best algorithm for implementation. The main goal of this work is to design a high throughput, flexible, and area efficient architecture. We describe architectures to execute the investigated algorithms. We implement these architectures on a FPGA device to evaluate their hardware performance. The work has resulted in a hardware architecture based on the figured out best low complexity algorithm delivering a high throughput of 166 Msymbols/second for Gray mapped 16-QAM modulation on Virtex-5. This efficient architecture occupies only 127 slice registers, 248 slice LUTs and 2 DSP48Es.
Highly Efficient Compression Algorithms for Multichannel EEG.
Shaw, Laxmi; Rahman, Daleef; Routray, Aurobinda
2018-05-01
The difficulty associated with processing and understanding the high dimensionality of electroencephalogram (EEG) data requires developing efficient and robust compression algorithms. In this paper, different lossless compression techniques of single and multichannel EEG data, including Huffman coding, arithmetic coding, Markov predictor, linear predictor, context-based error modeling, multivariate autoregression (MVAR), and a low complexity bivariate model have been examined and their performances have been compared. Furthermore, a high compression algorithm named general MVAR and a modified context-based error modeling for multichannel EEG have been proposed. The resulting compression algorithm produces a higher relative compression ratio of 70.64% on average compared with the existing methods, and in some cases, it goes up to 83.06%. The proposed methods are designed to compress a large amount of multichannel EEG data efficiently so that the data storage and transmission bandwidth can be effectively used. These methods have been validated using several experimental multichannel EEG recordings of different subjects and publicly available standard databases. The satisfactory parametric measures of these methods, namely percent-root-mean square distortion, peak signal-to-noise ratio, root-mean-square error, and cross correlation, show their superiority over the state-of-the-art compression methods.
Interaction sorting method for molecular dynamics on multi-core SIMD CPU architecture.
Matvienko, Sergey; Alemasov, Nikolay; Fomin, Eduard
2015-02-01
Molecular dynamics (MD) is widely used in computational biology for studying binding mechanisms of molecules, molecular transport, conformational transitions, protein folding, etc. The method is computationally expensive; thus, the demand for the development of novel, much more efficient algorithms is still high. Therefore, the new algorithm designed in 2007 and called interaction sorting (IS) clearly attracted interest, as it outperformed the most efficient MD algorithms. In this work, a new IS modification is proposed which allows the algorithm to utilize SIMD processor instructions. This paper shows that the improvement provides an additional gain in performance, 9% to 45% in comparison to the original IS method.
NASA Astrophysics Data System (ADS)
Wu, Leyuan
2018-01-01
We present a brief review of gravity forward algorithms in Cartesian coordinate system, including both space-domain and Fourier-domain approaches, after which we introduce a truly general and efficient algorithm, namely the convolution-type Gauss fast Fourier transform (Conv-Gauss-FFT) algorithm, for 2D and 3D modeling of gravity potential and its derivatives due to sources with arbitrary geometry and arbitrary density distribution which are defined either by discrete or by continuous functions. The Conv-Gauss-FFT algorithm is based on the combined use of a hybrid rectangle-Gaussian grid and the fast Fourier transform (FFT) algorithm. Since the gravity forward problem in Cartesian coordinate system can be expressed as continuous convolution-type integrals, we first approximate the continuous convolution by a weighted sum of a series of shifted discrete convolutions, and then each shifted discrete convolution, which is essentially a Toeplitz system, is calculated efficiently and accurately by combining circulant embedding with the FFT algorithm. Synthetic and real model tests show that the Conv-Gauss-FFT algorithm can obtain high-precision forward results very efficiently for almost any practical model, and it works especially well for complex 3D models when gravity fields on large 3D regular grids are needed.
Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™
Gomes, Jeremias M.; Teodoro, George; de Melo, Alba; Kong, Jun; Kurc, Tahsin; Saltz, Joel H.
2016-01-01
We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel® Xeon Phi™ co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP’s irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high performance. However, SIMD atomic instructions are not supported. Therefore, we propose a new IWPP algorithm that can take advantage of the supported SIMD instruction set. We also evaluate an alternate storage container (priority queue) to track active elements in the wavefront in an effort to improve the parallel algorithm efficiency. The new IWPP algorithm is evaluated with Morphological Reconstruction and Imfill operations as use cases. Our results show performance improvements of up to 5.63× on top of the original IWPP due to vectorization. Moreover, the new IWPP achieves speedups of 45.7× and 1.62×, respectively, as compared to efficient CPU and GPU implementations. PMID:27298591
Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™.
Gomes, Jeremias M; Teodoro, George; de Melo, Alba; Kong, Jun; Kurc, Tahsin; Saltz, Joel H
2015-10-01
We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel ® Xeon Phi ™ co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP's irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high performance. However, SIMD atomic instructions are not supported. Therefore, we propose a new IWPP algorithm that can take advantage of the supported SIMD instruction set. We also evaluate an alternate storage container (priority queue) to track active elements in the wavefront in an effort to improve the parallel algorithm efficiency. The new IWPP algorithm is evaluated with Morphological Reconstruction and Imfill operations as use cases. Our results show performance improvements of up to 5.63 × on top of the original IWPP due to vectorization. Moreover, the new IWPP achieves speedups of 45.7 × and 1.62 × , respectively, as compared to efficient CPU and GPU implementations.
High-Performance Computing for the Electromagnetic Modeling and Simulation of Interconnects
NASA Technical Reports Server (NTRS)
Schutt-Aine, Jose E.
1996-01-01
The electromagnetic modeling of packages and interconnects plays a very important role in the design of high-speed digital circuits, and is most efficiently performed by using computer-aided design algorithms. In recent years, packaging has become a critical area in the design of high-speed communication systems and fast computers, and the importance of the software support for their development has increased accordingly. Throughout this project, our efforts have focused on the development of modeling and simulation techniques and algorithms that permit the fast computation of the electrical parameters of interconnects and the efficient simulation of their electrical performance.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pastore, Giovanni; Rabiti, Cristian; Pizzocri, Davide
PolyPole is a numerical algorithm for the calculation of intra-granular fission gas release. In particular, the algorithm solves the gas diffusion problem in a fuel grain in time-varying conditions. The program has been extensively tested. PolyPole combines a high accuracy with a high computational efficiency and is ideally suited for application in fuel performance codes.
A finite element algorithm for high-lying eigenvalues with Neumann and Dirichlet boundary conditions
NASA Astrophysics Data System (ADS)
Báez, G.; Méndez-Sánchez, R. A.; Leyvraz, F.; Seligman, T. H.
2014-01-01
We present a finite element algorithm that computes eigenvalues and eigenfunctions of the Laplace operator for two-dimensional problems with homogeneous Neumann or Dirichlet boundary conditions, or combinations of either for different parts of the boundary. We use an inverse power plus Gauss-Seidel algorithm to solve the generalized eigenvalue problem. For Neumann boundary conditions the method is much more efficient than the equivalent finite difference algorithm. We checked the algorithm by comparing the cumulative level density of the spectrum obtained numerically with the theoretical prediction given by the Weyl formula. We found a systematic deviation due to the discretization, not to the algorithm itself.
Algorithm-Based Fault Tolerance Integrated with Replication
NASA Technical Reports Server (NTRS)
Some, Raphael; Rennels, David
2008-01-01
In a proposed approach to programming and utilization of commercial off-the-shelf computing equipment, a combination of algorithm-based fault tolerance (ABFT) and replication would be utilized to obtain high degrees of fault tolerance without incurring excessive costs. The basic idea of the proposed approach is to integrate ABFT with replication such that the algorithmic portions of computations would be protected by ABFT, and the logical portions by replication. ABFT is an extremely efficient, inexpensive, high-coverage technique for detecting and mitigating faults in computer systems used for algorithmic computations, but does not protect against errors in logical operations surrounding algorithms.
NASA Astrophysics Data System (ADS)
Shaat, Musbah; Bader, Faouzi
2010-12-01
Cognitive Radio (CR) systems have been proposed to increase the spectrum utilization by opportunistically access the unused spectrum. Multicarrier communication systems are promising candidates for CR systems. Due to its high spectral efficiency, filter bank multicarrier (FBMC) can be considered as an alternative to conventional orthogonal frequency division multiplexing (OFDM) for transmission over the CR networks. This paper addresses the problem of resource allocation in multicarrier-based CR networks. The objective is to maximize the downlink capacity of the network under both total power and interference introduced to the primary users (PUs) constraints. The optimal solution has high computational complexity which makes it unsuitable for practical applications and hence a low complexity suboptimal solution is proposed. The proposed algorithm utilizes the spectrum holes in PUs bands as well as active PU bands. The performance of the proposed algorithm is investigated for OFDM and FBMC based CR systems. Simulation results illustrate that the proposed resource allocation algorithm with low computational complexity achieves near optimal performance and proves the efficiency of using FBMC in CR context.
An O(Nm(sup 2)) Plane Solver for the Compressible Navier-Stokes Equations
NASA Technical Reports Server (NTRS)
Thomas, J. L.; Bonhaus, D. L.; Anderson, W. K.; Rumsey, C. L.; Biedron, R. T.
1999-01-01
A hierarchical multigrid algorithm for efficient steady solutions to the two-dimensional compressible Navier-Stokes equations is developed and demonstrated. The algorithm applies multigrid in two ways: a Full Approximation Scheme (FAS) for a nonlinear residual equation and a Correction Scheme (CS) for a linearized defect correction implicit equation. Multigrid analyses which include the effect of boundary conditions in one direction are used to estimate the convergence rate of the algorithm for a model convection equation. Three alternating-line- implicit algorithms are compared in terms of efficiency. The analyses indicate that full multigrid efficiency is not attained in the general case; the number of cycles to attain convergence is dependent on the mesh density for high-frequency cross-stream variations. However, the dependence is reasonably small and fast convergence is eventually attained for any given frequency with either the FAS or the CS scheme alone. The paper summarizes numerical computations for which convergence has been attained to within truncation error in a few multigrid cycles for both inviscid and viscous ow simulations on highly stretched meshes.
SPHINX--an algorithm for taxonomic binning of metagenomic sequences.
Mohammed, Monzoorul Haque; Ghosh, Tarini Shankar; Singh, Nitin Kumar; Mande, Sharmila S
2011-01-01
Compared with composition-based binning algorithms, the binning accuracy and specificity of alignment-based binning algorithms is significantly higher. However, being alignment-based, the latter class of algorithms require enormous amount of time and computing resources for binning huge metagenomic datasets. The motivation was to develop a binning approach that can analyze metagenomic datasets as rapidly as composition-based approaches, but nevertheless has the accuracy and specificity of alignment-based algorithms. This article describes a hybrid binning approach (SPHINX) that achieves high binning efficiency by utilizing the principles of both 'composition'- and 'alignment'-based binning algorithms. Validation results with simulated sequence datasets indicate that SPHINX is able to analyze metagenomic sequences as rapidly as composition-based algorithms. Furthermore, the binning efficiency (in terms of accuracy and specificity of assignments) of SPHINX is observed to be comparable with results obtained using alignment-based algorithms. A web server for the SPHINX algorithm is available at http://metagenomics.atc.tcs.com/SPHINX/.
NASA Astrophysics Data System (ADS)
Zhileykin, M. M.; Kotiev, G. O.; Nagatsev, M. V.
2018-02-01
In order to meet the growing mobility requirements for the wheeled vehicles on all types of terrain the engineers have to develop a large number of specialized control algorithms for the multi-axle wheeled vehicle (MWV) suspension improving such qualities as ride comfort, handling and stability. The authors have developed an adaptive algorithm of the dynamic damping of the MVW body oscillations. The algorithm provides high ride comfort and high mobility of the vehicle. The article discloses a method for synthesis of an adaptive dynamic continuous algorithm of the MVW body oscillation damping and provides simulation results proving high efficiency of the developed control algorithm.
Automated sequence-specific protein NMR assignment using the memetic algorithm MATCH.
Volk, Jochen; Herrmann, Torsten; Wüthrich, Kurt
2008-07-01
MATCH (Memetic Algorithm and Combinatorial Optimization Heuristics) is a new memetic algorithm for automated sequence-specific polypeptide backbone NMR assignment of proteins. MATCH employs local optimization for tracing partial sequence-specific assignments within a global, population-based search environment, where the simultaneous application of local and global optimization heuristics guarantees high efficiency and robustness. MATCH thus makes combined use of the two predominant concepts in use for automated NMR assignment of proteins. Dynamic transition and inherent mutation are new techniques that enable automatic adaptation to variable quality of the experimental input data. The concept of dynamic transition is incorporated in all major building blocks of the algorithm, where it enables switching between local and global optimization heuristics at any time during the assignment process. Inherent mutation restricts the intrinsically required randomness of the evolutionary algorithm to those regions of the conformation space that are compatible with the experimental input data. Using intact and artificially deteriorated APSY-NMR input data of proteins, MATCH performed sequence-specific resonance assignment with high efficiency and robustness.
Robust Optimization Design Algorithm for High-Frequency TWTs
NASA Technical Reports Server (NTRS)
Wilson, Jeffrey D.; Chevalier, Christine T.
2010-01-01
Traveling-wave tubes (TWTs), such as the Ka-band (26-GHz) model recently developed for the Lunar Reconnaissance Orbiter, are essential as communication amplifiers in spacecraft for virtually all near- and deep-space missions. This innovation is a computational design algorithm that, for the first time, optimizes the efficiency and output power of a TWT while taking into account the effects of dimensional tolerance variations. Because they are primary power consumers and power generation is very expensive in space, much effort has been exerted over the last 30 years to increase the power efficiency of TWTs. However, at frequencies higher than about 60 GHz, efficiencies of TWTs are still quite low. A major reason is that at higher frequencies, dimensional tolerance variations from conventional micromachining techniques become relatively large with respect to the circuit dimensions. When this is the case, conventional design- optimization procedures, which ignore dimensional variations, provide inaccurate designs for which the actual amplifier performance substantially under-performs that of the design. Thus, this new, robust TWT optimization design algorithm was created to take account of and ameliorate the deleterious effects of dimensional variations and to increase efficiency, power, and yield of high-frequency TWTs. This design algorithm can help extend the use of TWTs into the terahertz frequency regime of 300-3000 GHz. Currently, these frequencies are under-utilized because of the lack of efficient amplifiers, thus this regime is known as the "terahertz gap." The development of an efficient terahertz TWT amplifier could enable breakthrough applications in space science molecular spectroscopy, remote sensing, nondestructive testing, high-resolution "through-the-wall" imaging, biomedical imaging, and detection of explosives and toxic biochemical agents.
Reconstructing householder vectors from Tall-Skinny QR
Ballard, Grey Malone; Demmel, James; Grigori, Laura; ...
2015-08-05
The Tall-Skinny QR (TSQR) algorithm is more communication efficient than the standard Householder algorithm for QR decomposition of matrices with many more rows than columns. However, TSQR produces a different representation of the orthogonal factor and therefore requires more software development to support the new representation. Further, implicitly applying the orthogonal factor to the trailing matrix in the context of factoring a square matrix is more complicated and costly than with the Householder representation. We show how to perform TSQR and then reconstruct the Householder vector representation with the same asymptotic communication efficiency and little extra computational cost. We demonstratemore » the high performance and numerical stability of this algorithm both theoretically and empirically. The new Householder reconstruction algorithm allows us to design more efficient parallel QR algorithms, with significantly lower latency cost compared to Householder QR and lower bandwidth and latency costs compared with Communication-Avoiding QR (CAQR) algorithm. Experiments on supercomputers demonstrate the benefits of the communication cost improvements: in particular, our experiments show substantial improvements over tuned library implementations for tall-and-skinny matrices. Furthermore, we also provide algorithmic improvements to the Householder QR and CAQR algorithms, and we investigate several alternatives to the Householder reconstruction algorithm that sacrifice guarantees on numerical stability in some cases in order to obtain higher performance.« less
A highly efficient multi-core algorithm for clustering extremely large datasets
2010-01-01
Background In recent years, the demand for computational power in computational biology has increased due to rapidly growing data sets from microarray and other high-throughput technologies. This demand is likely to increase. Standard algorithms for analyzing data, such as cluster algorithms, need to be parallelized for fast processing. Unfortunately, most approaches for parallelizing algorithms largely rely on network communication protocols connecting and requiring multiple computers. One answer to this problem is to utilize the intrinsic capabilities in current multi-core hardware to distribute the tasks among the different cores of one computer. Results We introduce a multi-core parallelization of the k-means and k-modes cluster algorithms based on the design principles of transactional memory for clustering gene expression microarray type data and categorial SNP data. Our new shared memory parallel algorithms show to be highly efficient. We demonstrate their computational power and show their utility in cluster stability and sensitivity analysis employing repeated runs with slightly changed parameters. Computation speed of our Java based algorithm was increased by a factor of 10 for large data sets while preserving computational accuracy compared to single-core implementations and a recently published network based parallelization. Conclusions Most desktop computers and even notebooks provide at least dual-core processors. Our multi-core algorithms show that using modern algorithmic concepts, parallelization makes it possible to perform even such laborious tasks as cluster sensitivity and cluster number estimation on the laboratory computer. PMID:20370922
Small target detection using objectness and saliency
NASA Astrophysics Data System (ADS)
Zhang, Naiwen; Xiao, Yang; Fang, Zhiwen; Yang, Jian; Wang, Li; Li, Tao
2017-10-01
We are motived by the need for generic object detection algorithm which achieves high recall for small targets in complex scenes with acceptable computational efficiency. We propose a novel object detection algorithm, which has high localization quality with acceptable computational cost. Firstly, we obtain the objectness map as in BING[1] and use NMS to get the top N points. Then, k-means algorithm is used to cluster them into K classes according to their location. We set the center points of the K classes as seed points. For each seed point, an object potential region is extracted. Finally, a fast salient object detection algorithm[2] is applied to the object potential regions to highlight objectlike pixels, and a series of efficient post-processing operations are proposed to locate the targets. Our method runs at 5 FPS on 1000*1000 images, and significantly outperforms previous methods on small targets in cluttered background.
2011-01-01
Background Machine learning has a vast range of applications. In particular, advanced machine learning methods are routinely and increasingly used in quantitative structure activity relationship (QSAR) modeling. QSAR data sets often encompass tens of thousands of compounds and the size of proprietary, as well as public data sets, is rapidly growing. Hence, there is a demand for computationally efficient machine learning algorithms, easily available to researchers without extensive machine learning knowledge. In granting the scientific principles of transparency and reproducibility, Open Source solutions are increasingly acknowledged by regulatory authorities. Thus, an Open Source state-of-the-art high performance machine learning platform, interfacing multiple, customized machine learning algorithms for both graphical programming and scripting, to be used for large scale development of QSAR models of regulatory quality, is of great value to the QSAR community. Results This paper describes the implementation of the Open Source machine learning package AZOrange. AZOrange is specially developed to support batch generation of QSAR models in providing the full work flow of QSAR modeling, from descriptor calculation to automated model building, validation and selection. The automated work flow relies upon the customization of the machine learning algorithms and a generalized, automated model hyper-parameter selection process. Several high performance machine learning algorithms are interfaced for efficient data set specific selection of the statistical method, promoting model accuracy. Using the high performance machine learning algorithms of AZOrange does not require programming knowledge as flexible applications can be created, not only at a scripting level, but also in a graphical programming environment. Conclusions AZOrange is a step towards meeting the needs for an Open Source high performance machine learning platform, supporting the efficient development of highly accurate QSAR models fulfilling regulatory requirements. PMID:21798025
Stålring, Jonna C; Carlsson, Lars A; Almeida, Pedro; Boyer, Scott
2011-07-28
Machine learning has a vast range of applications. In particular, advanced machine learning methods are routinely and increasingly used in quantitative structure activity relationship (QSAR) modeling. QSAR data sets often encompass tens of thousands of compounds and the size of proprietary, as well as public data sets, is rapidly growing. Hence, there is a demand for computationally efficient machine learning algorithms, easily available to researchers without extensive machine learning knowledge. In granting the scientific principles of transparency and reproducibility, Open Source solutions are increasingly acknowledged by regulatory authorities. Thus, an Open Source state-of-the-art high performance machine learning platform, interfacing multiple, customized machine learning algorithms for both graphical programming and scripting, to be used for large scale development of QSAR models of regulatory quality, is of great value to the QSAR community. This paper describes the implementation of the Open Source machine learning package AZOrange. AZOrange is specially developed to support batch generation of QSAR models in providing the full work flow of QSAR modeling, from descriptor calculation to automated model building, validation and selection. The automated work flow relies upon the customization of the machine learning algorithms and a generalized, automated model hyper-parameter selection process. Several high performance machine learning algorithms are interfaced for efficient data set specific selection of the statistical method, promoting model accuracy. Using the high performance machine learning algorithms of AZOrange does not require programming knowledge as flexible applications can be created, not only at a scripting level, but also in a graphical programming environment. AZOrange is a step towards meeting the needs for an Open Source high performance machine learning platform, supporting the efficient development of highly accurate QSAR models fulfilling regulatory requirements.
An effective and efficient compression algorithm for ECG signals with irregular periods.
Chou, Hsiao-Hsuan; Chen, Ying-Jui; Shiau, Yu-Chien; Kuo, Te-Son
2006-06-01
This paper presents an effective and efficient preprocessing algorithm for two-dimensional (2-D) electrocardiogram (ECG) compression to better compress irregular ECG signals by exploiting their inter- and intra-beat correlations. To better reveal the correlation structure, we first convert the ECG signal into a proper 2-D representation, or image. This involves a few steps including QRS detection and alignment, period sorting, and length equalization. The resulting 2-D ECG representation is then ready to be compressed by an appropriate image compression algorithm. We choose the state-of-the-art JPEG2000 for its high efficiency and flexibility. In this way, the proposed algorithm is shown to outperform some existing arts in the literature by simultaneously achieving high compression ratio (CR), low percent root mean squared difference (PRD), low maximum error (MaxErr), and low standard derivation of errors (StdErr). In particular, because the proposed period sorting method rearranges the detected heartbeats into a smoother image that is easier to compress, this algorithm is insensitive to irregular ECG periods. Thus either the irregular ECG signals or the QRS false-detection cases can be better compressed. This is a significant improvement over existing 2-D ECG compression methods. Moreover, this algorithm is not tied exclusively to JPEG2000. It can also be combined with other 2-D preprocessing methods or appropriate codecs to enhance the compression performance in irregular ECG cases.
NASA Astrophysics Data System (ADS)
Chen, Gang; Yang, Bing; Zhang, Xiaoyun; Gao, Zhiyong
2017-07-01
The latest high efficiency video coding (HEVC) standard significantly increases the encoding complexity for improving its coding efficiency. Due to the limited computational capability of handheld devices, complexity constrained video coding has drawn great attention in recent years. A complexity control algorithm based on adaptive mode selection is proposed for interframe coding in HEVC. Considering the direct proportionality between encoding time and computational complexity, the computational complexity is measured in terms of encoding time. First, complexity is mapped to a target in terms of prediction modes. Then, an adaptive mode selection algorithm is proposed for the mode decision process. Specifically, the optimal mode combination scheme that is chosen through offline statistics is developed at low complexity. If the complexity budget has not been used up, an adaptive mode sorting method is employed to further improve coding efficiency. The experimental results show that the proposed algorithm achieves a very large complexity control range (as low as 10%) for the HEVC encoder while maintaining good rate-distortion performance. For the lowdelayP condition, compared with the direct resource allocation method and the state-of-the-art method, an average gain of 0.63 and 0.17 dB in BDPSNR is observed for 18 sequences when the target complexity is around 40%.
Novel and efficient tag SNPs selection algorithms.
Chen, Wen-Pei; Hung, Che-Lun; Tsai, Suh-Jen Jane; Lin, Yaw-Ling
2014-01-01
SNPs are the most abundant forms of genetic variations amongst species; the association studies between complex diseases and SNPs or haplotypes have received great attention. However, these studies are restricted by the cost of genotyping all SNPs; thus, it is necessary to find smaller subsets, or tag SNPs, representing the rest of the SNPs. In fact, the existing tag SNP selection algorithms are notoriously time-consuming. An efficient algorithm for tag SNP selection was presented, which was applied to analyze the HapMap YRI data. The experimental results show that the proposed algorithm can achieve better performance than the existing tag SNP selection algorithms; in most cases, this proposed algorithm is at least ten times faster than the existing methods. In many cases, when the redundant ratio of the block is high, the proposed algorithm can even be thousands times faster than the previously known methods. Tools and web services for haplotype block analysis integrated by hadoop MapReduce framework are also developed using the proposed algorithm as computation kernels.
Explicit symplectic algorithms based on generating functions for charged particle dynamics.
Zhang, Ruili; Qin, Hong; Tang, Yifa; Liu, Jian; He, Yang; Xiao, Jianyuan
2016-07-01
Dynamics of a charged particle in the canonical coordinates is a Hamiltonian system, and the well-known symplectic algorithm has been regarded as the de facto method for numerical integration of Hamiltonian systems due to its long-term accuracy and fidelity. For long-term simulations with high efficiency, explicit symplectic algorithms are desirable. However, it is generally believed that explicit symplectic algorithms are only available for sum-separable Hamiltonians, and this restriction limits the application of explicit symplectic algorithms to charged particle dynamics. To overcome this difficulty, we combine the familiar sum-split method and a generating function method to construct second- and third-order explicit symplectic algorithms for dynamics of charged particle. The generating function method is designed to generate explicit symplectic algorithms for product-separable Hamiltonian with form of H(x,p)=p_{i}f(x) or H(x,p)=x_{i}g(p). Applied to the simulations of charged particle dynamics, the explicit symplectic algorithms based on generating functions demonstrate superiorities in conservation and efficiency.
Explicit symplectic algorithms based on generating functions for charged particle dynamics
NASA Astrophysics Data System (ADS)
Zhang, Ruili; Qin, Hong; Tang, Yifa; Liu, Jian; He, Yang; Xiao, Jianyuan
2016-07-01
Dynamics of a charged particle in the canonical coordinates is a Hamiltonian system, and the well-known symplectic algorithm has been regarded as the de facto method for numerical integration of Hamiltonian systems due to its long-term accuracy and fidelity. For long-term simulations with high efficiency, explicit symplectic algorithms are desirable. However, it is generally believed that explicit symplectic algorithms are only available for sum-separable Hamiltonians, and this restriction limits the application of explicit symplectic algorithms to charged particle dynamics. To overcome this difficulty, we combine the familiar sum-split method and a generating function method to construct second- and third-order explicit symplectic algorithms for dynamics of charged particle. The generating function method is designed to generate explicit symplectic algorithms for product-separable Hamiltonian with form of H (x ,p ) =pif (x ) or H (x ,p ) =xig (p ) . Applied to the simulations of charged particle dynamics, the explicit symplectic algorithms based on generating functions demonstrate superiorities in conservation and efficiency.
A hardware-oriented concurrent TZ search algorithm for High-Efficiency Video Coding
NASA Astrophysics Data System (ADS)
Doan, Nghia; Kim, Tae Sung; Rhee, Chae Eun; Lee, Hyuk-Jae
2017-12-01
High-Efficiency Video Coding (HEVC) is the latest video coding standard, in which the compression performance is double that of its predecessor, the H.264/AVC standard, while the video quality remains unchanged. In HEVC, the test zone (TZ) search algorithm is widely used for integer motion estimation because it effectively searches the good-quality motion vector with a relatively small amount of computation. However, the complex computation structure of the TZ search algorithm makes it difficult to implement it in the hardware. This paper proposes a new integer motion estimation algorithm which is designed for hardware execution by modifying the conventional TZ search to allow parallel motion estimations of all prediction unit (PU) partitions. The algorithm consists of the three phases of zonal, raster, and refinement searches. At the beginning of each phase, the algorithm obtains the search points required by the original TZ search for all PU partitions in a coding unit (CU). Then, all redundant search points are removed prior to the estimation of the motion costs, and the best search points are then selected for all PUs. Compared to the conventional TZ search algorithm, experimental results show that the proposed algorithm significantly decreases the Bjøntegaard Delta bitrate (BD-BR) by 0.84%, and it also reduces the computational complexity by 54.54%.
Trellises and Trellis-Based Decoding Algorithms for Linear Block Codes. Part 3
NASA Technical Reports Server (NTRS)
Lin, Shu
1998-01-01
Decoding algorithms based on the trellis representation of a code (block or convolutional) drastically reduce decoding complexity. The best known and most commonly used trellis-based decoding algorithm is the Viterbi algorithm. It is a maximum likelihood decoding algorithm. Convolutional codes with the Viterbi decoding have been widely used for error control in digital communications over the last two decades. This chapter is concerned with the application of the Viterbi decoding algorithm to linear block codes. First, the Viterbi algorithm is presented. Then, optimum sectionalization of a trellis to minimize the computational complexity of a Viterbi decoder is discussed and an algorithm is presented. Some design issues for IC (integrated circuit) implementation of a Viterbi decoder are considered and discussed. Finally, a new decoding algorithm based on the principle of compare-select-add is presented. This new algorithm can be applied to both block and convolutional codes and is more efficient than the conventional Viterbi algorithm based on the add-compare-select principle. This algorithm is particularly efficient for rate 1/n antipodal convolutional codes and their high-rate punctured codes. It reduces computational complexity by one-third compared with the Viterbi algorithm.
NASA Astrophysics Data System (ADS)
Guo, Jie; Zhu, Chang`an
2016-01-01
The development of optics and computer technologies enables the application of the vision-based technique that uses digital cameras to the displacement measurement of large-scale structures. Compared with traditional contact measurements, vision-based technique allows for remote measurement, has a non-intrusive characteristic, and does not necessitate mass introduction. In this study, a high-speed camera system is developed to complete the displacement measurement in real time. The system consists of a high-speed camera and a notebook computer. The high-speed camera can capture images at a speed of hundreds of frames per second. To process the captured images in computer, the Lucas-Kanade template tracking algorithm in the field of computer vision is introduced. Additionally, a modified inverse compositional algorithm is proposed to reduce the computing time of the original algorithm and improve the efficiency further. The modified algorithm can rapidly accomplish one displacement extraction within 1 ms without having to install any pre-designed target panel onto the structures in advance. The accuracy and the efficiency of the system in the remote measurement of dynamic displacement are demonstrated in the experiments on motion platform and sound barrier on suspension viaduct. Experimental results show that the proposed algorithm can extract accurate displacement signal and accomplish the vibration measurement of large-scale structures.
An efficient dictionary learning algorithm and its application to 3-D medical image denoising.
Li, Shutao; Fang, Leyuan; Yin, Haitao
2012-02-01
In this paper, we propose an efficient dictionary learning algorithm for sparse representation of given data and suggest a way to apply this algorithm to 3-D medical image denoising. Our learning approach is composed of two main parts: sparse coding and dictionary updating. On the sparse coding stage, an efficient algorithm named multiple clusters pursuit (MCP) is proposed. The MCP first applies a dictionary structuring strategy to cluster the atoms with high coherence together, and then employs a multiple-selection strategy to select several competitive atoms at each iteration. These two strategies can greatly reduce the computation complexity of the MCP and assist it to obtain better sparse solution. On the dictionary updating stage, the alternating optimization that efficiently approximates the singular value decomposition is introduced. Furthermore, in the 3-D medical image denoising application, a joint 3-D operation is proposed for taking the learning capabilities of the presented algorithm to simultaneously capture the correlations within each slice and correlations across the nearby slices, thereby obtaining better denoising results. The experiments on both synthetically generated data and real 3-D medical images demonstrate that the proposed approach has superior performance compared to some well-known methods. © 2011 IEEE
DANoC: An Efficient Algorithm and Hardware Codesign of Deep Neural Networks on Chip.
Zhou, Xichuan; Li, Shengli; Tang, Fang; Hu, Shengdong; Lin, Zhi; Zhang, Lei
2017-07-18
Deep neural networks (NNs) are the state-of-the-art models for understanding the content of images and videos. However, implementing deep NNs in embedded systems is a challenging task, e.g., a typical deep belief network could exhaust gigabytes of memory and result in bandwidth and computational bottlenecks. To address this challenge, this paper presents an algorithm and hardware codesign for efficient deep neural computation. A hardware-oriented deep learning algorithm, named the deep adaptive network, is proposed to explore the sparsity of neural connections. By adaptively removing the majority of neural connections and robustly representing the reserved connections using binary integers, the proposed algorithm could save up to 99.9% memory utility and computational resources without undermining classification accuracy. An efficient sparse-mapping-memory-based hardware architecture is proposed to fully take advantage of the algorithmic optimization. Different from traditional Von Neumann architecture, the deep-adaptive network on chip (DANoC) brings communication and computation in close proximity to avoid power-hungry parameter transfers between on-board memory and on-chip computational units. Experiments over different image classification benchmarks show that the DANoC system achieves competitively high accuracy and efficiency comparing with the state-of-the-art approaches.
Bolding, Simon R.; Cleveland, Mathew Allen; Morel, Jim E.
2016-10-21
In this paper, we have implemented a new high-order low-order (HOLO) algorithm for solving thermal radiative transfer problems. The low-order (LO) system is based on the spatial and angular moments of the transport equation and a linear-discontinuous finite-element spatial representation, producing equations similar to the standard S 2 equations. The LO solver is fully implicit in time and efficiently resolves the nonlinear temperature dependence at each time step. The high-order (HO) solver utilizes exponentially convergent Monte Carlo (ECMC) to give a globally accurate solution for the angular intensity to a fixed-source pure-absorber transport problem. This global solution is used tomore » compute consistency terms, which require the HO and LO solutions to converge toward the same solution. The use of ECMC allows for the efficient reduction of statistical noise in the Monte Carlo solution, reducing inaccuracies introduced through the LO consistency terms. Finally, we compare results with an implicit Monte Carlo code for one-dimensional gray test problems and demonstrate the efficiency of ECMC over standard Monte Carlo in this HOLO algorithm.« less
An efficient group multicast routing for multimedia communication
NASA Astrophysics Data System (ADS)
Wang, Yanlin; Sun, Yugen; Yan, Xinfang
2004-04-01
Group multicasting is a kind of communication mechanism whereby each member of a group sends messages to all the other members of the same group. Group multicast routing algorithms capable of satisfying quality of service (QoS) requirements of multimedia applications are essential for high-speed networks. We present a heuristic algorithm for group multicast routing with end to end delay constraint. Source-specific routing trees for each member are generated in our algorithm, which satisfy member"s bandwidth and end to end delay requirements. Simulations over random network were carried out to compare proposed algorithm performance with Low and Song"s. The experimental results show that our proposed algorithm performs better in terms of network cost and ability in constructing feasible multicast trees for group members. Moreover, our algorithm achieves good performance in balancing traffic, which can avoid link blocking and enhance the network behavior efficiently.
NASA Astrophysics Data System (ADS)
Fu, Lin; Hu, Xiangyu Y.; Adams, Nikolaus A.
2017-12-01
We propose efficient single-step formulations for reinitialization and extending algorithms, which are critical components of level-set based interface-tracking methods. The level-set field is reinitialized with a single-step (non iterative) "forward tracing" algorithm. A minimum set of cells is defined that describes the interface, and reinitialization employs only data from these cells. Fluid states are extrapolated or extended across the interface by a single-step "backward tracing" algorithm. Both algorithms, which are motivated by analogy to ray-tracing, avoid multiple block-boundary data exchanges that are inevitable for iterative reinitialization and extending approaches within a parallel-computing environment. The single-step algorithms are combined with a multi-resolution conservative sharp-interface method and validated by a wide range of benchmark test cases. We demonstrate that the proposed reinitialization method achieves second-order accuracy in conserving the volume of each phase. The interface location is invariant to reapplication of the single-step reinitialization. Generally, we observe smaller absolute errors than for standard iterative reinitialization on the same grid. The computational efficiency is higher than for the standard and typical high-order iterative reinitialization methods. We observe a 2- to 6-times efficiency improvement over the standard method for serial execution. The proposed single-step extending algorithm, which is commonly employed for assigning data to ghost cells with ghost-fluid or conservative interface interaction methods, shows about 10-times efficiency improvement over the standard method while maintaining same accuracy. Despite their simplicity, the proposed algorithms offer an efficient and robust alternative to iterative reinitialization and extending methods for level-set based multi-phase simulations.
Scalable domain decomposition solvers for stochastic PDEs in high performance computing
Desai, Ajit; Khalil, Mohammad; Pettit, Chris; ...
2017-09-21
Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less
Scalable domain decomposition solvers for stochastic PDEs in high performance computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Desai, Ajit; Khalil, Mohammad; Pettit, Chris
Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less
Advanced Dynamically Adaptive Algorithms for Stochastic Simulations on Extreme Scales
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xiu, Dongbin
2017-03-03
The focus of the project is the development of mathematical methods and high-performance computational tools for stochastic simulations, with a particular emphasis on computations on extreme scales. The core of the project revolves around the design of highly efficient and scalable numerical algorithms that can adaptively and accurately, in high dimensional spaces, resolve stochastic problems with limited smoothness, even containing discontinuities.
High throughput light absorber discovery, Part 1: An algorithm for automated tauc analysis
Suram, Santosh K.; Newhouse, Paul F.; Gregoire, John M.
2016-09-23
High-throughput experimentation provides efficient mapping of composition-property relationships, and its implementation for the discovery of optical materials enables advancements in solar energy and other technologies. In a high throughput pipeline, automated data processing algorithms are often required to match experimental throughput, and we present an automated Tauc analysis algorithm for estimating band gap energies from optical spectroscopy data. The algorithm mimics the judgment of an expert scientist, which is demonstrated through its application to a variety of high throughput spectroscopy data, including the identification of indirect or direct band gaps in Fe 2O 3, Cu 2V 2O 7, and BiVOmore » 4. Here, the applicability of the algorithm to estimate a range of band gap energies for various materials is demonstrated by a comparison of direct-allowed band gaps estimated by expert scientists and by automated algorithm for 60 optical spectra.« less
Abd-Elhameed, Waleed M.; Doha, Eid H.; Bassuony, Mahmoud A.
2014-01-01
Two numerical algorithms based on dual-Petrov-Galerkin method are developed for solving the integrated forms of high odd-order boundary value problems (BVPs) governed by homogeneous and nonhomogeneous boundary conditions. Two different choices of trial functions and test functions which satisfy the underlying boundary conditions of the differential equations and the dual boundary conditions are used for this purpose. These choices lead to linear systems with specially structured matrices that can be efficiently inverted, hence greatly reducing the cost. The various matrix systems resulting from these discretizations are carefully investigated, especially their complexities and their condition numbers. Numerical results are given to illustrate the efficiency of the proposed algorithms, and some comparisons with some other methods are made. PMID:24616620
Efficient Statistically Accurate Algorithms for the Fokker-Planck Equation in Large Dimensions
NASA Astrophysics Data System (ADS)
Chen, N.; Majda, A.
2017-12-01
Solving the Fokker-Planck equation for high-dimensional complex turbulent dynamical systems is an important and practical issue. However, most traditional methods suffer from the curse of dimensionality and have difficulties in capturing the fat tailed highly intermittent probability density functions (PDFs) of complex systems in turbulence, neuroscience and excitable media. In this article, efficient statistically accurate algorithms are developed for solving both the transient and the equilibrium solutions of Fokker-Planck equations associated with high-dimensional nonlinear turbulent dynamical systems with conditional Gaussian structures. The algorithms involve a hybrid strategy that requires only a small number of ensembles. Here, a conditional Gaussian mixture in a high-dimensional subspace via an extremely efficient parametric method is combined with a judicious non-parametric Gaussian kernel density estimation in the remaining low-dimensional subspace. Particularly, the parametric method, which is based on an effective data assimilation framework, provides closed analytical formulae for determining the conditional Gaussian distributions in the high-dimensional subspace. Therefore, it is computationally efficient and accurate. The full non-Gaussian PDF of the system is then given by a Gaussian mixture. Different from the traditional particle methods, each conditional Gaussian distribution here covers a significant portion of the high-dimensional PDF. Therefore a small number of ensembles is sufficient to recover the full PDF, which overcomes the curse of dimensionality. Notably, the mixture distribution has a significant skill in capturing the transient behavior with fat tails of the high-dimensional non-Gaussian PDFs, and this facilitates the algorithms in accurately describing the intermittency and extreme events in complex turbulent systems. It is shown in a stringent set of test problems that the method only requires an order of O(100) ensembles to successfully recover the highly non-Gaussian transient PDFs in up to 6 dimensions with only small errors.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hayami, Masao; Seino, Junji; Nakai, Hiromi, E-mail: nakai@waseda.jp
An efficient algorithm for the rapid evaluation of electron repulsion integrals is proposed. The present method, denoted by accompanying coordinate expansion and transferred recurrence relation (ACE-TRR), is constructed using a transfer relation scheme based on the accompanying coordinate expansion and recurrence relation method. Furthermore, the ACE-TRR algorithm is extended for the general-contraction basis sets. Numerical assessments clarify the efficiency of the ACE-TRR method for the systems including heavy elements, whose orbitals have long contractions and high angular momenta, such as f- and g-orbitals.
Speedup of minimum discontinuity phase unwrapping algorithm with a reference phase distribution
NASA Astrophysics Data System (ADS)
Liu, Yihang; Han, Yu; Li, Fengjiao; Zhang, Qican
2018-06-01
In three-dimensional (3D) shape measurement based on phase analysis, the phase analysis process usually produces a wrapped phase map ranging from - π to π with some 2 π discontinuities, and thus a phase unwrapping algorithm is necessary to recover the continuous and nature phase map from which 3D height distribution can be restored. Usually, the minimum discontinuity phase unwrapping algorithm can be used to solve many different kinds of phase unwrapping problems, but its main drawback is that it requires a large amount of computations and has low efficiency in searching for the improving loop within the phase's discontinuity area. To overcome this drawback, an improvement to speedup of the minimum discontinuity phase unwrapping algorithm by using the phase distribution on reference plane is proposed. In this improved algorithm, before the minimum discontinuity phase unwrapping algorithm is carried out to unwrap phase, an integer number K was calculated from the ratio of the wrapped phase to the nature phase on a reference plane. And then the jump counts of the unwrapped phase can be reduced by adding 2K π, so the efficiency of the minimum discontinuity phase unwrapping algorithm is significantly improved. Both simulated and experimental data results verify the feasibility of the proposed improved algorithm, and both of them clearly show that the algorithm works very well and has high efficiency.
Improving and Evaluating Nested Sampling Algorithm for Marginal Likelihood Estimation
NASA Astrophysics Data System (ADS)
Ye, M.; Zeng, X.; Wu, J.; Wang, D.; Liu, J.
2016-12-01
With the growing impacts of climate change and human activities on the cycle of water resources, an increasing number of researches focus on the quantification of modeling uncertainty. Bayesian model averaging (BMA) provides a popular framework for quantifying conceptual model and parameter uncertainty. The ensemble prediction is generated by combining each plausible model's prediction, and each model is attached with a model weight which is determined by model's prior weight and marginal likelihood. Thus, the estimation of model's marginal likelihood is crucial for reliable and accurate BMA prediction. Nested sampling estimator (NSE) is a new proposed method for marginal likelihood estimation. The process of NSE is accomplished by searching the parameters' space from low likelihood area to high likelihood area gradually, and this evolution is finished iteratively via local sampling procedure. Thus, the efficiency of NSE is dominated by the strength of local sampling procedure. Currently, Metropolis-Hasting (M-H) algorithm is often used for local sampling. However, M-H is not an efficient sampling algorithm for high-dimensional or complicated parameter space. For improving the efficiency of NSE, it could be ideal to incorporate the robust and efficient sampling algorithm - DREAMzs into the local sampling of NSE. The comparison results demonstrated that the improved NSE could improve the efficiency of marginal likelihood estimation significantly. However, both improved and original NSEs suffer from heavy instability. In addition, the heavy computation cost of huge number of model executions is overcome by using an adaptive sparse grid surrogates.
The methodology of the gas turbine efficiency calculation
NASA Astrophysics Data System (ADS)
Kotowicz, Janusz; Job, Marcin; Brzęczek, Mateusz; Nawrat, Krzysztof; Mędrych, Janusz
2016-12-01
In the paper a calculation methodology of isentropic efficiency of a compressor and turbine in a gas turbine installation on the basis of polytropic efficiency characteristics is presented. A gas turbine model is developed into software for power plant simulation. There are shown the calculation algorithms based on iterative model for isentropic efficiency of the compressor and for isentropic efficiency of the turbine based on the turbine inlet temperature. The isentropic efficiency characteristics of the compressor and the turbine are developed by means of the above mentioned algorithms. The gas turbine development for the high compressor ratios was the main driving force for this analysis. The obtained gas turbine electric efficiency characteristics show that an increase of pressure ratio above 50 is not justified due to the slight increase in the efficiency with a significant increase of turbine inlet combustor outlet and temperature.
Improving Cancer Detection and Dose Efficiency in Dedicated Breast Cancer CT
2010-02-01
source trajectory and data truncation, which can however be solved with the back-projection filtration ( BPF ) algorithm [6,7]. I have used the BPF ...high to low radiation dose levels. I have investigated noise properties in images reconstructed by use of FDK and BPF algorithms at different noise...analytic algorithms such as the FDK and BPF algorithms are applied to sparse-view data, the reconstruction images will contain artifacts such as streak
A Novel Latin Hypercube Algorithm via Translational Propagation
Pan, Guang; Ye, Pengcheng
2014-01-01
Metamodels have been widely used in engineering design to facilitate analysis and optimization of complex systems that involve computationally expensive simulation programs. The accuracy of metamodels is directly related to the experimental designs used. Optimal Latin hypercube designs are frequently used and have been shown to have good space-filling and projective properties. However, the high cost in constructing them limits their use. In this paper, a methodology for creating novel Latin hypercube designs via translational propagation and successive local enumeration algorithm (TPSLE) is developed without using formal optimization. TPSLE algorithm is based on the inspiration that a near optimal Latin Hypercube design can be constructed by a simple initial block with a few points generated by algorithm SLE as a building block. In fact, TPSLE algorithm offers a balanced trade-off between the efficiency and sampling performance. The proposed algorithm is compared to two existing algorithms and is found to be much more efficient in terms of the computation time and has acceptable space-filling and projective properties. PMID:25276844
Applications and accuracy of the parallel diagonal dominant algorithm
NASA Technical Reports Server (NTRS)
Sun, Xian-He
1993-01-01
The Parallel Diagonal Dominant (PDD) algorithm is a highly efficient, ideally scalable tridiagonal solver. In this paper, a detailed study of the PDD algorithm is given. First the PDD algorithm is introduced. Then the algorithm is extended to solve periodic tridiagonal systems. A variant, the reduced PDD algorithm, is also proposed. Accuracy analysis is provided for a class of tridiagonal systems, the symmetric, and anti-symmetric Toeplitz tridiagonal systems. Implementation results show that the analysis gives a good bound on the relative error, and the algorithm is a good candidate for the emerging massively parallel machines.
Efficient and Flexible Computation of Many-Electron Wave Function Overlaps.
Plasser, Felix; Ruckenbauer, Matthias; Mai, Sebastian; Oppel, Markus; Marquetand, Philipp; González, Leticia
2016-03-08
A new algorithm for the computation of the overlap between many-electron wave functions is described. This algorithm allows for the extensive use of recurring intermediates and thus provides high computational efficiency. Because of the general formalism employed, overlaps can be computed for varying wave function types, molecular orbitals, basis sets, and molecular geometries. This paves the way for efficiently computing nonadiabatic interaction terms for dynamics simulations. In addition, other application areas can be envisaged, such as the comparison of wave functions constructed at different levels of theory. Aside from explaining the algorithm and evaluating the performance, a detailed analysis of the numerical stability of wave function overlaps is carried out, and strategies for overcoming potential severe pitfalls due to displaced atoms and truncated wave functions are presented.
NASA Technical Reports Server (NTRS)
Delaat, J. C.; Merrill, W. C.
1983-01-01
A sensor failure detection, isolation, and accommodation algorithm was developed which incorporates analytic sensor redundancy through software. This algorithm was implemented in a high level language on a microprocessor based controls computer. Parallel processing and state-of-the-art 16-bit microprocessors are used along with efficient programming practices to achieve real-time operation.
Efficient parallel implementation of active appearance model fitting algorithm on GPU.
Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou
2014-01-01
The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures.
Efficient Parallel Implementation of Active Appearance Model Fitting Algorithm on GPU
Wang, Jinwei; Ma, Xirong; Zhu, Yuanping; Sun, Jizhou
2014-01-01
The active appearance model (AAM) is one of the most powerful model-based object detecting and tracking methods which has been widely used in various situations. However, the high-dimensional texture representation causes very time-consuming computations, which makes the AAM difficult to apply to real-time systems. The emergence of modern graphics processing units (GPUs) that feature a many-core, fine-grained parallel architecture provides new and promising solutions to overcome the computational challenge. In this paper, we propose an efficient parallel implementation of the AAM fitting algorithm on GPUs. Our design idea is fine grain parallelism in which we distribute the texture data of the AAM, in pixels, to thousands of parallel GPU threads for processing, which makes the algorithm fit better into the GPU architecture. We implement our algorithm using the compute unified device architecture (CUDA) on the Nvidia's GTX 650 GPU, which has the latest Kepler architecture. To compare the performance of our algorithm with different data sizes, we built sixteen face AAM models of different dimensional textures. The experiment results show that our parallel AAM fitting algorithm can achieve real-time performance for videos even on very high-dimensional textures. PMID:24723812
Symmetry compression method for discovering network motifs.
Wang, Jianxin; Huang, Yuannan; Wu, Fang-Xiang; Pan, Yi
2012-01-01
Discovering network motifs could provide a significant insight into systems biology. Interestingly, many biological networks have been found to have a high degree of symmetry (automorphism), which is inherent in biological network topologies. The symmetry due to the large number of basic symmetric subgraphs (BSSs) causes a certain redundant calculation in discovering network motifs. Therefore, we compress all basic symmetric subgraphs before extracting compressed subgraphs and propose an efficient decompression algorithm to decompress all compressed subgraphs without loss of any information. In contrast to previous approaches, the novel Symmetry Compression method for Motif Detection, named as SCMD, eliminates most redundant calculations caused by widespread symmetry of biological networks. We use SCMD to improve three notable exact algorithms and two efficient sampling algorithms. Results of all exact algorithms with SCMD are the same as those of the original algorithms, since SCMD is a lossless method. The sampling results show that the use of SCMD almost does not affect the quality of sampling results. For highly symmetric networks, we find that SCMD used in both exact and sampling algorithms can help get a remarkable speedup. Furthermore, SCMD enables us to find larger motifs in biological networks with notable symmetry than previously possible.
Rapid Calculation of Max-Min Fair Rates for Multi-Commodity Flows in Fat-Tree Networks
Mollah, Md Atiqul; Yuan, Xin; Pakin, Scott; ...
2017-08-29
Max-min fairness is often used in the performance modeling of interconnection networks. Existing methods to compute max-min fair rates for multi-commodity flows have high complexity and are computationally infeasible for large networks. In this paper, we show that by considering topological features, this problem can be solved efficiently for the fat-tree topology that is widely used in data centers and high performance compute clusters. Several efficient new algorithms are developed for this problem, including a parallel algorithm that can take advantage of multi-core and shared-memory architectures. Using these algorithms, we demonstrate that it is possible to find the max-min fairmore » rate allocation for multi-commodity flows in fat-tree networks that support tens of thousands of nodes. We evaluate the run-time performance of the proposed algorithms and show improvement in orders of magnitude over the previously best known method. Finally, we further demonstrate a new application of max-min fair rate allocation that is only computationally feasible using our new algorithms.« less
Rapid Calculation of Max-Min Fair Rates for Multi-Commodity Flows in Fat-Tree Networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mollah, Md Atiqul; Yuan, Xin; Pakin, Scott
Max-min fairness is often used in the performance modeling of interconnection networks. Existing methods to compute max-min fair rates for multi-commodity flows have high complexity and are computationally infeasible for large networks. In this paper, we show that by considering topological features, this problem can be solved efficiently for the fat-tree topology that is widely used in data centers and high performance compute clusters. Several efficient new algorithms are developed for this problem, including a parallel algorithm that can take advantage of multi-core and shared-memory architectures. Using these algorithms, we demonstrate that it is possible to find the max-min fairmore » rate allocation for multi-commodity flows in fat-tree networks that support tens of thousands of nodes. We evaluate the run-time performance of the proposed algorithms and show improvement in orders of magnitude over the previously best known method. Finally, we further demonstrate a new application of max-min fair rate allocation that is only computationally feasible using our new algorithms.« less
An efficient three-dimensional Poisson solver for SIMD high-performance-computing architectures
NASA Technical Reports Server (NTRS)
Cohl, H.
1994-01-01
We present an algorithm that solves the three-dimensional Poisson equation on a cylindrical grid. The technique uses a finite-difference scheme with operator splitting. This splitting maps the banded structure of the operator matrix into a two-dimensional set of tridiagonal matrices, which are then solved in parallel. Our algorithm couples FFT techniques with the well-known ADI (Alternating Direction Implicit) method for solving Elliptic PDE's, and the implementation is extremely well suited for a massively parallel environment like the SIMD architecture of the MasPar MP-1. Due to the highly recursive nature of our problem, we believe that our method is highly efficient, as it avoids excessive interprocessor communication.
Sparse Regression as a Sparse Eigenvalue Problem
NASA Technical Reports Server (NTRS)
Moghaddam, Baback; Gruber, Amit; Weiss, Yair; Avidan, Shai
2008-01-01
We extend the l0-norm "subspectral" algorithms for sparse-LDA [5] and sparse-PCA [6] to general quadratic costs such as MSE in linear (kernel) regression. The resulting "Sparse Least Squares" (SLS) problem is also NP-hard, by way of its equivalence to a rank-1 sparse eigenvalue problem (e.g., binary sparse-LDA [7]). Specifically, for a general quadratic cost we use a highly-efficient technique for direct eigenvalue computation using partitioned matrix inverses which leads to dramatic x103 speed-ups over standard eigenvalue decomposition. This increased efficiency mitigates the O(n4) scaling behaviour that up to now has limited the previous algorithms' utility for high-dimensional learning problems. Moreover, the new computation prioritizes the role of the less-myopic backward elimination stage which becomes more efficient than forward selection. Similarly, branch-and-bound search for Exact Sparse Least Squares (ESLS) also benefits from partitioned matrix inverse techniques. Our Greedy Sparse Least Squares (GSLS) generalizes Natarajan's algorithm [9] also known as Order-Recursive Matching Pursuit (ORMP). Specifically, the forward half of GSLS is exactly equivalent to ORMP but more efficient. By including the backward pass, which only doubles the computation, we can achieve lower MSE than ORMP. Experimental comparisons to the state-of-the-art LARS algorithm [3] show forward-GSLS is faster, more accurate and more flexible in terms of choice of regularization
Efficiency of exchange schemes in replica exchange
NASA Astrophysics Data System (ADS)
Lingenheil, Martin; Denschlag, Robert; Mathias, Gerald; Tavan, Paul
2009-08-01
In replica exchange simulations a fast diffusion of the replicas through the temperature space maximizes the efficiency of the statistical sampling. Here, we compare the diffusion speed as measured by the round trip rates for four exchange algorithms. We find different efficiency profiles with optimal average acceptance probabilities ranging from 8% to 41%. The best performance is determined by benchmark simulations for the most widely used algorithm, which alternately tries to exchange all even and all odd replica pairs. By analytical mathematics we show that the excellent performance of this exchange scheme is due to the high diffusivity of the underlying random walk.
Al-Rajab, Murad; Lu, Joan; Xu, Qiang
2017-07-01
This paper examines the accuracy and efficiency (time complexity) of high performance genetic data feature selection and classification algorithms for colon cancer diagnosis. The need for this research derives from the urgent and increasing need for accurate and efficient algorithms. Colon cancer is a leading cause of death worldwide, hence it is vitally important for the cancer tissues to be expertly identified and classified in a rapid and timely manner, to assure both a fast detection of the disease and to expedite the drug discovery process. In this research, a three-phase approach was proposed and implemented: Phases One and Two examined the feature selection algorithms and classification algorithms employed separately, and Phase Three examined the performance of the combination of these. It was found from Phase One that the Particle Swarm Optimization (PSO) algorithm performed best with the colon dataset as a feature selection (29 genes selected) and from Phase Two that the Support Vector Machine (SVM) algorithm outperformed other classifications, with an accuracy of almost 86%. It was also found from Phase Three that the combined use of PSO and SVM surpassed other algorithms in accuracy and performance, and was faster in terms of time analysis (94%). It is concluded that applying feature selection algorithms prior to classification algorithms results in better accuracy than when the latter are applied alone. This conclusion is important and significant to industry and society. Copyright © 2017 Elsevier B.V. All rights reserved.
A lightweight QRS detector for single lead ECG signals using a max-min difference algorithm.
Pandit, Diptangshu; Zhang, Li; Liu, Chengyu; Chattopadhyay, Samiran; Aslam, Nauman; Lim, Chee Peng
2017-06-01
Detection of the R-peak pertaining to the QRS complex of an ECG signal plays an important role for the diagnosis of a patient's heart condition. To accurately identify the QRS locations from the acquired raw ECG signals, we need to handle a number of challenges, which include noise, baseline wander, varying peak amplitudes, and signal abnormality. This research aims to address these challenges by developing an efficient lightweight algorithm for QRS (i.e., R-peak) detection from raw ECG signals. A lightweight real-time sliding window-based Max-Min Difference (MMD) algorithm for QRS detection from Lead II ECG signals is proposed. Targeting to achieve the best trade-off between computational efficiency and detection accuracy, the proposed algorithm consists of five key steps for QRS detection, namely, baseline correction, MMD curve generation, dynamic threshold computation, R-peak detection, and error correction. Five annotated databases from Physionet are used for evaluating the proposed algorithm in R-peak detection. Integrated with a feature extraction technique and a neural network classifier, the proposed ORS detection algorithm has also been extended to undertake normal and abnormal heartbeat detection from ECG signals. The proposed algorithm exhibits a high degree of robustness in QRS detection and achieves an average sensitivity of 99.62% and an average positive predictivity of 99.67%. Its performance compares favorably with those from the existing state-of-the-art models reported in the literature. In regards to normal and abnormal heartbeat detection, the proposed QRS detection algorithm in combination with the feature extraction technique and neural network classifier achieves an overall accuracy rate of 93.44% based on an empirical evaluation using the MIT-BIH Arrhythmia data set with 10-fold cross validation. In comparison with other related studies, the proposed algorithm offers a lightweight adaptive alternative for R-peak detection with good computational efficiency. The empirical results indicate that it not only yields a high accuracy rate in QRS detection, but also exhibits efficient computational complexity at the order of O(n), where n is the length of an ECG signal. Copyright © 2017 Elsevier B.V. All rights reserved.
Versatile and efficient pore network extraction method using marker-based watershed segmentation
NASA Astrophysics Data System (ADS)
Gostick, Jeff T.
2017-08-01
Obtaining structural information from tomographic images of porous materials is a critical component of porous media research. Extracting pore networks is particularly valuable since it enables pore network modeling simulations which can be useful for a host of tasks from predicting transport properties to simulating performance of entire devices. This work reports an efficient algorithm for extracting networks using only standard image analysis techniques. The algorithm was applied to several standard porous materials ranging from sandstone to fibrous mats, and in all cases agreed very well with established or known values for pore and throat sizes, capillary pressure curves, and permeability. In the case of sandstone, the present algorithm was compared to the network obtained using the current state-of-the-art algorithm, and very good agreement was achieved. Most importantly, the network extracted from an image of fibrous media correctly predicted the anisotropic permeability tensor, demonstrating the critical ability to detect key structural features. The highly efficient algorithm allows extraction on fairly large images of 5003 voxels in just over 200 s. The ability for one algorithm to match materials as varied as sandstone with 20% porosity and fibrous media with 75% porosity is a significant advancement. The source code for this algorithm is provided.
An Improved Nested Sampling Algorithm for Model Selection and Assessment
NASA Astrophysics Data System (ADS)
Zeng, X.; Ye, M.; Wu, J.; WANG, D.
2017-12-01
Multimodel strategy is a general approach for treating model structure uncertainty in recent researches. The unknown groundwater system is represented by several plausible conceptual models. Each alternative conceptual model is attached with a weight which represents the possibility of this model. In Bayesian framework, the posterior model weight is computed as the product of model prior weight and marginal likelihood (or termed as model evidence). As a result, estimating marginal likelihoods is crucial for reliable model selection and assessment in multimodel analysis. Nested sampling estimator (NSE) is a new proposed algorithm for marginal likelihood estimation. The implementation of NSE comprises searching the parameters' space from low likelihood area to high likelihood area gradually, and this evolution is finished iteratively via local sampling procedure. Thus, the efficiency of NSE is dominated by the strength of local sampling procedure. Currently, Metropolis-Hasting (M-H) algorithm and its variants are often used for local sampling in NSE. However, M-H is not an efficient sampling algorithm for high-dimensional or complex likelihood function. For improving the performance of NSE, it could be feasible to integrate more efficient and elaborated sampling algorithm - DREAMzs into the local sampling. In addition, in order to overcome the computation burden problem of large quantity of repeating model executions in marginal likelihood estimation, an adaptive sparse grid stochastic collocation method is used to build the surrogates for original groundwater model.
An image segmentation method based on fuzzy C-means clustering and Cuckoo search algorithm
NASA Astrophysics Data System (ADS)
Wang, Mingwei; Wan, Youchuan; Gao, Xianjun; Ye, Zhiwei; Chen, Maolin
2018-04-01
Image segmentation is a significant step in image analysis and machine vision. Many approaches have been presented in this topic; among them, fuzzy C-means (FCM) clustering is one of the most widely used methods for its high efficiency and ambiguity of images. However, the success of FCM could not be guaranteed because it easily traps into local optimal solution. Cuckoo search (CS) is a novel evolutionary algorithm, which has been tested on some optimization problems and proved to be high-efficiency. Therefore, a new segmentation technique using FCM and blending of CS algorithm is put forward in the paper. Further, the proposed method has been measured on several images and compared with other existing FCM techniques such as genetic algorithm (GA) based FCM and particle swarm optimization (PSO) based FCM in terms of fitness value. Experimental results indicate that the proposed method is robust, adaptive and exhibits the better performance than other methods involved in the paper.
Li, Ning; Cürüklü, Baran; Bastos, Joaquim; Sucasas, Victor; Fernandez, Jose Antonio Sanchez; Rodriguez, Jonathan
2017-01-01
The aim of the Smart and Networking Underwater Robots in Cooperation Meshes (SWARMs) project is to make autonomous underwater vehicles (AUVs), remote operated vehicles (ROVs) and unmanned surface vehicles (USVs) more accessible and useful. To achieve cooperation and communication between different AUVs, these must be able to exchange messages, so an efficient and reliable communication network is necessary for SWARMs. In order to provide an efficient and reliable communication network for mission execution, one of the important and necessary issues is the topology control of the network of AUVs that are cooperating underwater. However, due to the specific properties of an underwater AUV cooperation network, such as the high mobility of AUVs, large transmission delays, low bandwidth, etc., the traditional topology control algorithms primarily designed for terrestrial wireless sensor networks cannot be used directly in the underwater environment. Moreover, these algorithms, in which the nodes adjust their transmission power once the current transmission power does not equal an optimal one, are costly in an underwater cooperating AUV network. Considering these facts, in this paper, we propose a Probabilistic Topology Control (PTC) algorithm for an underwater cooperating AUV network. In PTC, when the transmission power of an AUV is not equal to the optimal transmission power, then whether the transmission power needs to be adjusted or not will be determined based on the AUV’s parameters. Each AUV determines their own transmission power adjustment probability based on the parameter deviations. The larger the deviation, the higher the transmission power adjustment probability is, and vice versa. For evaluating the performance of PTC, we combine the PTC algorithm with the Fuzzy logic Topology Control (FTC) algorithm and compare the performance of these two algorithms. The simulation results have demonstrated that the PTC is efficient at reducing the transmission power adjustment ratio while improving the network performance. PMID:28471387
Li, Ning; Cürüklü, Baran; Bastos, Joaquim; Sucasas, Victor; Fernandez, Jose Antonio Sanchez; Rodriguez, Jonathan
2017-05-04
The aim of the Smart and Networking Underwater Robots in Cooperation Meshes (SWARMs) project is to make autonomous underwater vehicles (AUVs), remote operated vehicles (ROVs) and unmanned surface vehicles (USVs) more accessible and useful. To achieve cooperation and communication between different AUVs, these must be able to exchange messages, so an efficient and reliable communication network is necessary for SWARMs. In order to provide an efficient and reliable communication network for mission execution, one of the important and necessary issues is the topology control of the network of AUVs that are cooperating underwater. However, due to the specific properties of an underwater AUV cooperation network, such as the high mobility of AUVs, large transmission delays, low bandwidth, etc., the traditional topology control algorithms primarily designed for terrestrial wireless sensor networks cannot be used directly in the underwater environment. Moreover, these algorithms, in which the nodes adjust their transmission power once the current transmission power does not equal an optimal one, are costly in an underwater cooperating AUV network. Considering these facts, in this paper, we propose a Probabilistic Topology Control (PTC) algorithm for an underwater cooperating AUV network. In PTC, when the transmission power of an AUV is not equal to the optimal transmission power, then whether the transmission power needs to be adjusted or not will be determined based on the AUV's parameters. Each AUV determines their own transmission power adjustment probability based on the parameter deviations. The larger the deviation, the higher the transmission power adjustment probability is, and vice versa. For evaluating the performance of PTC, we combine the PTC algorithm with the Fuzzy logic Topology Control (FTC) algorithm and compare the performance of these two algorithms. The simulation results have demonstrated that the PTC is efficient at reducing the transmission power adjustment ratio while improving the network performance.
NASA Astrophysics Data System (ADS)
Rastogi, Richa; Londhe, Ashutosh; Srivastava, Abhishek; Sirasala, Kirannmayi M.; Khonde, Kiran
2017-03-01
In this article, a new scalable 3D Kirchhoff depth migration algorithm is presented on state of the art multicore CPU based cluster. Parallelization of 3D Kirchhoff depth migration is challenging due to its high demand of compute time, memory, storage and I/O along with the need of their effective management. The most resource intensive modules of the algorithm are traveltime calculations and migration summation which exhibit an inherent trade off between compute time and other resources. The parallelization strategy of the algorithm largely depends on the storage of calculated traveltimes and its feeding mechanism to the migration process. The presented work is an extension of our previous work, wherein a 3D Kirchhoff depth migration application for multicore CPU based parallel system had been developed. Recently, we have worked on improving parallel performance of this application by re-designing the parallelization approach. The new algorithm is capable to efficiently migrate both prestack and poststack 3D data. It exhibits flexibility for migrating large number of traces within the available node memory and with minimal requirement of storage, I/O and inter-node communication. The resultant application is tested using 3D Overthrust data on PARAM Yuva II, which is a Xeon E5-2670 based multicore CPU cluster with 16 cores/node and 64 GB shared memory. Parallel performance of the algorithm is studied using different numerical experiments and the scalability results show striking improvement over its previous version. An impressive 49.05X speedup with 76.64% efficiency is achieved for 3D prestack data and 32.00X speedup with 50.00% efficiency for 3D poststack data, using 64 nodes. The results also demonstrate the effectiveness and robustness of the improved algorithm with high scalability and efficiency on a multicore CPU cluster.
A Palmprint Recognition Algorithm Using Phase-Only Correlation
NASA Astrophysics Data System (ADS)
Ito, Koichi; Aoki, Takafumi; Nakajima, Hiroshi; Kobayashi, Koji; Higuchi, Tatsuo
This paper presents a palmprint recognition algorithm using Phase-Only Correlation (POC). The use of phase components in 2D (two-dimensional) discrete Fourier transforms of palmprint images makes it possible to achieve highly robust image registration and matching. In the proposed algorithm, POC is used to align scaling, rotation and translation between two palmprint images, and evaluate similarity between them. Experimental evaluation using a palmprint image database clearly demonstrates efficient matching performance of the proposed algorithm.
a Hadoop-Based Algorithm of Generating dem Grid from Point Cloud Data
NASA Astrophysics Data System (ADS)
Jian, X.; Xiao, X.; Chengfang, H.; Zhizhong, Z.; Zhaohui, W.; Dengzhong, Z.
2015-04-01
Airborne LiDAR technology has proven to be the most powerful tools to obtain high-density, high-accuracy and significantly detailed surface information of terrain and surface objects within a short time, and from which the Digital Elevation Model of high quality can be extracted. Point cloud data generated from the pre-processed data should be classified by segmentation algorithms, so as to differ the terrain points from disorganized points, then followed by a procedure of interpolating the selected points to turn points into DEM data. The whole procedure takes a long time and huge computing resource due to high-density, that is concentrated on by a number of researches. Hadoop is a distributed system infrastructure developed by the Apache Foundation, which contains a highly fault-tolerant distributed file system (HDFS) with high transmission rate and a parallel programming model (Map/Reduce). Such a framework is appropriate for DEM generation algorithms to improve efficiency. Point cloud data of Dongting Lake acquired by Riegl LMS-Q680i laser scanner was utilized as the original data to generate DEM by a Hadoop-based algorithms implemented in Linux, then followed by another traditional procedure programmed by C++ as the comparative experiment. Then the algorithm's efficiency, coding complexity, and performance-cost ratio were discussed for the comparison. The results demonstrate that the algorithm's speed depends on size of point set and density of DEM grid, and the non-Hadoop implementation can achieve a high performance when memory is big enough, but the multiple Hadoop implementation can achieve a higher performance-cost ratio, while point set is of vast quantities on the other hand.
Sparse RNA folding revisited: space-efficient minimum free energy structure prediction.
Will, Sebastian; Jabbari, Hosna
2016-01-01
RNA secondary structure prediction by energy minimization is the central computational tool for the analysis of structural non-coding RNAs and their interactions. Sparsification has been successfully applied to improve the time efficiency of various structure prediction algorithms while guaranteeing the same result; however, for many such folding problems, space efficiency is of even greater concern, particularly for long RNA sequences. So far, space-efficient sparsified RNA folding with fold reconstruction was solved only for simple base-pair-based pseudo-energy models. Here, we revisit the problem of space-efficient free energy minimization. Whereas the space-efficient minimization of the free energy has been sketched before, the reconstruction of the optimum structure has not even been discussed. We show that this reconstruction is not possible in trivial extension of the method for simple energy models. Then, we present the time- and space-efficient sparsified free energy minimization algorithm SparseMFEFold that guarantees MFE structure prediction. In particular, this novel algorithm provides efficient fold reconstruction based on dynamically garbage-collected trace arrows. The complexity of our algorithm depends on two parameters, the number of candidates Z and the number of trace arrows T; both are bounded by [Formula: see text], but are typically much smaller. The time complexity of RNA folding is reduced from [Formula: see text] to [Formula: see text]; the space complexity, from [Formula: see text] to [Formula: see text]. Our empirical results show more than 80 % space savings over RNAfold [Vienna RNA package] on the long RNAs from the RNA STRAND database (≥2500 bases). The presented technique is intentionally generalizable to complex prediction algorithms; due to their high space demands, algorithms like pseudoknot prediction and RNA-RNA-interaction prediction are expected to profit even stronger than "standard" MFE folding. SparseMFEFold is free software, available at http://www.bioinf.uni-leipzig.de/~will/Software/SparseMFEFold.
Vecharynski, Eugene; Yang, Chao; Pask, John E.
2015-02-25
Here, we present an iterative algorithm for computing an invariant subspace associated with the algebraically smallest eigenvalues of a large sparse or structured Hermitian matrix A. We are interested in the case in which the dimension of the invariant subspace is large (e.g., over several hundreds or thousands) even though it may still be small relative to the dimension of A. These problems arise from, for example, density functional theory (DFT) based electronic structure calculations for complex materials. The key feature of our algorithm is that it performs fewer Rayleigh–Ritz calculations compared to existing algorithms such as the locally optimalmore » block preconditioned conjugate gradient or the Davidson algorithm. It is a block algorithm, and hence can take advantage of efficient BLAS3 operations and be implemented with multiple levels of concurrency. We discuss a number of practical issues that must be addressed in order to implement the algorithm efficiently on a high performance computer.« less
NASA Technical Reports Server (NTRS)
Chen, C. P.; Wu, S. T.
1992-01-01
The objective of this investigation has been to develop an algorithm (or algorithms) for the improvement of the accuracy and efficiency of the computer fluid dynamics (CFD) models to study the fundamental physics of combustion chamber flows, which are necessary ultimately for the design of propulsion systems such as SSME and STME. During this three year study (May 19, 1978 - May 18, 1992), a unique algorithm was developed for all speed flows. This newly developed algorithm basically consists of two pressure-based algorithms (i.e. PISOC and MFICE). This PISOC is a non-iterative scheme and the FICE is an iterative scheme where PISOC has the characteristic advantages on low and high speed flows and the modified FICE has shown its efficiency and accuracy to compute the flows in the transonic region. A new algorithm is born from a combination of these two algorithms. This newly developed algorithm has general application in both time-accurate and steady state flows, and also was tested extensively for various flow conditions, such as turbulent flows, chemically reacting flows, and multiphase flows.
Biological network motif detection and evaluation
2011-01-01
Background Molecular level of biological data can be constructed into system level of data as biological networks. Network motifs are defined as over-represented small connected subgraphs in networks and they have been used for many biological applications. Since network motif discovery involves computationally challenging processes, previous algorithms have focused on computational efficiency. However, we believe that the biological quality of network motifs is also very important. Results We define biological network motifs as biologically significant subgraphs and traditional network motifs are differentiated as structural network motifs in this paper. We develop five algorithms, namely, EDGEGO-BNM, EDGEBETWEENNESS-BNM, NMF-BNM, NMFGO-BNM and VOLTAGE-BNM, for efficient detection of biological network motifs, and introduce several evaluation measures including motifs included in complex, motifs included in functional module and GO term clustering score in this paper. Experimental results show that EDGEGO-BNM and EDGEBETWEENNESS-BNM perform better than existing algorithms and all of our algorithms are applicable to find structural network motifs as well. Conclusion We provide new approaches to finding network motifs in biological networks. Our algorithms efficiently detect biological network motifs and further improve existing algorithms to find high quality structural network motifs, which would be impossible using existing algorithms. The performances of the algorithms are compared based on our new evaluation measures in biological contexts. We believe that our work gives some guidelines of network motifs research for the biological networks. PMID:22784624
NASA Astrophysics Data System (ADS)
Shayanfar, Mohsen Ali; Barkhordari, Mohammad Ali; Roudak, Mohammad Amin
2017-06-01
Monte Carlo simulation (MCS) is a useful tool for computation of probability of failure in reliability analysis. However, the large number of required random samples makes it time-consuming. Response surface method (RSM) is another common method in reliability analysis. Although RSM is widely used for its simplicity, it cannot be trusted in highly nonlinear problems due to its linear nature. In this paper, a new efficient algorithm, employing the combination of importance sampling, as a class of MCS, and RSM is proposed. In the proposed algorithm, analysis starts with importance sampling concepts and using a represented two-step updating rule of design point. This part finishes after a small number of samples are generated. Then RSM starts to work using Bucher experimental design, with the last design point and a represented effective length as the center point and radius of Bucher's approach, respectively. Through illustrative numerical examples, simplicity and efficiency of the proposed algorithm and the effectiveness of the represented rules are shown.
Efficient geometric rectification techniques for spectral analysis algorithm
NASA Technical Reports Server (NTRS)
Chang, C. Y.; Pang, S. S.; Curlander, J. C.
1992-01-01
The spectral analysis algorithm is a viable technique for processing synthetic aperture radar (SAR) data in near real time throughput rates by trading the image resolution. One major challenge of the spectral analysis algorithm is that the output image, often referred to as the range-Doppler image, is represented in the iso-range and iso-Doppler lines, a curved grid format. This phenomenon is known to be the fanshape effect. Therefore, resampling is required to convert the range-Doppler image into a rectangular grid format before the individual images can be overlaid together to form seamless multi-look strip imagery. An efficient algorithm for geometric rectification of the range-Doppler image is presented. The proposed algorithm, realized in two one-dimensional resampling steps, takes into consideration the fanshape phenomenon of the range-Doppler image as well as the high squint angle and updates of the cross-track and along-track Doppler parameters. No ground reference points are required.
Efficient implementation of parallel three-dimensional FFT on clusters of PCs
NASA Astrophysics Data System (ADS)
Takahashi, Daisuke
2003-05-01
In this paper, we propose a high-performance parallel three-dimensional fast Fourier transform (FFT) algorithm on clusters of PCs. The three-dimensional FFT algorithm can be altered into a block three-dimensional FFT algorithm to reduce the number of cache misses. We show that the block three-dimensional FFT algorithm improves performance by utilizing the cache memory effectively. We use the block three-dimensional FFT algorithm to implement the parallel three-dimensional FFT algorithm. We succeeded in obtaining performance of over 1.3 GFLOPS on an 8-node dual Pentium III 1 GHz PC SMP cluster.
Subband Image Coding with Jointly Optimized Quantizers
NASA Technical Reports Server (NTRS)
Kossentini, Faouzi; Chung, Wilson C.; Smith Mark J. T.
1995-01-01
An iterative design algorithm for the joint design of complexity- and entropy-constrained subband quantizers and associated entropy coders is proposed. Unlike conventional subband design algorithms, the proposed algorithm does not require the use of various bit allocation algorithms. Multistage residual quantizers are employed here because they provide greater control of the complexity-performance tradeoffs, and also because they allow efficient and effective high-order statistical modeling. The resulting subband coder exploits statistical dependencies within subbands, across subbands, and across stages, mainly through complexity-constrained high-order entropy coding. Experimental results demonstrate that the complexity-rate-distortion performance of the new subband coder is exceptional.
3-D CSEM data inversion algorithm based on simultaneously active multiple transmitters concept
NASA Astrophysics Data System (ADS)
Dehiya, Rahul; Singh, Arun; Gupta, Pravin Kumar; Israil, Mohammad
2017-05-01
We present an algorithm for efficient 3-D inversion of marine controlled-source electromagnetic data. The efficiency is achieved by exploiting the redundancy in data. The data redundancy is reduced by compressing the data through stacking of the response of transmitters which are in close proximity. This stacking is equivalent to synthesizing the data as if the multiple transmitters are simultaneously active. The redundancy in data, arising due to close transmitter spacing, has been studied through singular value analysis of the Jacobian formed in 1-D inversion. This study reveals that the transmitter spacing of 100 m, typically used in marine data acquisition, does result in redundancy in the data. In the proposed algorithm, the data are compressed through stacking which leads to both computational advantage and reduction in noise. The performance of the algorithm for noisy data is demonstrated through the studies on two types of noise, viz., uncorrelated additive noise and correlated non-additive noise. It is observed that in case of uncorrelated additive noise, up to a moderately high (10 percent) noise level the algorithm addresses the noise as effectively as the traditional full data inversion. However, when the noise level in the data is high (20 percent), the algorithm outperforms the traditional full data inversion in terms of data misfit. Similar results are obtained in case of correlated non-additive noise and the algorithm performs better if the level of noise is high. The inversion results of a real field data set are also presented to demonstrate the robustness of the algorithm. The significant computational advantage in all cases presented makes this algorithm a better choice.
Dimension-Factorized Range Migration Algorithm for Regularly Distributed Array Imaging
Guo, Qijia; Wang, Jie; Chang, Tianying
2017-01-01
The two-dimensional planar MIMO array is a popular approach for millimeter wave imaging applications. As a promising practical alternative, sparse MIMO arrays have been devised to reduce the number of antenna elements and transmitting/receiving channels with predictable and acceptable loss in image quality. In this paper, a high precision three-dimensional imaging algorithm is proposed for MIMO arrays of the regularly distributed type, especially the sparse varieties. Termed the Dimension-Factorized Range Migration Algorithm, the new imaging approach factorizes the conventional MIMO Range Migration Algorithm into multiple operations across the sparse dimensions. The thinner the sparse dimensions of the array, the more efficient the new algorithm will be. Advantages of the proposed approach are demonstrated by comparison with the conventional MIMO Range Migration Algorithm and its non-uniform fast Fourier transform based variant in terms of all the important characteristics of the approaches, especially the anti-noise capability. The computation cost is analyzed as well to evaluate the efficiency quantitatively. PMID:29113083
Hardware architecture design of image restoration based on time-frequency domain computation
NASA Astrophysics Data System (ADS)
Wen, Bo; Zhang, Jing; Jiao, Zipeng
2013-10-01
The image restoration algorithms based on time-frequency domain computation is high maturity and applied widely in engineering. To solve the high-speed implementation of these algorithms, the TFDC hardware architecture is proposed. Firstly, the main module is designed, by analyzing the common processing and numerical calculation. Then, to improve the commonality, the iteration control module is planed for iterative algorithms. In addition, to reduce the computational cost and memory requirements, the necessary optimizations are suggested for the time-consuming module, which include two-dimensional FFT/IFFT and the plural calculation. Eventually, the TFDC hardware architecture is adopted for hardware design of real-time image restoration system. The result proves that, the TFDC hardware architecture and its optimizations can be applied to image restoration algorithms based on TFDC, with good algorithm commonality, hardware realizability and high efficiency.
Approximate Algorithms for Computing Spatial Distance Histograms with Accuracy Guarantees
Grupcev, Vladimir; Yuan, Yongke; Tu, Yi-Cheng; Huang, Jin; Chen, Shaoping; Pandit, Sagar; Weng, Michael
2014-01-01
Particle simulation has become an important research tool in many scientific and engineering fields. Data generated by such simulations impose great challenges to database storage and query processing. One of the queries against particle simulation data, the spatial distance histogram (SDH) query, is the building block of many high-level analytics, and requires quadratic time to compute using a straightforward algorithm. Previous work has developed efficient algorithms that compute exact SDHs. While beating the naive solution, such algorithms are still not practical in processing SDH queries against large-scale simulation data. In this paper, we take a different path to tackle this problem by focusing on approximate algorithms with provable error bounds. We first present a solution derived from the aforementioned exact SDH algorithm, and this solution has running time that is unrelated to the system size N. We also develop a mathematical model to analyze the mechanism that leads to errors in the basic approximate algorithm. Our model provides insights on how the algorithm can be improved to achieve higher accuracy and efficiency. Such insights give rise to a new approximate algorithm with improved time/accuracy tradeoff. Experimental results confirm our analysis. PMID:24693210
Lim, Byoung-Gyun; Woo, Jea-Choon; Lee, Hee-Young; Kim, Young-Soo
2008-01-01
Synthetic wideband waveforms (SWW) combine a stepped frequency CW waveform and a chirp signal waveform to achieve high range resolution without requiring a large bandwidth or the consequent very high sampling rate. If an efficient algorithm like the range-Doppler algorithm (RDA) is used to acquire the SAR images for synthetic wideband signals, errors occur due to approximations, so the images may not show the best possible result. This paper proposes a modified subpulse SAR processing algorithm for synthetic wideband signals which is based on RDA. An experiment with an automobile-based SAR system showed that the proposed algorithm is quite accurate with a considerable improvement in resolution and quality of the obtained SAR image. PMID:27873984
Multi-Complementary Model for Long-Term Tracking
Zhang, Deng; Zhang, Junchang; Xia, Chenyang
2018-01-01
In recent years, video target tracking algorithms have been widely used. However, many tracking algorithms do not achieve satisfactory performance, especially when dealing with problems such as object occlusions, background clutters, motion blur, low illumination color images, and sudden illumination changes in real scenes. In this paper, we incorporate an object model based on contour information into a Staple tracker that combines the correlation filter model and color model to greatly improve the tracking robustness. Since each model is responsible for tracking specific features, the three complementary models combine for more robust tracking. In addition, we propose an efficient object detection model with contour and color histogram features, which has good detection performance and better detection efficiency compared to the traditional target detection algorithm. Finally, we optimize the traditional scale calculation, which greatly improves the tracking execution speed. We evaluate our tracker on the Object Tracking Benchmarks 2013 (OTB-13) and Object Tracking Benchmarks 2015 (OTB-15) benchmark datasets. With the OTB-13 benchmark datasets, our algorithm is improved by 4.8%, 9.6%, and 10.9% on the success plots of OPE, TRE and SRE, respectively, in contrast to another classic LCT (Long-term Correlation Tracking) algorithm. On the OTB-15 benchmark datasets, when compared with the LCT algorithm, our algorithm achieves 10.4%, 12.5%, and 16.1% improvement on the success plots of OPE, TRE, and SRE, respectively. At the same time, it needs to be emphasized that, due to the high computational efficiency of the color model and the object detection model using efficient data structures, and the speed advantage of the correlation filters, our tracking algorithm could still achieve good tracking speed. PMID:29425170
An efficient interpolation filter VLSI architecture for HEVC standard
NASA Astrophysics Data System (ADS)
Zhou, Wei; Zhou, Xin; Lian, Xiaocong; Liu, Zhenyu; Liu, Xiaoxiang
2015-12-01
The next-generation video coding standard of High-Efficiency Video Coding (HEVC) is especially efficient for coding high-resolution video such as 8K-ultra-high-definition (UHD) video. Fractional motion estimation in HEVC presents a significant challenge in clock latency and area cost as it consumes more than 40 % of the total encoding time and thus results in high computational complexity. With aims at supporting 8K-UHD video applications, an efficient interpolation filter VLSI architecture for HEVC is proposed in this paper. Firstly, a new interpolation filter algorithm based on the 8-pixel interpolation unit is proposed in this paper. It can save 19.7 % processing time on average with acceptable coding quality degradation. Based on the proposed algorithm, an efficient interpolation filter VLSI architecture, composed of a reused data path of interpolation, an efficient memory organization, and a reconfigurable pipeline interpolation filter engine, is presented to reduce the implement hardware area and achieve high throughput. The final VLSI implementation only requires 37.2k gates in a standard 90-nm CMOS technology at an operating frequency of 240 MHz. The proposed architecture can be reused for either half-pixel interpolation or quarter-pixel interpolation, which can reduce the area cost for about 131,040 bits RAM. The processing latency of our proposed VLSI architecture can support the real-time processing of 4:2:0 format 7680 × 4320@78fps video sequences.
Qi, Hong; Qiao, Yao-Bin; Ren, Ya-Tao; Shi, Jing-Wen; Zhang, Ze-Yu; Ruan, Li-Ming
2016-10-17
Sequential quadratic programming (SQP) is used as an optimization algorithm to reconstruct the optical parameters based on the time-domain radiative transfer equation (TD-RTE). Numerous time-resolved measurement signals are obtained using the TD-RTE as forward model. For a high computational efficiency, the gradient of objective function is calculated using an adjoint equation technique. SQP algorithm is employed to solve the inverse problem and the regularization term based on the generalized Gaussian Markov random field (GGMRF) model is used to overcome the ill-posed problem. Simulated results show that the proposed reconstruction scheme performs efficiently and accurately.
Tracking fronts in solutions of the shallow-water equations
NASA Astrophysics Data System (ADS)
Bennett, Andrew F.; Cummins, Patrick F.
1988-02-01
A front-tracking algorithm of Chern et al. (1986) is tested on the shallow-water equations, using the Parrett and Cullen (1984) and Williams and Hori (1970) initial state, consisting of smooth finite amplitude waves depending on one space dimension alone. At high resolution the solution is almost indistinguishable from that obtained with the Glimm algorithm. The latter is known to converge to the true frontal solution, but is 20 times less efficient at the same resolution. The solutions obtained using the front-tracking algorithm at 8 times coarser resolution are quite acceptable, indicating a very substantial gain in efficiency, which encourages application in realistic ocean models possessing two or three space dimensions.
Computational efficiency for the surface renewal method
NASA Astrophysics Data System (ADS)
Kelley, Jason; Higgins, Chad
2018-04-01
Measuring surface fluxes using the surface renewal (SR) method requires programmatic algorithms for tabulation, algebraic calculation, and data quality control. A number of different methods have been published describing automated calibration of SR parameters. Because the SR method utilizes high-frequency (10 Hz+) measurements, some steps in the flux calculation are computationally expensive, especially when automating SR to perform many iterations of these calculations. Several new algorithms were written that perform the required calculations more efficiently and rapidly, and that tested for sensitivity to length of flux averaging period, ability to measure over a large range of lag timescales, and overall computational efficiency. These algorithms utilize signal processing techniques and algebraic simplifications that demonstrate simple modifications that dramatically improve computational efficiency. The results here complement efforts by other authors to standardize a robust and accurate computational SR method. Increased speed of computation time grants flexibility to implementing the SR method, opening new avenues for SR to be used in research, for applied monitoring, and in novel field deployments.
NASA Astrophysics Data System (ADS)
Qyyum, Muhammad Abdul; Long, Nguyen Van Duc; Minh, Le Quang; Lee, Moonyong
2018-01-01
Design optimization of the single mixed refrigerant (SMR) natural gas liquefaction (LNG) process involves highly non-linear interactions between decision variables, constraints, and the objective function. These non-linear interactions lead to an irreversibility, which deteriorates the energy efficiency of the LNG process. In this study, a simple and highly efficient hybrid modified coordinate descent (HMCD) algorithm was proposed to cope with the optimization of the natural gas liquefaction process. The single mixed refrigerant process was modeled in Aspen Hysys® and then connected to a Microsoft Visual Studio environment. The proposed optimization algorithm provided an improved result compared to the other existing methodologies to find the optimal condition of the complex mixed refrigerant natural gas liquefaction process. By applying the proposed optimization algorithm, the SMR process can be designed with the 0.2555 kW specific compression power which is equivalent to 44.3% energy saving as compared to the base case. Furthermore, in terms of coefficient of performance (COP), it can be enhanced up to 34.7% as compared to the base case. The proposed optimization algorithm provides a deep understanding of the optimization of the liquefaction process in both technical and numerical perspectives. In addition, the HMCD algorithm can be employed to any mixed refrigerant based liquefaction process in the natural gas industry.
NASA Astrophysics Data System (ADS)
Zheng, Yan
2015-03-01
Internet of things (IoT), focusing on providing users with information exchange and intelligent control, attracts a lot of attention of researchers from all over the world since the beginning of this century. IoT is consisted of large scale of sensor nodes and data processing units, and the most important features of IoT can be illustrated as energy confinement, efficient communication and high redundancy. With the sensor nodes increment, the communication efficiency and the available communication band width become bottle necks. Many research work is based on the instance which the number of joins is less. However, it is not proper to the increasing multi-join query in whole internet of things. To improve the communication efficiency between parallel units in the distributed sensor network, this paper proposed parallel query optimization algorithm based on distribution attributes cost graph. The storage information relations and the network communication cost are considered in this algorithm, and an optimized information changing rule is established. The experimental result shows that the algorithm has good performance, and it would effectively use the resource of each node in the distributed sensor network. Therefore, executive efficiency of multi-join query between different nodes could be improved.
A hierarchical word-merging algorithm with class separability measure.
Wang, Lei; Zhou, Luping; Shen, Chunhua; Liu, Lingqiao; Liu, Huan
2014-03-01
In image recognition with the bag-of-features model, a small-sized visual codebook is usually preferred to obtain a low-dimensional histogram representation and high computational efficiency. Such a visual codebook has to be discriminative enough to achieve excellent recognition performance. To create a compact and discriminative codebook, in this paper we propose to merge the visual words in a large-sized initial codebook by maximally preserving class separability. We first show that this results in a difficult optimization problem. To deal with this situation, we devise a suboptimal but very efficient hierarchical word-merging algorithm, which optimally merges two words at each level of the hierarchy. By exploiting the characteristics of the class separability measure and designing a novel indexing structure, the proposed algorithm can hierarchically merge 10,000 visual words down to two words in merely 90 seconds. Also, to show the properties of the proposed algorithm and reveal its advantages, we conduct detailed theoretical analysis to compare it with another hierarchical word-merging algorithm that maximally preserves mutual information, obtaining interesting findings. Experimental studies are conducted to verify the effectiveness of the proposed algorithm on multiple benchmark data sets. As shown, it can efficiently produce more compact and discriminative codebooks than the state-of-the-art hierarchical word-merging algorithms, especially when the size of the codebook is significantly reduced.
Wang, Jun; Zhou, Bihua; Zhou, Shudao
2016-01-01
This paper proposes an improved cuckoo search (ICS) algorithm to establish the parameters of chaotic systems. In order to improve the optimization capability of the basic cuckoo search (CS) algorithm, the orthogonal design and simulated annealing operation are incorporated in the CS algorithm to enhance the exploitation search ability. Then the proposed algorithm is used to establish parameters of the Lorenz chaotic system and Chen chaotic system under the noiseless and noise condition, respectively. The numerical results demonstrate that the algorithm can estimate parameters with high accuracy and reliability. Finally, the results are compared with the CS algorithm, genetic algorithm, and particle swarm optimization algorithm, and the compared results demonstrate the method is energy-efficient and superior. PMID:26880874
NASA Astrophysics Data System (ADS)
Han, Yishi; Luo, Zhixiao; Wang, Jianhua; Min, Zhixuan; Qin, Xinyu; Sun, Yunlong
2014-09-01
In general, context-based adaptive variable length coding (CAVLC) decoding in H.264/AVC standard requires frequent access to the unstructured variable length coding tables (VLCTs) and significant memory accesses are consumed. Heavy memory accesses will cause high power consumption and time delays, which are serious problems for applications in portable multimedia devices. We propose a method for high-efficiency CAVLC decoding by using a program instead of all the VLCTs. The decoded codeword from VLCTs can be obtained without any table look-up and memory access. The experimental results show that the proposed algorithm achieves 100% memory access saving and 40% decoding time saving without degrading video quality. Additionally, the proposed algorithm shows a better performance compared with conventional CAVLC decoding, such as table look-up by sequential search, table look-up by binary search, Moon's method, and Kim's method.
Global Contrast Based Salient Region Detection.
Cheng, Ming-Ming; Mitra, Niloy J; Huang, Xiaolei; Torr, Philip H S; Hu, Shi-Min
2015-03-01
Automatic estimation of salient object regions across images, without any prior assumption or knowledge of the contents of the corresponding scenes, enhances many computer vision and computer graphics applications. We introduce a regional contrast based salient object detection algorithm, which simultaneously evaluates global contrast differences and spatial weighted coherence scores. The proposed algorithm is simple, efficient, naturally multi-scale, and produces full-resolution, high-quality saliency maps. These saliency maps are further used to initialize a novel iterative version of GrabCut, namely SaliencyCut, for high quality unsupervised salient object segmentation. We extensively evaluated our algorithm using traditional salient object detection datasets, as well as a more challenging Internet image dataset. Our experimental results demonstrate that our algorithm consistently outperforms 15 existing salient object detection and segmentation methods, yielding higher precision and better recall rates. We also show that our algorithm can be used to efficiently extract salient object masks from Internet images, enabling effective sketch-based image retrieval (SBIR) via simple shape comparisons. Despite such noisy internet images, where the saliency regions are ambiguous, our saliency guided image retrieval achieves a superior retrieval rate compared with state-of-the-art SBIR methods, and additionally provides important target object region information.
Zhou, Hong; Zhou, Michael; Li, Daisy; Manthey, Joseph; Lioutikova, Ekaterina; Wang, Hong; Zeng, Xiao
2017-11-17
The beauty and power of the genome editing mechanism, CRISPR Cas9 endonuclease system, lies in the fact that it is RNA-programmable such that Cas9 can be guided to any genomic loci complementary to a 20-nt RNA, single guide RNA (sgRNA), to cleave double stranded DNA, allowing the introduction of wanted mutations. Unfortunately, it has been reported repeatedly that the sgRNA can also guide Cas9 to off-target sites where the DNA sequence is homologous to sgRNA. Using human genome and Streptococcus pyogenes Cas9 (SpCas9) as an example, this article mathematically analyzed the probabilities of off-target homologies of sgRNAs and discovered that for large genome size such as human genome, potential off-target homologies are inevitable for sgRNA selection. A highly efficient computationl algorithm was developed for whole genome sgRNA design and off-target homology searches. By means of a dynamically constructed sequence-indexed database and a simplified sequence alignment method, this algorithm achieves very high efficiency while guaranteeing the identification of all existing potential off-target homologies. Via this algorithm, 1,876,775 sgRNAs were designed for the 19,153 human mRNA genes and only two sgRNAs were found to be free of off-target homology. By means of the novel and efficient sgRNA homology search algorithm introduced in this article, genome wide sgRNA design and off-target analysis were conducted and the results confirmed the mathematical analysis that for a sgRNA sequence, it is almost impossible to escape potential off-target homologies. Future innovations on the CRISPR Cas9 gene editing technology need to focus on how to eliminate the Cas9 off-target activity.
A High-Performance Genetic Algorithm: Using Traveling Salesman Problem as a Case
Tsai, Chun-Wei; Tseng, Shih-Pang; Yang, Chu-Sing
2014-01-01
This paper presents a simple but efficient algorithm for reducing the computation time of genetic algorithm (GA) and its variants. The proposed algorithm is motivated by the observation that genes common to all the individuals of a GA have a high probability of surviving the evolution and ending up being part of the final solution; as such, they can be saved away to eliminate the redundant computations at the later generations of a GA. To evaluate the performance of the proposed algorithm, we use it not only to solve the traveling salesman problem but also to provide an extensive analysis on the impact it may have on the quality of the end result. Our experimental results indicate that the proposed algorithm can significantly reduce the computation time of GA and GA-based algorithms while limiting the degradation of the quality of the end result to a very small percentage compared to traditional GA. PMID:24892038
A high-performance genetic algorithm: using traveling salesman problem as a case.
Tsai, Chun-Wei; Tseng, Shih-Pang; Chiang, Ming-Chao; Yang, Chu-Sing; Hong, Tzung-Pei
2014-01-01
This paper presents a simple but efficient algorithm for reducing the computation time of genetic algorithm (GA) and its variants. The proposed algorithm is motivated by the observation that genes common to all the individuals of a GA have a high probability of surviving the evolution and ending up being part of the final solution; as such, they can be saved away to eliminate the redundant computations at the later generations of a GA. To evaluate the performance of the proposed algorithm, we use it not only to solve the traveling salesman problem but also to provide an extensive analysis on the impact it may have on the quality of the end result. Our experimental results indicate that the proposed algorithm can significantly reduce the computation time of GA and GA-based algorithms while limiting the degradation of the quality of the end result to a very small percentage compared to traditional GA.
Efficient experimental design of high-fidelity three-qubit quantum gates via genetic programming
NASA Astrophysics Data System (ADS)
Devra, Amit; Prabhu, Prithviraj; Singh, Harpreet; Arvind; Dorai, Kavita
2018-03-01
We have designed efficient quantum circuits for the three-qubit Toffoli (controlled-controlled-NOT) and the Fredkin (controlled-SWAP) gate, optimized via genetic programming methods. The gates thus obtained were experimentally implemented on a three-qubit NMR quantum information processor, with a high fidelity. Toffoli and Fredkin gates in conjunction with the single-qubit Hadamard gates form a universal gate set for quantum computing and are an essential component of several quantum algorithms. Genetic algorithms are stochastic search algorithms based on the logic of natural selection and biological genetics and have been widely used for quantum information processing applications. We devised a new selection mechanism within the genetic algorithm framework to select individuals from a population. We call this mechanism the "Luck-Choose" mechanism and were able to achieve faster convergence to a solution using this mechanism, as compared to existing selection mechanisms. The optimization was performed under the constraint that the experimentally implemented pulses are of short duration and can be implemented with high fidelity. We demonstrate the advantage of our pulse sequences by comparing our results with existing experimental schemes and other numerical optimization methods.
Scheduling for energy and reliability management on multiprocessor real-time systems
NASA Astrophysics Data System (ADS)
Qi, Xuan
Scheduling algorithms for multiprocessor real-time systems have been studied for years with many well-recognized algorithms proposed. However, it is still an evolving research area and many problems remain open due to their intrinsic complexities. With the emergence of multicore processors, it is necessary to re-investigate the scheduling problems and design/develop efficient algorithms for better system utilization, low scheduling overhead, high energy efficiency, and better system reliability. Focusing cluster schedulings with optimal global schedulers, we study the utilization bound and scheduling overhead for a class of cluster-optimal schedulers. Then, taking energy/power consumption into consideration, we developed energy-efficient scheduling algorithms for real-time systems, especially for the proliferating embedded systems with limited energy budget. As the commonly deployed energy-saving technique (e.g. dynamic voltage frequency scaling (DVFS)) will significantly affect system reliability, we study schedulers that have intelligent mechanisms to recuperate system reliability to satisfy the quality assurance requirements. Extensive simulation is conducted to evaluate the performance of the proposed algorithms on reduction of scheduling overhead, energy saving, and reliability improvement. The simulation results show that the proposed reliability-aware power management schemes could preserve the system reliability while still achieving substantial energy saving.
Three-dimensional near-field MIMO array imaging using range migration techniques.
Zhuge, Xiaodong; Yarovoy, Alexander G
2012-06-01
This paper presents a 3-D near-field imaging algorithm that is formulated for 2-D wideband multiple-input-multiple-output (MIMO) imaging array topology. The proposed MIMO range migration technique performs the image reconstruction procedure in the frequency-wavenumber domain. The algorithm is able to completely compensate the curvature of the wavefront in the near-field through a specifically defined interpolation process and provides extremely high computational efficiency by the application of the fast Fourier transform. The implementation aspects of the algorithm and the sampling criteria of a MIMO aperture are discussed. The image reconstruction performance and computational efficiency of the algorithm are demonstrated both with numerical simulations and measurements using 2-D MIMO arrays. Real-time 3-D near-field imaging can be achieved with a real-aperture array by applying the proposed MIMO range migration techniques.
NASA Technical Reports Server (NTRS)
Gedney, Stephen D.; Lansing, Faiza
1993-01-01
The generalized Yee-algorithm is presented for the temporal full-wave analysis of planar microstrip devices. This algorithm has the significant advantage over the traditional Yee-algorithm in that it is based on unstructured and irregular grids. The robustness of the generalized Yee-algorithm is that structures that contain curved conductors or complex three-dimensional geometries can be more accurately, and much more conveniently modeled using standard automatic grid generation techniques. This generalized Yee-algorithm is based on the the time-marching solution of the discrete form of Maxwell's equations in their integral form. To this end, the electric and magnetic fields are discretized over a dual, irregular, and unstructured grid. The primary grid is assumed to be composed of general fitted polyhedra distributed throughout the volume. The secondary grid (or dual grid) is built up of the closed polyhedra whose edges connect the centroid's of adjacent primary cells, penetrating shared faces. Faraday's law and Ampere's law are used to update the fields normal to the primary and secondary grid faces, respectively. Subsequently, a correction scheme is introduced to project the normal fields onto the grid edges. It is shown that this scheme is stable, maintains second-order accuracy, and preserves the divergenceless nature of the flux densities. Finally, for computational efficiency the algorithm is structured as a series of sparse matrix-vector multiplications. Based on this scheme, the generalized Yee-algorithm has been implemented on vector and parallel high performance computers in a highly efficient manner.
NASA Astrophysics Data System (ADS)
Work, Paul R.
1991-12-01
This thesis investigates the parallelization of existing serial programs in computational electromagnetics for use in a parallel environment. Existing algorithms for calculating the radar cross section of an object are covered, and a ray-tracing code is chosen for implementation on a parallel machine. Current parallel architectures are introduced and a suitable parallel machine is selected for the implementation of the chosen ray-tracing algorithm. The standard techniques for the parallelization of serial codes are discussed, including load balancing and decomposition considerations, and appropriate methods for the parallelization effort are selected. A load balancing algorithm is modified to increase the efficiency of the application, and a high level design of the structure of the serial program is presented. A detailed design of the modifications for the parallel implementation is also included, with both the high level and the detailed design specified in a high level design language called UNITY. The correctness of the design is proven using UNITY and standard logic operations. The theoretical and empirical results show that it is possible to achieve an efficient parallel application for a serial computational electromagnetic program where the characteristics of the algorithm and the target architecture critically influence the development of such an implementation.
Aggregated Indexing of Biomedical Time Series Data
Woodbridge, Jonathan; Mortazavi, Bobak; Sarrafzadeh, Majid; Bui, Alex A.T.
2016-01-01
Remote and wearable medical sensing has the potential to create very large and high dimensional datasets. Medical time series databases must be able to efficiently store, index, and mine these datasets to enable medical professionals to effectively analyze data collected from their patients. Conventional high dimensional indexing methods are a two stage process. First, a superset of the true matches is efficiently extracted from the database. Second, supersets are pruned by comparing each of their objects to the query object and rejecting any objects falling outside a predetermined radius. This pruning stage heavily dominates the computational complexity of most conventional search algorithms. Therefore, indexing algorithms can be significantly improved by reducing the amount of pruning. This paper presents an online algorithm to aggregate biomedical times series data to significantly reduce the search space (index size) without compromising the quality of search results. This algorithm is built on the observation that biomedical time series signals are composed of cyclical and often similar patterns. This algorithm takes in a stream of segments and groups them to highly concentrated collections. Locality Sensitive Hashing (LSH) is used to reduce the overall complexity of the algorithm, allowing it to run online. The output of this aggregation is used to populate an index. The proposed algorithm yields logarithmic growth of the index (with respect to the total number of objects) while keeping sensitivity and specificity simultaneously above 98%. Both memory and runtime complexities of time series search are improved when using aggregated indexes. In addition, data mining tasks, such as clustering, exhibit runtimes that are orders of magnitudes faster when run on aggregated indexes. PMID:27617298
Multiscale high-order/low-order (HOLO) algorithms and applications
NASA Astrophysics Data System (ADS)
Chacón, L.; Chen, G.; Knoll, D. A.; Newman, C.; Park, H.; Taitano, W.; Willert, J. A.; Womeldorff, G.
2017-02-01
We review the state of the art in the formulation, implementation, and performance of so-called high-order/low-order (HOLO) algorithms for challenging multiscale problems. HOLO algorithms attempt to couple one or several high-complexity physical models (the high-order model, HO) with low-complexity ones (the low-order model, LO). The primary goal of HOLO algorithms is to achieve nonlinear convergence between HO and LO components while minimizing memory footprint and managing the computational complexity in a practical manner. Key to the HOLO approach is the use of the LO representations to address temporal stiffness, effectively accelerating the convergence of the HO/LO coupled system. The HOLO approach is broadly underpinned by the concept of nonlinear elimination, which enables segregation of the HO and LO components in ways that can effectively use heterogeneous architectures. The accuracy and efficiency benefits of HOLO algorithms are demonstrated with specific applications to radiation transport, gas dynamics, plasmas (both Eulerian and Lagrangian formulations), and ocean modeling. Across this broad application spectrum, HOLO algorithms achieve significant accuracy improvements at a fraction of the cost compared to conventional approaches. It follows that HOLO algorithms hold significant potential for high-fidelity system scale multiscale simulations leveraging exascale computing.
Cross-indexing of binary SIFT codes for large-scale image search.
Liu, Zhen; Li, Houqiang; Zhang, Liyan; Zhou, Wengang; Tian, Qi
2014-05-01
In recent years, there has been growing interest in mapping visual features into compact binary codes for applications on large-scale image collections. Encoding high-dimensional data as compact binary codes reduces the memory cost for storage. Besides, it benefits the computational efficiency since the computation of similarity can be efficiently measured by Hamming distance. In this paper, we propose a novel flexible scale invariant feature transform (SIFT) binarization (FSB) algorithm for large-scale image search. The FSB algorithm explores the magnitude patterns of SIFT descriptor. It is unsupervised and the generated binary codes are demonstrated to be dispreserving. Besides, we propose a new searching strategy to find target features based on the cross-indexing in the binary SIFT space and original SIFT space. We evaluate our approach on two publicly released data sets. The experiments on large-scale partial duplicate image retrieval system demonstrate the effectiveness and efficiency of the proposed algorithm.
Efficient Exact Inference With Loss Augmented Objective in Structured Learning.
Bauer, Alexander; Nakajima, Shinichi; Muller, Klaus-Robert
2016-08-19
Structural support vector machine (SVM) is an elegant approach for building complex and accurate models with structured outputs. However, its applicability relies on the availability of efficient inference algorithms--the state-of-the-art training algorithms repeatedly perform inference to compute a subgradient or to find the most violating configuration. In this paper, we propose an exact inference algorithm for maximizing nondecomposable objectives due to special type of a high-order potential having a decomposable internal structure. As an important application, our method covers the loss augmented inference, which enables the slack and margin scaling formulations of structural SVM with a variety of dissimilarity measures, e.g., Hamming loss, precision and recall, Fβ-loss, intersection over union, and many other functions that can be efficiently computed from the contingency table. We demonstrate the advantages of our approach in natural language parsing and sequence segmentation applications.
FPGA implementation of sparse matrix algorithm for information retrieval
NASA Astrophysics Data System (ADS)
Bojanic, Slobodan; Jevtic, Ruzica; Nieto-Taladriz, Octavio
2005-06-01
Information text data retrieval requires a tremendous amount of processing time because of the size of the data and the complexity of information retrieval algorithms. In this paper the solution to this problem is proposed via hardware supported information retrieval algorithms. Reconfigurable computing may adopt frequent hardware modifications through its tailorable hardware and exploits parallelism for a given application through reconfigurable and flexible hardware units. The degree of the parallelism can be tuned for data. In this work we implemented standard BLAS (basic linear algebra subprogram) sparse matrix algorithm named Compressed Sparse Row (CSR) that is showed to be more efficient in terms of storage space requirement and query-processing timing over the other sparse matrix algorithms for information retrieval application. Although inverted index algorithm is treated as the de facto standard for information retrieval for years, an alternative approach to store the index of text collection in a sparse matrix structure gains more attention. This approach performs query processing using sparse matrix-vector multiplication and due to parallelization achieves a substantial efficiency over the sequential inverted index. The parallel implementations of information retrieval kernel are presented in this work targeting the Virtex II Field Programmable Gate Arrays (FPGAs) board from Xilinx. A recent development in scientific applications is the use of FPGA to achieve high performance results. Computational results are compared to implementations on other platforms. The design achieves a high level of parallelism for the overall function while retaining highly optimised hardware within processing unit.
Geostatistical models are appropriate for spatially distributed data measured at irregularly spaced locations. We propose an efficient Markov chain Monte Carlo (MCMC) algorithm for fitting Bayesian geostatistical models with substantial numbers of unknown parameters to sizable...
[GNU Pattern: open source pattern hunter for biological sequences based on SPLASH algorithm].
Xu, Ying; Li, Yi-xue; Kong, Xiang-yin
2005-06-01
To construct a high performance open source software engine based on IBM SPLASH algorithm for later research on pattern discovery. Gpat, which is based on SPLASH algorithm, was developed by using open source software. GNU Pattern (Gpat) software was developped, which efficiently implemented the core part of SPLASH algorithm. Full source code of Gpat was also available for other researchers to modify the program under the GNU license. Gpat is a successful implementation of SPLASH algorithm and can be used as a basic framework for later research on pattern recognition in biological sequences.
High resolution x-ray CMT: Reconstruction methods
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, J.K.
This paper qualitatively discusses the primary characteristics of methods for reconstructing tomographic images from a set of projections. These reconstruction methods can be categorized as either {open_quotes}analytic{close_quotes} or {open_quotes}iterative{close_quotes} techniques. Analytic algorithms are derived from the formal inversion of equations describing the imaging process, while iterative algorithms incorporate a model of the imaging process and provide a mechanism to iteratively improve image estimates. Analytic reconstruction algorithms are typically computationally more efficient than iterative methods; however, analytic algorithms are available for a relatively limited set of imaging geometries and situations. Thus, the framework of iterative reconstruction methods is better suited formore » high accuracy, tomographic reconstruction codes.« less
Denoising of polychromatic CT images based on their own noise properties
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Ji Hye; Chang, Yongjin; Ra, Jong Beom, E-mail: jbra@kaist.ac.kr
Purpose: Because of high diagnostic accuracy and fast scan time, computed tomography (CT) has been widely used in various clinical applications. Since the CT scan introduces radiation exposure to patients, however, dose reduction has recently been recognized as an important issue in CT imaging. However, low-dose CT causes an increase of noise in the image and thereby deteriorates the accuracy of diagnosis. In this paper, the authors develop an efficient denoising algorithm for low-dose CT images obtained using a polychromatic x-ray source. The algorithm is based on two steps: (i) estimation of space variant noise statistics, which are uniquely determinedmore » according to the system geometry and scanned object, and (ii) subsequent novel conversion of the estimated noise to Gaussian noise so that an existing high performance Gaussian noise filtering algorithm can be directly applied to CT images with non-Gaussian noise. Methods: For efficient polychromatic CT image denoising, the authors first reconstruct an image with the iterative maximum-likelihood polychromatic algorithm for CT to alleviate the beam-hardening problem. We then estimate the space-variant noise variance distribution on the image domain. Since there are many high performance denoising algorithms available for the Gaussian noise, image denoising can become much more efficient if they can be used. Hence, the authors propose a novel conversion scheme to transform the estimated space-variant noise to near Gaussian noise. In the suggested scheme, the authors first convert the image so that its mean and variance can have a linear relationship, and then produce a Gaussian image via variance stabilizing transform. The authors then apply a block matching 4D algorithm that is optimized for noise reduction of the Gaussian image, and reconvert the result to obtain a final denoised image. To examine the performance of the proposed method, an XCAT phantom simulation and a physical phantom experiment were conducted. Results: Both simulation and experimental results show that, unlike the existing denoising algorithms, the proposed algorithm can effectively reduce the noise over the whole region of CT images while preventing degradation of image resolution. Conclusions: To effectively denoise polychromatic low-dose CT images, a novel denoising algorithm is proposed. Because this algorithm is based on the noise statistics of a reconstructed polychromatic CT image, the spatially varying noise on the image is effectively reduced so that the denoised image will have homogeneous quality over the image domain. Through a simulation and a real experiment, it is verified that the proposed algorithm can deliver considerably better performance compared to the existing denoising algorithms.« less
Apriori Versions Based on MapReduce for Mining Frequent Patterns on Big Data.
Luna, Jose Maria; Padillo, Francisco; Pechenizkiy, Mykola; Ventura, Sebastian
2017-09-27
Pattern mining is one of the most important tasks to extract meaningful and useful information from raw data. This task aims to extract item-sets that represent any type of homogeneity and regularity in data. Although many efficient algorithms have been developed in this regard, the growing interest in data has caused the performance of existing pattern mining techniques to be dropped. The goal of this paper is to propose new efficient pattern mining algorithms to work in big data. To this aim, a series of algorithms based on the MapReduce framework and the Hadoop open-source implementation have been proposed. The proposed algorithms can be divided into three main groups. First, two algorithms [Apriori MapReduce (AprioriMR) and iterative AprioriMR] with no pruning strategy are proposed, which extract any existing item-set in data. Second, two algorithms (space pruning AprioriMR and top AprioriMR) that prune the search space by means of the well-known anti-monotone property are proposed. Finally, a last algorithm (maximal AprioriMR) is also proposed for mining condensed representations of frequent patterns. To test the performance of the proposed algorithms, a varied collection of big data datasets have been considered, comprising up to 3 · 10#x00B9;⁸ transactions and more than 5 million of distinct single-items. The experimental stage includes comparisons against highly efficient and well-known pattern mining algorithms. Results reveal the interest of applying MapReduce versions when complex problems are considered, and also the unsuitability of this paradigm when dealing with small data.
Lee, Chankyun; Cao, Xiaoyuan; Yoshikane, Noboru; Tsuritani, Takehiro; Rhee, June-Koo Kevin
2015-10-19
The feasibility of software-defined optical networking (SDON) for a practical application critically depends on scalability of centralized control performance. The paper, highly scalable routing and wavelength assignment (RWA) algorithms are investigated on an OpenFlow-based SDON testbed for proof-of-concept demonstration. Efficient RWA algorithms are proposed to achieve high performance in achieving network capacity with reduced computation cost, which is a significant attribute in a scalable centralized-control SDON. The proposed heuristic RWA algorithms differ in the orders of request processes and in the procedures of routing table updates. Combined in a shortest-path-based routing algorithm, a hottest-request-first processing policy that considers demand intensity and end-to-end distance information offers both the highest throughput of networks and acceptable computation scalability. We further investigate trade-off relationship between network throughput and computation complexity in routing table update procedure by a simulation study.
Luo, Y.; Xu, Y.; Liu, Q.; Xia, J.
2008-01-01
In recent years, multichannel analysis of surface waves (MASW) has been increasingly used for obtaining vertical shear-wave velocity profiles within near-surface materials. MASW uses a multichannel recording approach to capture the time-variant, full-seismic wavefield where dispersive surface waves can be used to estimate near-surface S-wave velocity. The technique consists of (1) acquisition of broadband, high-frequency ground roll using a multichannel recording system; (2) efficient and accurate algorithms that allow the extraction and analysis of 1D Rayleigh-wave dispersion curves; (3) stable and efficient inversion algorithms for estimating S-wave velocity profiles; and (4) construction of the 2D S-wave velocity field map.
Total variation-based neutron computed tomography
NASA Astrophysics Data System (ADS)
Barnard, Richard C.; Bilheux, Hassina; Toops, Todd; Nafziger, Eric; Finney, Charles; Splitter, Derek; Archibald, Rick
2018-05-01
We perform the neutron computed tomography reconstruction problem via an inverse problem formulation with a total variation penalty. In the case of highly under-resolved angular measurements, the total variation penalty suppresses high-frequency artifacts which appear in filtered back projections. In order to efficiently compute solutions for this problem, we implement a variation of the split Bregman algorithm; due to the error-forgetting nature of the algorithm, the computational cost of updating can be significantly reduced via very inexact approximate linear solvers. We present the effectiveness of the algorithm in the significantly low-angular sampling case using synthetic test problems as well as data obtained from a high flux neutron source. The algorithm removes artifacts and can even roughly capture small features when an extremely low number of angles are used.
An intercomparison study of TSM, SEBS, and SEBAL using high-resolution imagery and lysimetric data
USDA-ARS?s Scientific Manuscript database
Over the past three decades, numerous remote sensing based ET mapping algorithms were developed. These algorithms provided a robust, economical, and efficient tool for ET estimations at field and regional scales. The Two Source Model (TSM), Surface Energy Balance System (SEBS), and Surface Energy Ba...
Inverse problem of radiofrequency sounding of ionosphere
NASA Astrophysics Data System (ADS)
Velichko, E. N.; Yu. Grishentsev, A.; Korobeynikov, A. G.
2016-01-01
An algorithm for the solution of the inverse problem of vertical ionosphere sounding and a mathematical model of noise filtering are presented. An automated system for processing and analysis of spectrograms of vertical ionosphere sounding based on our algorithm is described. It is shown that the algorithm we suggest has a rather high efficiency. This is supported by the data obtained at the ionospheric stations of the so-called “AIS-M” type.
A controlled genetic algorithm by fuzzy logic and belief functions for job-shop scheduling.
Hajri, S; Liouane, N; Hammadi, S; Borne, P
2000-01-01
Most scheduling problems are highly complex combinatorial problems. However, stochastic methods such as genetic algorithm yield good solutions. In this paper, we present a controlled genetic algorithm (CGA) based on fuzzy logic and belief functions to solve job-shop scheduling problems. For better performance, we propose an efficient representational scheme, heuristic rules for creating the initial population, and a new methodology for mixing and computing genetic operator probabilities.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Buzatu, Adrian; /McGill U.
2006-08-01
Improving our ability to identify the top quark pair (t{bar t}) primary vertex (PV) on an event-by-event basis is essential for many analyses in the lepton-plus-jets channel performed by the Collider Detector at Fermilab (CDF) Collaboration. We compare the algorithm currently used by CDF (A1) with another algorithm (A2) using Monte Carlo simulation at high instantaneous luminosities. We confirm that A1 is more efficient than A2 at selecting the t{bar t} PV at all PV multiplicities, both with efficiencies larger than 99%. Event selection rejects events with a distance larger than 5 cm along the proton beam between the t{barmore » t} PV and the charged lepton. We find flat distributions for the signal over background significance of this cut for all cut values larger than 1 cm, for all PV multiplicities and for both algorithms. We conclude that any cut value larger than 1 cm is acceptable for both algorithms under the Tevatron's expected instantaneous luminosity improvements.« less
WS-BP: An efficient wolf search based back-propagation algorithm
NASA Astrophysics Data System (ADS)
Nawi, Nazri Mohd; Rehman, M. Z.; Khan, Abdullah
2015-05-01
Wolf Search (WS) is a heuristic based optimization algorithm. Inspired by the preying and survival capabilities of the wolves, this algorithm is highly capable to search large spaces in the candidate solutions. This paper investigates the use of WS algorithm in combination with back-propagation neural network (BPNN) algorithm to overcome the local minima problem and to improve convergence in gradient descent. The performance of the proposed Wolf Search based Back-Propagation (WS-BP) algorithm is compared with Artificial Bee Colony Back-Propagation (ABC-BP), Bat Based Back-Propagation (Bat-BP), and conventional BPNN algorithms. Specifically, OR and XOR datasets are used for training the network. The simulation results show that the WS-BP algorithm effectively avoids the local minima and converge to global minima.
The high performance parallel algorithm for Unified Gas-Kinetic Scheme
NASA Astrophysics Data System (ADS)
Li, Shiyi; Li, Qibing; Fu, Song; Xu, Jinxiu
2016-11-01
A high performance parallel algorithm for UGKS is developed to simulate three-dimensional flows internal and external on arbitrary grid system. The physical domain and velocity domain are divided into different blocks and distributed according to the two-dimensional Cartesian topology with intra-communicators in physical domain for data exchange and other intra-communicators in velocity domain for sum reduction to moment integrals. Numerical results of three-dimensional cavity flow and flow past a sphere agree well with the results from the existing studies and validate the applicability of the algorithm. The scalability of the algorithm is tested both on small (1-16) and large (729-5832) scale processors. The tested speed-up ratio is near linear ashind thus the efficiency is around 1, which reveals the good scalability of the present algorithm.
NASA Astrophysics Data System (ADS)
Xue, Wei; Wang, Qi; Wang, Tianyu
2018-04-01
This paper presents an improved parallel combinatory spread spectrum (PC/SS) communication system with the method of double information matching (DIM). Compared with conventional PC/SS system, the new model inherits the advantage of high transmission speed, large information capacity and high security. Besides, the problem traditional system will face is the high bit error rate (BER) and since its data-sequence mapping algorithm. Hence the new model presented shows lower BER and higher efficiency by its optimization of mapping algorithm.
Li, Nailu; Mu, Anle; Yang, Xiyun; Magar, Kaman T; Liu, Chao
2018-05-01
The optimal tuning of adaptive flap controller can improve adaptive flap control performance on uncertain operating environments, but the optimization process is usually time-consuming and it is difficult to design proper optimal tuning strategy for the flap control system (FCS). To solve this problem, a novel adaptive flap controller is designed based on a high-efficient differential evolution (DE) identification technique and composite adaptive internal model control (CAIMC) strategy. The optimal tuning can be easily obtained by DE identified inverse of the FCS via CAIMC structure. To achieve fast tuning, a high-efficient modified adaptive DE algorithm is proposed with new mutant operator and varying range adaptive mechanism for the FCS identification. A tradeoff between optimized adaptive flap control and low computation cost is successfully achieved by proposed controller. Simulation results show the robustness of proposed method and its superiority to conventional adaptive IMC (AIMC) flap controller and the CAIMC flap controllers using other DE algorithms on various uncertain operating conditions. The high computation efficiency of proposed controller is also verified based on the computation time on those operating cases. Copyright © 2018 ISA. Published by Elsevier Ltd. All rights reserved.
Monochromatic-beam-based dynamic X-ray microtomography based on OSEM-TV algorithm.
Xu, Liang; Chen, Rongchang; Yang, Yiming; Deng, Biao; Du, Guohao; Xie, Honglan; Xiao, Tiqiao
2017-01-01
Monochromatic-beam-based dynamic X-ray computed microtomography (CT) was developed to observe evolution of microstructure inside samples. However, the low flux density results in low efficiency in data collection. To increase efficiency, reducing the number of projections should be a practical solution. However, it has disadvantages of low image reconstruction quality using the traditional filtered back projection (FBP) algorithm. In this study, an iterative reconstruction method using an ordered subset expectation maximization-total variation (OSEM-TV) algorithm was employed to address and solve this problem. The simulated results demonstrated that normalized mean square error of the image slices reconstructed by the OSEM-TV algorithm was about 1/4 of that by FBP. Experimental results also demonstrated that the density resolution of OSEM-TV was high enough to resolve different materials with the number of projections less than 100. As a result, with the introduction of OSEM-TV, the monochromatic-beam-based dynamic X-ray microtomography is potentially practicable for the quantitative and non-destructive analysis to the evolution of microstructure with acceptable efficiency in data collection and reconstructed image quality.
Pirbhulal, Sandeep; Zhang, Heye; Mukhopadhyay, Subhas Chandra; Li, Chunyue; Wang, Yumei; Li, Guanglin; Wu, Wanqing; Zhang, Yuan-Ting
2015-01-01
Body Sensor Network (BSN) is a network of several associated sensor nodes on, inside or around the human body to monitor vital signals, such as, Electroencephalogram (EEG), Photoplethysmography (PPG), Electrocardiogram (ECG), etc. Each sensor node in BSN delivers major information; therefore, it is very significant to provide data confidentiality and security. All existing approaches to secure BSN are based on complex cryptographic key generation procedures, which not only demands high resource utilization and computation time, but also consumes large amount of energy, power and memory during data transmission. However, it is indispensable to put forward energy efficient and computationally less complex authentication technique for BSN. In this paper, a novel biometric-based algorithm is proposed, which utilizes Heart Rate Variability (HRV) for simple key generation process to secure BSN. Our proposed algorithm is compared with three data authentication techniques, namely Physiological Signal based Key Agreement (PSKA), Data Encryption Standard (DES) and Rivest Shamir Adleman (RSA). Simulation is performed in Matlab and results suggest that proposed algorithm is quite efficient in terms of transmission time utilization, average remaining energy and total power consumption. PMID:26131666
Pirbhulal, Sandeep; Zhang, Heye; Mukhopadhyay, Subhas Chandra; Li, Chunyue; Wang, Yumei; Li, Guanglin; Wu, Wanqing; Zhang, Yuan-Ting
2015-06-26
Body Sensor Network (BSN) is a network of several associated sensor nodes on, inside or around the human body to monitor vital signals, such as, Electroencephalogram (EEG), Photoplethysmography (PPG), Electrocardiogram (ECG), etc. Each sensor node in BSN delivers major information; therefore, it is very significant to provide data confidentiality and security. All existing approaches to secure BSN are based on complex cryptographic key generation procedures, which not only demands high resource utilization and computation time, but also consumes large amount of energy, power and memory during data transmission. However, it is indispensable to put forward energy efficient and computationally less complex authentication technique for BSN. In this paper, a novel biometric-based algorithm is proposed, which utilizes Heart Rate Variability (HRV) for simple key generation process to secure BSN. Our proposed algorithm is compared with three data authentication techniques, namely Physiological Signal based Key Agreement (PSKA), Data Encryption Standard (DES) and Rivest Shamir Adleman (RSA). Simulation is performed in Matlab and results suggest that proposed algorithm is quite efficient in terms of transmission time utilization, average remaining energy and total power consumption.
Evolving aerodynamic airfoils for wind turbines through a genetic algorithm
NASA Astrophysics Data System (ADS)
Hernández, J. J.; Gómez, E.; Grageda, J. I.; Couder, C.; Solís, A.; Hanotel, C. L.; Ledesma, JI
2017-01-01
Nowadays, genetic algorithms stand out for airfoil optimisation, due to the virtues of mutation and crossing-over techniques. In this work we propose a genetic algorithm with arithmetic crossover rules. The optimisation criteria are taken to be the maximisation of both aerodynamic efficiency and lift coefficient, while minimising drag coefficient. Such algorithm shows greatly improvements in computational costs, as well as a high performance by obtaining optimised airfoils for Mexico City's specific wind conditions from generic wind turbines designed for higher Reynolds numbers, in few iterations.
Clustering PPI data by combining FA and SHC method.
Lei, Xiujuan; Ying, Chao; Wu, Fang-Xiang; Xu, Jin
2015-01-01
Clustering is one of main methods to identify functional modules from protein-protein interaction (PPI) data. Nevertheless traditional clustering methods may not be effective for clustering PPI data. In this paper, we proposed a novel method for clustering PPI data by combining firefly algorithm (FA) and synchronization-based hierarchical clustering (SHC) algorithm. Firstly, the PPI data are preprocessed via spectral clustering (SC) which transforms the high-dimensional similarity matrix into a low dimension matrix. Then the SHC algorithm is used to perform clustering. In SHC algorithm, hierarchical clustering is achieved by enlarging the neighborhood radius of synchronized objects continuously, while the hierarchical search is very difficult to find the optimal neighborhood radius of synchronization and the efficiency is not high. So we adopt the firefly algorithm to determine the optimal threshold of the neighborhood radius of synchronization automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that our proposed algorithm is better than the traditional algorithms in precision, recall and f-measure value.
Clustering PPI data by combining FA and SHC method
2015-01-01
Clustering is one of main methods to identify functional modules from protein-protein interaction (PPI) data. Nevertheless traditional clustering methods may not be effective for clustering PPI data. In this paper, we proposed a novel method for clustering PPI data by combining firefly algorithm (FA) and synchronization-based hierarchical clustering (SHC) algorithm. Firstly, the PPI data are preprocessed via spectral clustering (SC) which transforms the high-dimensional similarity matrix into a low dimension matrix. Then the SHC algorithm is used to perform clustering. In SHC algorithm, hierarchical clustering is achieved by enlarging the neighborhood radius of synchronized objects continuously, while the hierarchical search is very difficult to find the optimal neighborhood radius of synchronization and the efficiency is not high. So we adopt the firefly algorithm to determine the optimal threshold of the neighborhood radius of synchronization automatically. The proposed algorithm is tested on the MIPS PPI dataset. The results show that our proposed algorithm is better than the traditional algorithms in precision, recall and f-measure value. PMID:25707632
Sand, Andreas; Kristiansen, Martin; Pedersen, Christian N S; Mailund, Thomas
2013-11-22
Hidden Markov models are widely used for genome analysis as they combine ease of modelling with efficient analysis algorithms. Calculating the likelihood of a model using the forward algorithm has worst case time complexity linear in the length of the sequence and quadratic in the number of states in the model. For genome analysis, however, the length runs to millions or billions of observations, and when maximising the likelihood hundreds of evaluations are often needed. A time efficient forward algorithm is therefore a key ingredient in an efficient hidden Markov model library. We have built a software library for efficiently computing the likelihood of a hidden Markov model. The library exploits commonly occurring substrings in the input to reuse computations in the forward algorithm. In a pre-processing step our library identifies common substrings and builds a structure over the computations in the forward algorithm which can be reused. This analysis can be saved between uses of the library and is independent of concrete hidden Markov models so one preprocessing can be used to run a number of different models.Using this library, we achieve up to 78 times shorter wall-clock time for realistic whole-genome analyses with a real and reasonably complex hidden Markov model. In one particular case the analysis was performed in less than 8 minutes compared to 9.6 hours for the previously fastest library. We have implemented the preprocessing procedure and forward algorithm as a C++ library, zipHMM, with Python bindings for use in scripts. The library is available at http://birc.au.dk/software/ziphmm/.
NASA Astrophysics Data System (ADS)
Bansal, Shonak; Singh, Arun Kumar; Gupta, Neena
2017-02-01
In real-life, multi-objective engineering design problems are very tough and time consuming optimization problems due to their high degree of nonlinearities, complexities and inhomogeneity. Nature-inspired based multi-objective optimization algorithms are now becoming popular for solving multi-objective engineering design problems. This paper proposes original multi-objective Bat algorithm (MOBA) and its extended form, namely, novel parallel hybrid multi-objective Bat algorithm (PHMOBA) to generate shortest length Golomb ruler called optimal Golomb ruler (OGR) sequences at a reasonable computation time. The OGRs found their application in optical wavelength division multiplexing (WDM) systems as channel-allocation algorithm to reduce the four-wave mixing (FWM) crosstalk. The performances of both the proposed algorithms to generate OGRs as optical WDM channel-allocation is compared with other existing classical computing and nature-inspired algorithms, including extended quadratic congruence (EQC), search algorithm (SA), genetic algorithms (GAs), biogeography based optimization (BBO) and big bang-big crunch (BB-BC) optimization algorithms. Simulations conclude that the proposed parallel hybrid multi-objective Bat algorithm works efficiently as compared to original multi-objective Bat algorithm and other existing algorithms to generate OGRs for optical WDM systems. The algorithm PHMOBA to generate OGRs, has higher convergence and success rate than original MOBA. The efficiency improvement of proposed PHMOBA to generate OGRs up to 20-marks, in terms of ruler length and total optical channel bandwidth (TBW) is 100 %, whereas for original MOBA is 85 %. Finally the implications for further research are also discussed.
Structural optimisation of cage induction motors using finite element analysis
NASA Astrophysics Data System (ADS)
Palko, S.
The current trend in motor design is to have highly efficient, low noise, low cost, and modular motors with a high power factor. High torque motors are useful in applications like servo motors, lifts, cranes, and rolling mills. This report contains a detailed review of different optimization methods applicable in various design problems. Special attention is given to the performance of different methods, when they are used with finite element analysis (FEA) as an objective function, and accuracy problems arising from the numerical simulations. Also an effective method for designing high starting torque and high efficiency motors is presented. The method described in this work utilizes FEA combined with algorithms for the optimization of the slot geometry. The optimization algorithm modifies the position of the nodal points in the element mesh. The number of independent variables ranges from 14 to 140 in this work.
Computing Project, Marc develops high-fidelity turbulence models to enhance simulation accuracy and efficient numerical algorithms for future high performance computing hardware architectures. Research Interests High performance computing High order numerical methods for computational fluid dynamics Fluid
An adaptive ANOVA-based PCKF for high-dimensional nonlinear inverse modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
Li, Weixuan, E-mail: weixuan.li@usc.edu; Lin, Guang, E-mail: guang.lin@pnnl.gov; Zhang, Dongxiao, E-mail: dxz@pku.edu.cn
2014-02-01
The probabilistic collocation-based Kalman filter (PCKF) is a recently developed approach for solving inverse problems. It resembles the ensemble Kalman filter (EnKF) in every aspect—except that it represents and propagates model uncertainty by polynomial chaos expansion (PCE) instead of an ensemble of model realizations. Previous studies have shown PCKF is a more efficient alternative to EnKF for many data assimilation problems. However, the accuracy and efficiency of PCKF depends on an appropriate truncation of the PCE series. Having more polynomial chaos basis functions in the expansion helps to capture uncertainty more accurately but increases computational cost. Selection of basis functionsmore » is particularly important for high-dimensional stochastic problems because the number of polynomial chaos basis functions required to represent model uncertainty grows dramatically as the number of input parameters (random dimensions) increases. In classic PCKF algorithms, the PCE basis functions are pre-set based on users' experience. Also, for sequential data assimilation problems, the basis functions kept in PCE expression remain unchanged in different Kalman filter loops, which could limit the accuracy and computational efficiency of classic PCKF algorithms. To address this issue, we present a new algorithm that adaptively selects PCE basis functions for different problems and automatically adjusts the number of basis functions in different Kalman filter loops. The algorithm is based on adaptive functional ANOVA (analysis of variance) decomposition, which approximates a high-dimensional function with the summation of a set of low-dimensional functions. Thus, instead of expanding the original model into PCE, we implement the PCE expansion on these low-dimensional functions, which is much less costly. We also propose a new adaptive criterion for ANOVA that is more suited for solving inverse problems. The new algorithm was tested with different examples and demonstrated great effectiveness in comparison with non-adaptive PCKF and EnKF algorithms.« less
An Adaptive ANOVA-based PCKF for High-Dimensional Nonlinear Inverse Modeling
DOE Office of Scientific and Technical Information (OSTI.GOV)
LI, Weixuan; Lin, Guang; Zhang, Dongxiao
2014-02-01
The probabilistic collocation-based Kalman filter (PCKF) is a recently developed approach for solving inverse problems. It resembles the ensemble Kalman filter (EnKF) in every aspect—except that it represents and propagates model uncertainty by polynomial chaos expansion (PCE) instead of an ensemble of model realizations. Previous studies have shown PCKF is a more efficient alternative to EnKF for many data assimilation problems. However, the accuracy and efficiency of PCKF depends on an appropriate truncation of the PCE series. Having more polynomial chaos bases in the expansion helps to capture uncertainty more accurately but increases computational cost. Bases selection is particularly importantmore » for high-dimensional stochastic problems because the number of polynomial chaos bases required to represent model uncertainty grows dramatically as the number of input parameters (random dimensions) increases. In classic PCKF algorithms, the PCE bases are pre-set based on users’ experience. Also, for sequential data assimilation problems, the bases kept in PCE expression remain unchanged in different Kalman filter loops, which could limit the accuracy and computational efficiency of classic PCKF algorithms. To address this issue, we present a new algorithm that adaptively selects PCE bases for different problems and automatically adjusts the number of bases in different Kalman filter loops. The algorithm is based on adaptive functional ANOVA (analysis of variance) decomposition, which approximates a high-dimensional function with the summation of a set of low-dimensional functions. Thus, instead of expanding the original model into PCE, we implement the PCE expansion on these low-dimensional functions, which is much less costly. We also propose a new adaptive criterion for ANOVA that is more suited for solving inverse problems. The new algorithm is tested with different examples and demonstrated great effectiveness in comparison with non-adaptive PCKF and EnKF algorithms.« less
A Subsystem Test Bed for Chinese Spectral Radioheliograph
NASA Astrophysics Data System (ADS)
Zhao, An; Yan, Yihua; Wang, Wei
2014-11-01
The Chinese Spectral Radioheliograph is a solar dedicated radio interferometric array that will produce high spatial resolution, high temporal resolution, and high spectral resolution images of the Sun simultaneously in decimetre and centimetre wave range. Digital processing of intermediate frequency signal is an important part in a radio telescope. This paper describes a flexible and high-speed digital down conversion system for the CSRH by applying complex mixing, parallel filtering, and extracting algorithms to process IF signal at the time of being designed and incorporates canonic-signed digit coding and bit-plane method to improve program efficiency. The DDC system is intended to be a subsystem test bed for simulation and testing for CSRH. Software algorithms for simulation and hardware language algorithms based on FPGA are written which use less hardware resources and at the same time achieve high performances such as processing high-speed data flow (1 GHz) with 10 MHz spectral resolution. An experiment with the test bed is illustrated by using geostationary satellite data observed on March 20, 2014. Due to the easy alterability of the algorithms on FPGA, the data can be recomputed with different digital signal processing algorithms for selecting optimum algorithm.
Uni10: an open-source library for tensor network algorithms
NASA Astrophysics Data System (ADS)
Kao, Ying-Jer; Hsieh, Yun-Da; Chen, Pochung
2015-09-01
We present an object-oriented open-source library for developing tensor network algorithms written in C++ called Uni10. With Uni10, users can build a symmetric tensor from a collection of bonds, while the bonds are constructed from a list of quantum numbers associated with different quantum states. It is easy to label and permute the indices of the tensors and access a block associated with a particular quantum number. Furthermore a network class is used to describe arbitrary tensor network structure and to perform network contractions efficiently. We give an overview of the basic structure of the library and the hierarchy of the classes. We present examples of the construction of a spin-1 Heisenberg Hamiltonian and the implementation of the tensor renormalization group algorithm to illustrate the basic usage of the library. The library described here is particularly well suited to explore and fast prototype novel tensor network algorithms and to implement highly efficient codes for existing algorithms.
Highly parallel sparse Cholesky factorization
NASA Technical Reports Server (NTRS)
Gilbert, John R.; Schreiber, Robert
1990-01-01
Several fine grained parallel algorithms were developed and compared to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and it is used to analyze the algorithms.
An Efficient Moving Target Detection Algorithm Based on Sparsity-Aware Spectrum Estimation
Shen, Mingwei; Wang, Jie; Wu, Di; Zhu, Daiyin
2014-01-01
In this paper, an efficient direct data domain space-time adaptive processing (STAP) algorithm for moving targets detection is proposed, which is achieved based on the distinct spectrum features of clutter and target signals in the angle-Doppler domain. To reduce the computational complexity, the high-resolution angle-Doppler spectrum is obtained by finding the sparsest coefficients in the angle domain using the reduced-dimension data within each Doppler bin. Moreover, we will then present a knowledge-aided block-size detection algorithm that can discriminate between the moving targets and the clutter based on the extracted spectrum features. The feasibility and effectiveness of the proposed method are validated through both numerical simulations and raw data processing results. PMID:25222035
NASA Astrophysics Data System (ADS)
Chen, Xiao; Li, Yaan; Yu, Jing; Li, Yuxing
2018-01-01
For fast and more effective implementation of tracking multiple targets in a cluttered environment, we propose a multiple targets tracking (MTT) algorithm called maximum entropy fuzzy c-means clustering joint probabilistic data association that combines fuzzy c-means clustering and the joint probabilistic data association (PDA) algorithm. The algorithm uses the membership value to express the probability of the target originating from measurement. The membership value is obtained through fuzzy c-means clustering objective function optimized by the maximum entropy principle. When considering the effect of the public measurement, we use a correction factor to adjust the association probability matrix to estimate the state of the target. As this algorithm avoids confirmation matrix splitting, it can solve the high computational load problem of the joint PDA algorithm. The results of simulations and analysis conducted for tracking neighbor parallel targets and cross targets in a different density cluttered environment show that the proposed algorithm can realize MTT quickly and efficiently in a cluttered environment. Further, the performance of the proposed algorithm remains constant with increasing process noise variance. The proposed algorithm has the advantages of efficiency and low computational load, which can ensure optimum performance when tracking multiple targets in a dense cluttered environment.
A Hybrid CPU/GPU Pattern-Matching Algorithm for Deep Packet Inspection
Chen, Yaw-Chung
2015-01-01
The large quantities of data now being transferred via high-speed networks have made deep packet inspection indispensable for security purposes. Scalable and low-cost signature-based network intrusion detection systems have been developed for deep packet inspection for various software platforms. Traditional approaches that only involve central processing units (CPUs) are now considered inadequate in terms of inspection speed. Graphic processing units (GPUs) have superior parallel processing power, but transmission bottlenecks can reduce optimal GPU efficiency. In this paper we describe our proposal for a hybrid CPU/GPU pattern-matching algorithm (HPMA) that divides and distributes the packet-inspecting workload between a CPU and GPU. All packets are initially inspected by the CPU and filtered using a simple pre-filtering algorithm, and packets that might contain malicious content are sent to the GPU for further inspection. Test results indicate that in terms of random payload traffic, the matching speed of our proposed algorithm was 3.4 times and 2.7 times faster than those of the AC-CPU and AC-GPU algorithms, respectively. Further, HPMA achieved higher energy efficiency than the other tested algorithms. PMID:26437335
A Hybrid CPU/GPU Pattern-Matching Algorithm for Deep Packet Inspection.
Lee, Chun-Liang; Lin, Yi-Shan; Chen, Yaw-Chung
2015-01-01
The large quantities of data now being transferred via high-speed networks have made deep packet inspection indispensable for security purposes. Scalable and low-cost signature-based network intrusion detection systems have been developed for deep packet inspection for various software platforms. Traditional approaches that only involve central processing units (CPUs) are now considered inadequate in terms of inspection speed. Graphic processing units (GPUs) have superior parallel processing power, but transmission bottlenecks can reduce optimal GPU efficiency. In this paper we describe our proposal for a hybrid CPU/GPU pattern-matching algorithm (HPMA) that divides and distributes the packet-inspecting workload between a CPU and GPU. All packets are initially inspected by the CPU and filtered using a simple pre-filtering algorithm, and packets that might contain malicious content are sent to the GPU for further inspection. Test results indicate that in terms of random payload traffic, the matching speed of our proposed algorithm was 3.4 times and 2.7 times faster than those of the AC-CPU and AC-GPU algorithms, respectively. Further, HPMA achieved higher energy efficiency than the other tested algorithms.
NASA Astrophysics Data System (ADS)
Khuwaileh, Bassam
High fidelity simulation of nuclear reactors entails large scale applications characterized with high dimensionality and tremendous complexity where various physics models are integrated in the form of coupled models (e.g. neutronic with thermal-hydraulic feedback). Each of the coupled modules represents a high fidelity formulation of the first principles governing the physics of interest. Therefore, new developments in high fidelity multi-physics simulation and the corresponding sensitivity/uncertainty quantification analysis are paramount to the development and competitiveness of reactors achieved through enhanced understanding of the design and safety margins. Accordingly, this dissertation introduces efficient and scalable algorithms for performing efficient Uncertainty Quantification (UQ), Data Assimilation (DA) and Target Accuracy Assessment (TAA) for large scale, multi-physics reactor design and safety problems. This dissertation builds upon previous efforts for adaptive core simulation and reduced order modeling algorithms and extends these efforts towards coupled multi-physics models with feedback. The core idea is to recast the reactor physics analysis in terms of reduced order models. This can be achieved via identifying the important/influential degrees of freedom (DoF) via the subspace analysis, such that the required analysis can be recast by considering the important DoF only. In this dissertation, efficient algorithms for lower dimensional subspace construction have been developed for single physics and multi-physics applications with feedback. Then the reduced subspace is used to solve realistic, large scale forward (UQ) and inverse problems (DA and TAA). Once the elite set of DoF is determined, the uncertainty/sensitivity/target accuracy assessment and data assimilation analysis can be performed accurately and efficiently for large scale, high dimensional multi-physics nuclear engineering applications. Hence, in this work a Karhunen-Loeve (KL) based algorithm previously developed to quantify the uncertainty for single physics models is extended for large scale multi-physics coupled problems with feedback effect. Moreover, a non-linear surrogate based UQ approach is developed, used and compared to performance of the KL approach and brute force Monte Carlo (MC) approach. On the other hand, an efficient Data Assimilation (DA) algorithm is developed to assess information about model's parameters: nuclear data cross-sections and thermal-hydraulics parameters. Two improvements are introduced in order to perform DA on the high dimensional problems. First, a goal-oriented surrogate model can be used to replace the original models in the depletion sequence (MPACT -- COBRA-TF - ORIGEN). Second, approximating the complex and high dimensional solution space with a lower dimensional subspace makes the sampling process necessary for DA possible for high dimensional problems. Moreover, safety analysis and design optimization depend on the accurate prediction of various reactor attributes. Predictions can be enhanced by reducing the uncertainty associated with the attributes of interest. Accordingly, an inverse problem can be defined and solved to assess the contributions from sources of uncertainty; and experimental effort can be subsequently directed to further improve the uncertainty associated with these sources. In this dissertation a subspace-based gradient-free and nonlinear algorithm for inverse uncertainty quantification namely the Target Accuracy Assessment (TAA) has been developed and tested. The ideas proposed in this dissertation were first validated using lattice physics applications simulated using SCALE6.1 package (Pressurized Water Reactor (PWR) and Boiling Water Reactor (BWR) lattice models). Ultimately, the algorithms proposed her were applied to perform UQ and DA for assembly level (CASL progression problem number 6) and core wide problems representing Watts Bar Nuclear 1 (WBN1) for cycle 1 of depletion (CASL Progression Problem Number 9) modeled via simulated using VERA-CS which consists of several multi-physics coupled models. The analysis and algorithms developed in this dissertation were encoded and implemented in a newly developed tool kit algorithms for Reduced Order Modeling based Uncertainty/Sensitivity Estimator (ROMUSE).
A High Performance Cloud-Based Protein-Ligand Docking Prediction Algorithm
Chen, Jui-Le; Yang, Chu-Sing
2013-01-01
The potential of predicting druggability for a particular disease by integrating biological and computer science technologies has witnessed success in recent years. Although the computer science technologies can be used to reduce the costs of the pharmaceutical research, the computation time of the structure-based protein-ligand docking prediction is still unsatisfied until now. Hence, in this paper, a novel docking prediction algorithm, named fast cloud-based protein-ligand docking prediction algorithm (FCPLDPA), is presented to accelerate the docking prediction algorithm. The proposed algorithm works by leveraging two high-performance operators: (1) the novel migration (information exchange) operator is designed specially for cloud-based environments to reduce the computation time; (2) the efficient operator is aimed at filtering out the worst search directions. Our simulation results illustrate that the proposed method outperforms the other docking algorithms compared in this paper in terms of both the computation time and the quality of the end result. PMID:23762864
Unconventional Hamilton-type variational principle in phase space and symplectic algorithm
NASA Astrophysics Data System (ADS)
Luo, En; Huang, Weijiang; Zhang, Hexin
2003-06-01
By a novel approach proposed by Luo, the unconventional Hamilton-type variational principle in phase space for elastodynamics of multidegree-of-freedom system is established in this paper. It not only can fully characterize the initial-value problem of this dynamic, but also has a natural symplectic structure. Based on this variational principle, a symplectic algorithm which is called a symplectic time-subdomain method is proposed. A non-difference scheme is constructed by applying Lagrange interpolation polynomial to the time subdomain. Furthermore, it is also proved that the presented symplectic algorithm is an unconditionally stable one. From the results of the two numerical examples of different types, it can be seen that the accuracy and the computational efficiency of the new method excel obviously those of widely used Wilson-θ and Newmark-β methods. Therefore, this new algorithm is a highly efficient one with better computational performance.
Algorithm for Controlling a Centrifugal Compressor
NASA Technical Reports Server (NTRS)
Benedict, Scott M.
2004-01-01
An algorithm has been developed for controlling a centrifugal compressor that serves as the prime mover in a heatpump system. Experimental studies have shown that the operating conditions for maximum compressor efficiency are close to the boundary beyond which surge occurs. Compressor surge is a destructive condition in which there are instantaneous reversals of flow associated with a high outlet-to-inlet pressure differential. For a given cooling load, the algorithm sets the compressor speed at the lowest possible value while adjusting the inlet guide vane angle and diffuser vane angle to maximize efficiency, subject to an overriding requirement to prevent surge. The onset of surge is detected via the onset of oscillations of the electric current supplied to the compressor motor, associated with surge-induced oscillations of the torque exerted by and on the compressor rotor. The algorithm can be implemented in any of several computer languages.
Sugisaki, Kenji; Yamamoto, Satoru; Nakazawa, Shigeaki; Toyota, Kazuo; Sato, Kazunobu; Shiomi, Daisuke; Takui, Takeji
2016-08-18
Quantum computers are capable to efficiently perform full configuration interaction (FCI) calculations of atoms and molecules by using the quantum phase estimation (QPE) algorithm. Because the success probability of the QPE depends on the overlap between approximate and exact wave functions, efficient methods to prepare accurate initial guess wave functions enough to have sufficiently large overlap with the exact ones are highly desired. Here, we propose a quantum algorithm to construct the wave function consisting of one configuration state function, which is suitable for the initial guess wave function in QPE-based FCI calculations of open-shell molecules, based on the addition theorem of angular momentum. The proposed quantum algorithm enables us to prepare the wave function consisting of an exponential number of Slater determinants only by a polynomial number of quantum operations.
Clustering algorithm for determining community structure in large networks
NASA Astrophysics Data System (ADS)
Pujol, Josep M.; Béjar, Javier; Delgado, Jordi
2006-07-01
We propose an algorithm to find the community structure in complex networks based on the combination of spectral analysis and modularity optimization. The clustering produced by our algorithm is as accurate as the best algorithms on the literature of modularity optimization; however, the main asset of the algorithm is its efficiency. The best match for our algorithm is Newman’s fast algorithm, which is the reference algorithm for clustering in large networks due to its efficiency. When both algorithms are compared, our algorithm outperforms the fast algorithm both in efficiency and accuracy of the clustering, in terms of modularity. Thus, the results suggest that the proposed algorithm is a good choice to analyze the community structure of medium and large networks in the range of tens and hundreds of thousand vertices.
A nonlinear relaxation/quasi-Newton algorithm for the compressible Navier-Stokes equations
NASA Technical Reports Server (NTRS)
Edwards, Jack R.; Mcrae, D. S.
1992-01-01
A highly efficient implicit method for the computation of steady, two-dimensional compressible Navier-Stokes flowfields is presented. The discretization of the governing equations is hybrid in nature, with flux-vector splitting utilized in the streamwise direction and central differences with flux-limited artificial dissipation used for the transverse fluxes. Line Jacobi relaxation is used to provide a suitable initial guess for a new nonlinear iteration strategy based on line Gauss-Seidel sweeps. The applicability of quasi-Newton methods as convergence accelerators for this and other line relaxation algorithms is discussed, and efficient implementations of such techniques are presented. Convergence histories and comparisons with experimental data are presented for supersonic flow over a flat plate and for several high-speed compression corner interactions. Results indicate a marked improvement in computational efficiency over more conventional upwind relaxation strategies, particularly for flowfields containing large pockets of streamwise subsonic flow.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cox, R.G.
Much controversy surrounds government regulation of routing and scheduling of Hazardous Materials Transportation (HMT). Increases in operating costs must be balanced against expected benefits from local HMT bans and curfews when promulgating or preempting HMT regulations. Algorithmic approaches for evaluating HMT routing and scheduling regulatory policy are described. A review of current US HMT regulatory policy is presented to provide a context for the analysis. Next, a multiobjective shortest path algorithm to find the set of efficient routes under conflicting objectives is presented. This algorithm generates all efficient routes under any partial ordering in a single pass through the network.more » Also, scheduling algorithms are presented to estimate the travel time delay due to HMT curfews along a route. Algorithms are presented assuming either deterministic or stochastic travel times between curfew cities and also possible rerouting to avoid such cities. These algorithms are applied to the case study of US highway transport of spent nuclear fuel from reactors to permanent repositories. Two data sets were used. One data set included the US Interstate Highway System (IHS) network with reactor locations, possible repository sites, and 150 heavily populated areas (HPAs). The other data set contained estimates of the population residing with 0.5 miles of the IHS and the Eastern US. Curfew delay is dramatically reduced by optimally scheduling departure times unless inter-HPA travel times are highly uncertain. Rerouting shipments to avoid HPAs is a less efficient approach to reducing delay.« less
FPGA based charge acquisition algorithm for soft x-ray diagnostics system
NASA Astrophysics Data System (ADS)
Wojenski, A.; Kasprowicz, G.; Pozniak, K. T.; Zabolotny, W.; Byszuk, A.; Juszczyk, B.; Kolasinski, P.; Krawczyk, R. D.; Zienkiewicz, P.; Chernyshova, M.; Czarski, T.
2015-09-01
Soft X-ray (SXR) measurement systems working in tokamaks or with laser generated plasma can expect high photon fluxes. Therefore it is necessary to focus on data processing algorithms to have the best possible efficiency in term of processed photon events per second. This paper refers to recently designed algorithm and data-flow for implementation of charge data acquisition in FPGA. The algorithms are currently on implementation stage for the soft X-ray diagnostics system. In this paper despite of the charge processing algorithm is also described general firmware overview, data storage methods and other key components of the measurement system. The simulation section presents algorithm performance and expected maximum photon rate.
Enhanced Particle Swarm Optimization Algorithm: Efficient Training of ReaxFF Reactive Force Fields.
Furman, David; Carmeli, Benny; Zeiri, Yehuda; Kosloff, Ronnie
2018-06-12
Particle swarm optimization (PSO) is a powerful metaheuristic population-based global optimization algorithm. However, when it is applied to nonseparable objective functions, its performance on multimodal landscapes is significantly degraded. Here we show that a significant improvement in the search quality and efficiency on multimodal functions can be achieved by enhancing the basic rotation-invariant PSO algorithm with isotropic Gaussian mutation operators. The new algorithm demonstrates superior performance across several nonlinear, multimodal benchmark functions compared with the rotation-invariant PSO algorithm and the well-established simulated annealing and sequential one-parameter parabolic interpolation methods. A search for the optimal set of parameters for the dispersion interaction model in the ReaxFF- lg reactive force field was carried out with respect to accurate DFT-TS calculations. The resulting optimized force field accurately describes the equations of state of several high-energy molecular crystals where such interactions are of crucial importance. The improved algorithm also presents better performance compared to a genetic algorithm optimization method in the optimization of the parameters of a ReaxFF- lg correction model. The computational framework is implemented in a stand-alone C++ code that allows the straightforward development of ReaxFF reactive force fields.
Fuzzy-Logic Based Distributed Energy-Efficient Clustering Algorithm for Wireless Sensor Networks.
Zhang, Ying; Wang, Jun; Han, Dezhi; Wu, Huafeng; Zhou, Rundong
2017-07-03
Due to the high-energy efficiency and scalability, the clustering routing algorithm has been widely used in wireless sensor networks (WSNs). In order to gather information more efficiently, each sensor node transmits data to its Cluster Head (CH) to which it belongs, by multi-hop communication. However, the multi-hop communication in the cluster brings the problem of excessive energy consumption of the relay nodes which are closer to the CH. These nodes' energy will be consumed more quickly than the farther nodes, which brings the negative influence on load balance for the whole networks. Therefore, we propose an energy-efficient distributed clustering algorithm based on fuzzy approach with non-uniform distribution (EEDCF). During CHs' election, we take nodes' energies, nodes' degree and neighbor nodes' residual energies into consideration as the input parameters. In addition, we take advantage of Takagi, Sugeno and Kang (TSK) fuzzy model instead of traditional method as our inference system to guarantee the quantitative analysis more reasonable. In our scheme, each sensor node calculates the probability of being as CH with the help of fuzzy inference system in a distributed way. The experimental results indicate EEDCF algorithm is better than some current representative methods in aspects of data transmission, energy consumption and lifetime of networks.
A community detection algorithm based on structural similarity
NASA Astrophysics Data System (ADS)
Guo, Xuchao; Hao, Xia; Liu, Yaqiong; Zhang, Li; Wang, Lu
2017-09-01
In order to further improve the efficiency and accuracy of community detection algorithm, a new algorithm named SSTCA (the community detection algorithm based on structural similarity with threshold) is proposed. In this algorithm, the structural similarities are taken as the weights of edges, and the threshold k is considered to remove multiple edges whose weights are less than the threshold, and improve the computational efficiency. Tests were done on the Zachary’s network, Dolphins’ social network and Football dataset by the proposed algorithm, and compared with GN and SSNCA algorithm. The results show that the new algorithm is superior to other algorithms in accuracy for the dense networks and the operating efficiency is improved obviously.
The whole space three-dimensional magnetotelluric inversion algorithm with static shift correction
NASA Astrophysics Data System (ADS)
Zhang, K.
2016-12-01
Base on the previous studies on the static shift correction and 3D inversion algorithms, we improve the NLCG 3D inversion method and propose a new static shift correction method which work in the inversion. The static shift correction method is based on the 3D theory and real data. The static shift can be detected by the quantitative analysis of apparent parameters (apparent resistivity and impedance phase) of MT in high frequency range, and completed correction with inversion. The method is an automatic processing technology of computer with 0 cost, and avoids the additional field work and indoor processing with good results.The 3D inversion algorithm is improved (Zhang et al., 2013) base on the NLCG method of Newman & Alumbaugh (2000) and Rodi & Mackie (2001). For the algorithm, we added the parallel structure, improved the computational efficiency, reduced the memory of computer and added the topographic and marine factors. So the 3D inversion could work in general PC with high efficiency and accuracy. And all the MT data of surface stations, seabed stations and underground stations can be used in the inversion algorithm. The verification and application example of 3D inversion algorithm is shown in Figure 1. From the comparison of figure 1, the inversion model can reflect all the abnormal bodies and terrain clearly regardless of what type of data (impedance/tipper/impedance and tipper). And the resolution of the bodies' boundary can be improved by using tipper data. The algorithm is very effective for terrain inversion. So it is very useful for the study of continental shelf with continuous exploration of land, marine and underground.The three-dimensional electrical model of the ore zone reflects the basic information of stratum, rock and structure. Although it cannot indicate the ore body position directly, the important clues are provided for prospecting work by the delineation of diorite pluton uplift range. The test results show that, the high quality of the data processing and efficient inversion method for electromagnetic method is an important guarantee for porphyry ore.
Dynamic Synchronous Capture Algorithm for an Electromagnetic Flowmeter.
Fanjiang, Yong-Yi; Lu, Shih-Wei
2017-04-10
This paper proposes a dynamic synchronous capture (DSC) algorithm to calculate the flow rate for an electromagnetic flowmeter. The characteristics of the DSC algorithm can accurately calculate the flow rate signal and efficiently convert an analog signal to upgrade the execution performance of a microcontroller unit (MCU). Furthermore, it can reduce interference from abnormal noise. It is extremely steady and independent of fluctuations in the flow measurement. Moreover, it can calculate the current flow rate signal immediately (m/s). The DSC algorithm can be applied to the current general MCU firmware platform without using DSP (Digital Signal Processing) or a high-speed and high-end MCU platform, and signal amplification by hardware reduces the demand for ADC accuracy, which reduces the cost.
Dynamic Synchronous Capture Algorithm for an Electromagnetic Flowmeter
Fanjiang, Yong-Yi; Lu, Shih-Wei
2017-01-01
This paper proposes a dynamic synchronous capture (DSC) algorithm to calculate the flow rate for an electromagnetic flowmeter. The characteristics of the DSC algorithm can accurately calculate the flow rate signal and efficiently convert an analog signal to upgrade the execution performance of a microcontroller unit (MCU). Furthermore, it can reduce interference from abnormal noise. It is extremely steady and independent of fluctuations in the flow measurement. Moreover, it can calculate the current flow rate signal immediately (m/s). The DSC algorithm can be applied to the current general MCU firmware platform without using DSP (Digital Signal Processing) or a high-speed and high-end MCU platform, and signal amplification by hardware reduces the demand for ADC accuracy, which reduces the cost. PMID:28394306
Improving Remote Health Monitoring: A Low-Complexity ECG Compression Approach
Al-Ali, Abdulla; Mohamed, Amr; Ward, Rabab
2018-01-01
Recent advances in mobile technology have created a shift towards using battery-driven devices in remote monitoring settings and smart homes. Clinicians are carrying out diagnostic and screening procedures based on the electrocardiogram (ECG) signals collected remotely for outpatients who need continuous monitoring. High-speed transmission and analysis of large recorded ECG signals are essential, especially with the increased use of battery-powered devices. Exploring low-power alternative compression methodologies that have high efficiency and that enable ECG signal collection, transmission, and analysis in a smart home or remote location is required. Compression algorithms based on adaptive linear predictors and decimation by a factor B/K are evaluated based on compression ratio (CR), percentage root-mean-square difference (PRD), and heartbeat detection accuracy of the reconstructed ECG signal. With two databases (153 subjects), the new algorithm demonstrates the highest compression performance (CR=6 and PRD=1.88) and overall detection accuracy (99.90% sensitivity, 99.56% positive predictivity) over both databases. The proposed algorithm presents an advantage for the real-time transmission of ECG signals using a faster and more efficient method, which meets the growing demand for more efficient remote health monitoring. PMID:29337892
Improving Remote Health Monitoring: A Low-Complexity ECG Compression Approach.
Elgendi, Mohamed; Al-Ali, Abdulla; Mohamed, Amr; Ward, Rabab
2018-01-16
Recent advances in mobile technology have created a shift towards using battery-driven devices in remote monitoring settings and smart homes. Clinicians are carrying out diagnostic and screening procedures based on the electrocardiogram (ECG) signals collected remotely for outpatients who need continuous monitoring. High-speed transmission and analysis of large recorded ECG signals are essential, especially with the increased use of battery-powered devices. Exploring low-power alternative compression methodologies that have high efficiency and that enable ECG signal collection, transmission, and analysis in a smart home or remote location is required. Compression algorithms based on adaptive linear predictors and decimation by a factor B / K are evaluated based on compression ratio (CR), percentage root-mean-square difference (PRD), and heartbeat detection accuracy of the reconstructed ECG signal. With two databases (153 subjects), the new algorithm demonstrates the highest compression performance ( CR = 6 and PRD = 1.88 ) and overall detection accuracy (99.90% sensitivity, 99.56% positive predictivity) over both databases. The proposed algorithm presents an advantage for the real-time transmission of ECG signals using a faster and more efficient method, which meets the growing demand for more efficient remote health monitoring.
On Efficient Multigrid Methods for Materials Processing Flows with Small Particles
NASA Technical Reports Server (NTRS)
Thomas, James (Technical Monitor); Diskin, Boris; Harik, VasylMichael
2004-01-01
Multiscale modeling of materials requires simulations of multiple levels of structural hierarchy. The computational efficiency of numerical methods becomes a critical factor for simulating large physical systems with highly desperate length scales. Multigrid methods are known for their superior efficiency in representing/resolving different levels of physical details. The efficiency is achieved by employing interactively different discretizations on different scales (grids). To assist optimization of manufacturing conditions for materials processing with numerous particles (e.g., dispersion of particles, controlling flow viscosity and clusters), a new multigrid algorithm has been developed for a case of multiscale modeling of flows with small particles that have various length scales. The optimal efficiency of the algorithm is crucial for accurate predictions of the effect of processing conditions (e.g., pressure and velocity gradients) on the local flow fields that control the formation of various microstructures or clusters.
A study of metaheuristic algorithms for high dimensional feature selection on microarray data
NASA Astrophysics Data System (ADS)
Dankolo, Muhammad Nasiru; Radzi, Nor Haizan Mohamed; Sallehuddin, Roselina; Mustaffa, Noorfa Haszlinna
2017-11-01
Microarray systems enable experts to examine gene profile at molecular level using machine learning algorithms. It increases the potentials of classification and diagnosis of many diseases at gene expression level. Though, numerous difficulties may affect the efficiency of machine learning algorithms which includes vast number of genes features comprised in the original data. Many of these features may be unrelated to the intended analysis. Therefore, feature selection is necessary to be performed in the data pre-processing. Many feature selection algorithms are developed and applied on microarray which including the metaheuristic optimization algorithms. This paper discusses the application of the metaheuristics algorithms for feature selection in microarray dataset. This study reveals that, the algorithms have yield an interesting result with limited resources thereby saving computational expenses of machine learning algorithms.
Scheduled Relaxation Jacobi method: Improvements and applications
NASA Astrophysics Data System (ADS)
Adsuara, J. E.; Cordero-Carrión, I.; Cerdá-Durán, P.; Aloy, M. A.
2016-09-01
Elliptic partial differential equations (ePDEs) appear in a wide variety of areas of mathematics, physics and engineering. Typically, ePDEs must be solved numerically, which sets an ever growing demand for efficient and highly parallel algorithms to tackle their computational solution. The Scheduled Relaxation Jacobi (SRJ) is a promising class of methods, atypical for combining simplicity and efficiency, that has been recently introduced for solving linear Poisson-like ePDEs. The SRJ methodology relies on computing the appropriate parameters of a multilevel approach with the goal of minimizing the number of iterations needed to cut down the residuals below specified tolerances. The efficiency in the reduction of the residual increases with the number of levels employed in the algorithm. Applying the original methodology to compute the algorithm parameters with more than 5 levels notably hinders obtaining optimal SRJ schemes, as the mixed (non-linear) algebraic-differential system of equations from which they result becomes notably stiff. Here we present a new methodology for obtaining the parameters of SRJ schemes that overcomes the limitations of the original algorithm and provide parameters for SRJ schemes with up to 15 levels and resolutions of up to 215 points per dimension, allowing for acceleration factors larger than several hundreds with respect to the Jacobi method for typical resolutions and, in some high resolution cases, close to 1000. Most of the success in finding SRJ optimal schemes with more than 10 levels is based on an analytic reduction of the complexity of the previously mentioned system of equations. Furthermore, we extend the original algorithm to apply it to certain systems of non-linear ePDEs.
Entropy-Based Search Algorithm for Experimental Design
NASA Astrophysics Data System (ADS)
Malakar, N. K.; Knuth, K. H.
2011-03-01
The scientific method relies on the iterated processes of inference and inquiry. The inference phase consists of selecting the most probable models based on the available data; whereas the inquiry phase consists of using what is known about the models to select the most relevant experiment. Optimizing inquiry involves searching the parameterized space of experiments to select the experiment that promises, on average, to be maximally informative. In the case where it is important to learn about each of the model parameters, the relevance of an experiment is quantified by Shannon entropy of the distribution of experimental outcomes predicted by a probable set of models. If the set of potential experiments is described by many parameters, we must search this high-dimensional entropy space. Brute force search methods will be slow and computationally expensive. We present an entropy-based search algorithm, called nested entropy sampling, to select the most informative experiment for efficient experimental design. This algorithm is inspired by Skilling's nested sampling algorithm used in inference and borrows the concept of a rising threshold while a set of experiment samples are maintained. We demonstrate that this algorithm not only selects highly relevant experiments, but also is more efficient than brute force search. Such entropic search techniques promise to greatly benefit autonomous experimental design.
Efficient Hardware Implementation of the Lightweight Block Encryption Algorithm LEA
Lee, Donggeon; Kim, Dong-Chan; Kwon, Daesung; Kim, Howon
2014-01-01
Recently, due to the advent of resource-constrained trends, such as smartphones and smart devices, the computing environment is changing. Because our daily life is deeply intertwined with ubiquitous networks, the importance of security is growing. A lightweight encryption algorithm is essential for secure communication between these kinds of resource-constrained devices, and many researchers have been investigating this field. Recently, a lightweight block cipher called LEA was proposed. LEA was originally targeted for efficient implementation on microprocessors, as it is fast when implemented in software and furthermore, it has a small memory footprint. To reflect on recent technology, all required calculations utilize 32-bit wide operations. In addition, the algorithm is comprised of not complex S-Box-like structures but simple Addition, Rotation, and XOR operations. To the best of our knowledge, this paper is the first report on a comprehensive hardware implementation of LEA. We present various hardware structures and their implementation results according to key sizes. Even though LEA was originally targeted at software efficiency, it also shows high efficiency when implemented as hardware. PMID:24406859
NASA Astrophysics Data System (ADS)
Wang, Yue; Yu, Jingjun; Pei, Xu
2018-06-01
A new forward kinematics algorithm for the mechanism of 3-RPS (R: Revolute; P: Prismatic; S: Spherical) parallel manipulators is proposed in this study. This algorithm is primarily based on the special geometric conditions of the 3-RPS parallel mechanism, and it eliminates the errors produced by parasitic motions to improve and ensure accuracy. Specifically, the errors can be less than 10-6. In this method, only the group of solutions that is consistent with the actual situation of the platform is obtained rapidly. This algorithm substantially improves calculation efficiency because the selected initial values are reasonable, and all the formulas in the calculation are analytical. This novel forward kinematics algorithm is well suited for real-time and high-precision control of the 3-RPS parallel mechanism.
NASA Astrophysics Data System (ADS)
Berahmand, Kamal; Bouyer, Asgarali
2018-03-01
Community detection is an essential approach for analyzing the structural and functional properties of complex networks. Although many community detection algorithms have been recently presented, most of them are weak and limited in different ways. Label Propagation Algorithm (LPA) is a well-known and efficient community detection technique which is characterized by the merits of nearly-linear running time and easy implementation. However, LPA has some significant problems such as instability, randomness, and monster community detection. In this paper, an algorithm, namely node’s label influence policy for label propagation algorithm (LP-LPA) was proposed for detecting efficient community structures. LP-LPA measures link strength value for edges and nodes’ label influence value for nodes in a new label propagation strategy with preference on link strength and for initial nodes selection, avoid of random behavior in tiebreak states, and efficient updating order and rule update. These procedures can sort out the randomness issue in an original LPA and stabilize the discovered communities in all runs of the same network. Experiments on synthetic networks and a wide range of real-world social networks indicated that the proposed method achieves significant accuracy and high stability. Indeed, it can obviously solve monster community problem with regard to detecting communities in networks.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gayeski, N.; Armstrong, Peter; Alvira, M.
2011-11-30
KGS Buildings LLC (KGS) and Pacific Northwest National Laboratory (PNNL) have developed a simplified control algorithm and prototype low-lift chiller controller suitable for model-predictive control in a demonstration project of low-lift cooling. Low-lift cooling is a highly efficient cooling strategy conceived to enable low or net-zero energy buildings. A low-lift cooling system consists of a high efficiency low-lift chiller, radiant cooling, thermal storage, and model-predictive control to pre-cool thermal storage overnight on an optimal cooling rate trajectory. We call the properly integrated and controlled combination of these elements a low-lift cooling system (LLCS). This document is the final report formore » that project.« less
Multiscale high-order/low-order (HOLO) algorithms and applications
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chacon, Luis; Chen, Guangye; Knoll, Dana Alan
Here, we review the state of the art in the formulation, implementation, and performance of so-called high-order/low-order (HOLO) algorithms for challenging multiscale problems. HOLO algorithms attempt to couple one or several high-complexity physical models (the high-order model, HO) with low-complexity ones (the low-order model, LO). The primary goal of HOLO algorithms is to achieve nonlinear convergence between HO and LO components while minimizing memory footprint and managing the computational complexity in a practical manner. Key to the HOLO approach is the use of the LO representations to address temporal stiffness, effectively accelerating the convergence of the HO/LO coupled system. Themore » HOLO approach is broadly underpinned by the concept of nonlinear elimination, which enables segregation of the HO and LO components in ways that can effectively use heterogeneous architectures. The accuracy and efficiency benefits of HOLO algorithms are demonstrated with specific applications to radiation transport, gas dynamics, plasmas (both Eulerian and Lagrangian formulations), and ocean modeling. Across this broad application spectrum, HOLO algorithms achieve significant accuracy improvements at a fraction of the cost compared to conventional approaches. It follows that HOLO algorithms hold significant potential for high-fidelity system scale multiscale simulations leveraging exascale computing.« less
Multiscale high-order/low-order (HOLO) algorithms and applications
Chacon, Luis; Chen, Guangye; Knoll, Dana Alan; ...
2016-11-11
Here, we review the state of the art in the formulation, implementation, and performance of so-called high-order/low-order (HOLO) algorithms for challenging multiscale problems. HOLO algorithms attempt to couple one or several high-complexity physical models (the high-order model, HO) with low-complexity ones (the low-order model, LO). The primary goal of HOLO algorithms is to achieve nonlinear convergence between HO and LO components while minimizing memory footprint and managing the computational complexity in a practical manner. Key to the HOLO approach is the use of the LO representations to address temporal stiffness, effectively accelerating the convergence of the HO/LO coupled system. Themore » HOLO approach is broadly underpinned by the concept of nonlinear elimination, which enables segregation of the HO and LO components in ways that can effectively use heterogeneous architectures. The accuracy and efficiency benefits of HOLO algorithms are demonstrated with specific applications to radiation transport, gas dynamics, plasmas (both Eulerian and Lagrangian formulations), and ocean modeling. Across this broad application spectrum, HOLO algorithms achieve significant accuracy improvements at a fraction of the cost compared to conventional approaches. It follows that HOLO algorithms hold significant potential for high-fidelity system scale multiscale simulations leveraging exascale computing.« less
Yi, Tianzhu; He, Zhihua; He, Feng; Dong, Zhen; Wu, Manqing
2017-01-01
This paper presents an efficient and precise imaging algorithm for the large bandwidth sliding spotlight synthetic aperture radar (SAR). The existing sub-aperture processing method based on the baseband azimuth scaling (BAS) algorithm cannot cope with the high order phase coupling along the range and azimuth dimensions. This coupling problem causes defocusing along the range and azimuth dimensions. This paper proposes a generalized chirp scaling (GCS)-BAS processing algorithm, which is based on the GCS algorithm. It successfully mitigates the deep focus along the range dimension of a sub-aperture of the large bandwidth sliding spotlight SAR, as well as high order phase coupling along the range and azimuth dimensions. Additionally, the azimuth focusing can be achieved by this azimuth scaling method. Simulation results demonstrate the ability of the GCS-BAS algorithm to process the large bandwidth sliding spotlight SAR data. It is proven that great improvements of the focus depth and imaging accuracy are obtained via the GCS-BAS algorithm. PMID:28555057
Peng, Hui; Zheng, Yi; Blumenstein, Michael; Tao, Dacheng; Li, Jinyan
2018-04-16
CRISPR/Cas9 system is a widely used genome editing tool. A prediction problem of great interests for this system is: how to select optimal single guide RNAs (sgRNAs) such that its cleavage efficiency is high meanwhile the off-target effect is low. This work proposed a two-step averaging method (TSAM) for the regression of cleavage efficiencies of a set of sgRNAs by averaging the predicted efficiency scores of a boosting algorithm and those by a support vector machine (SVM).We also proposed to use profiled Markov properties as novel features to capture the global characteristics of sgRNAs. These new features are combined with the outstanding features ranked by the boosting algorithm for the training of the SVM regressor. TSAM improved the mean Spearman correlation coefficiencies comparing with the state-of-the-art performance on benchmark datasets containing thousands of human, mouse and zebrafish sgRNAs. Our method can be also converted to make binary distinctions between efficient and inefficient sgRNAs with superior performance to the existing methods. The analysis reveals that highly efficient sgRNAs have lower melting temperature at the middle of the spacer, cut at 5'-end closer parts of the genome and contain more 'A' but less 'G' comparing with inefficient ones. Comprehensive further analysis also demonstrates that our tool can predict an sgRNA's cutting efficiency with consistently good performance no matter it is expressed from an U6 promoter in cells or from a T7 promoter in vitro. Online tool is available at http://www.aai-bioinfo.com/CRISPR/. Python and Matlab source codes are freely available at https://github.com/penn-hui/TSAM. Jinyan.Li@uts.edu.au. Supplementary data are available at Bioinformatics online.
Automated Development of Accurate Algorithms and Efficient Codes for Computational Aeroacoustics
NASA Technical Reports Server (NTRS)
Goodrich, John W.; Dyson, Rodger W.
1999-01-01
The simulation of sound generation and propagation in three space dimensions with realistic aircraft components is a very large time dependent computation with fine details. Simulations in open domains with embedded objects require accurate and robust algorithms for propagation, for artificial inflow and outflow boundaries, and for the definition of geometrically complex objects. The development, implementation, and validation of methods for solving these demanding problems is being done to support the NASA pillar goals for reducing aircraft noise levels. Our goal is to provide algorithms which are sufficiently accurate and efficient to produce usable results rapidly enough to allow design engineers to study the effects on sound levels of design changes in propulsion systems, and in the integration of propulsion systems with airframes. There is a lack of design tools for these purposes at this time. Our technical approach to this problem combines the development of new, algorithms with the use of Mathematica and Unix utilities to automate the algorithm development, code implementation, and validation. We use explicit methods to ensure effective implementation by domain decomposition for SPMD parallel computing. There are several orders of magnitude difference in the computational efficiencies of the algorithms which we have considered. We currently have new artificial inflow and outflow boundary conditions that are stable, accurate, and unobtrusive, with implementations that match the accuracy and efficiency of the propagation methods. The artificial numerical boundary treatments have been proven to have solutions which converge to the full open domain problems, so that the error from the boundary treatments can be driven as low as is required. The purpose of this paper is to briefly present a method for developing highly accurate algorithms for computational aeroacoustics, the use of computer automation in this process, and a brief survey of the algorithms that have resulted from this work. A review of computational aeroacoustics has recently been given by Lele.
Choi, Jaewon; Jung, Hyung-Sup; Yun, Sang-Ho
2015-03-09
As the aerospace industry grows, images obtained from Earth observation satellites have been successfully used in various fields. Specifically, the demand for a high-resolution (HR) optical images is gradually increasing, and hence the generation of a high-quality mosaic image is being magnified as an interesting issue. In this paper, we have proposed an efficient mosaic algorithm for HR optical images that are significantly different due to seasonal change. The algorithm includes main steps such as: (1) seamline extraction from gradient magnitude and seam images; (2) histogram matching; and (3) image feathering. Eleven Kompsat-2 images characterized by seasonal variations are used for the performance validation of the proposed method. The results of the performance test show that the proposed method effectively mosaics Kompsat-2 adjacent images including severe seasonal changes. Moreover, the results reveal that the proposed method is applicable to HR optic images such as GeoEye, IKONOS, QuickBird, RapidEye, SPOT, WorldView, etc.
Rodríguez, Manuel; Magdaleno, Eduardo; Pérez, Fernando; García, Cristhian
2017-03-28
Non-equispaced Fast Fourier transform (NFFT) is a very important algorithm in several technological and scientific areas such as synthetic aperture radar, computational photography, medical imaging, telecommunications, seismic analysis and so on. However, its computation complexity is high. In this paper, we describe an efficient NFFT implementation with a hardware coprocessor using an All-Programmable System-on-Chip (APSoC). This is a hybrid device that employs an Advanced RISC Machine (ARM) as Processing System with Programmable Logic for high-performance digital signal processing through parallelism and pipeline techniques. The algorithm has been coded in C language with pragma directives to optimize the architecture of the system. We have used the very novel Software Develop System-on-Chip (SDSoC) evelopment tool that simplifies the interface and partitioning between hardware and software. This provides shorter development cycles and iterative improvements by exploring several architectures of the global system. The computational results shows that hardware acceleration significantly outperformed the software based implementation.
Rodríguez, Manuel; Magdaleno, Eduardo; Pérez, Fernando; García, Cristhian
2017-01-01
Non-equispaced Fast Fourier transform (NFFT) is a very important algorithm in several technological and scientific areas such as synthetic aperture radar, computational photography, medical imaging, telecommunications, seismic analysis and so on. However, its computation complexity is high. In this paper, we describe an efficient NFFT implementation with a hardware coprocessor using an All-Programmable System-on-Chip (APSoC). This is a hybrid device that employs an Advanced RISC Machine (ARM) as Processing System with Programmable Logic for high-performance digital signal processing through parallelism and pipeline techniques. The algorithm has been coded in C language with pragma directives to optimize the architecture of the system. We have used the very novel Software Develop System-on-Chip (SDSoC) evelopment tool that simplifies the interface and partitioning between hardware and software. This provides shorter development cycles and iterative improvements by exploring several architectures of the global system. The computational results shows that hardware acceleration significantly outperformed the software based implementation. PMID:28350358
Huang, Shuai; Li, Jing; Ye, Jieping; Fleisher, Adam; Chen, Kewei; Wu, Teresa; Reiman, Eric
2013-06-01
Structure learning of Bayesian Networks (BNs) is an important topic in machine learning. Driven by modern applications in genetics and brain sciences, accurate and efficient learning of large-scale BN structures from high-dimensional data becomes a challenging problem. To tackle this challenge, we propose a Sparse Bayesian Network (SBN) structure learning algorithm that employs a novel formulation involving one L1-norm penalty term to impose sparsity and another penalty term to ensure that the learned BN is a Directed Acyclic Graph--a required property of BNs. Through both theoretical analysis and extensive experiments on 11 moderate and large benchmark networks with various sample sizes, we show that SBN leads to improved learning accuracy, scalability, and efficiency as compared with 10 existing popular BN learning algorithms. We apply SBN to a real-world application of brain connectivity modeling for Alzheimer's disease (AD) and reveal findings that could lead to advancements in AD research.
Huang, Shuai; Li, Jing; Ye, Jieping; Fleisher, Adam; Chen, Kewei; Wu, Teresa; Reiman, Eric
2014-01-01
Structure learning of Bayesian Networks (BNs) is an important topic in machine learning. Driven by modern applications in genetics and brain sciences, accurate and efficient learning of large-scale BN structures from high-dimensional data becomes a challenging problem. To tackle this challenge, we propose a Sparse Bayesian Network (SBN) structure learning algorithm that employs a novel formulation involving one L1-norm penalty term to impose sparsity and another penalty term to ensure that the learned BN is a Directed Acyclic Graph (DAG)—a required property of BNs. Through both theoretical analysis and extensive experiments on 11 moderate and large benchmark networks with various sample sizes, we show that SBN leads to improved learning accuracy, scalability, and efficiency as compared with 10 existing popular BN learning algorithms. We apply SBN to a real-world application of brain connectivity modeling for Alzheimer’s disease (AD) and reveal findings that could lead to advancements in AD research. PMID:22665720
Machine Learning-based Intelligent Formal Reasoning and Proving System
NASA Astrophysics Data System (ADS)
Chen, Shengqing; Huang, Xiaojian; Fang, Jiaze; Liang, Jia
2018-03-01
The reasoning system can be used in many fields. How to improve reasoning efficiency is the core of the design of system. Through the formal description of formal proof and the regular matching algorithm, after introducing the machine learning algorithm, the system of intelligent formal reasoning and verification has high efficiency. The experimental results show that the system can verify the correctness of propositional logic reasoning and reuse the propositional logical reasoning results, so as to obtain the implicit knowledge in the knowledge base and provide the basic reasoning model for the construction of intelligent system.
NASA Astrophysics Data System (ADS)
Chen, Lei; Li, Dehua; Yang, Jie
2007-12-01
Constructing virtual international strategy environment needs many kinds of information, such as economy, politic, military, diploma, culture, science, etc. So it is very important to build an information auto-extract, classification, recombination and analysis management system with high efficiency as the foundation and component of military strategy hall. This paper firstly use improved Boost algorithm to classify obtained initial information, then use a strategy intelligence extract algorithm to extract strategy intelligence from initial information to help strategist to analysis information.
Density-matrix-based algorithm for solving eigenvalue problems
NASA Astrophysics Data System (ADS)
Polizzi, Eric
2009-03-01
A fast and stable numerical algorithm for solving the symmetric eigenvalue problem is presented. The technique deviates fundamentally from the traditional Krylov subspace iteration based techniques (Arnoldi and Lanczos algorithms) or other Davidson-Jacobi techniques and takes its inspiration from the contour integration and density-matrix representation in quantum mechanics. It will be shown that this algorithm—named FEAST—exhibits high efficiency, robustness, accuracy, and scalability on parallel architectures. Examples from electronic structure calculations of carbon nanotubes are presented, and numerical performances and capabilities are discussed.
1987-03-31
processors . The symmetry-breaking algorithms give efficient ways to convert probabilistic algorithms to deterministic algorithms. Some of the...techniques have been applied to construct several efficient linear- processor algorithms for graph problems, including an O(lg* n)-time algorithm for (A + 1...On n-node graphs, the algorithm works in O(log 2 n) time using only n processors , in contrast to the previous best algorithm which used about n3
NASA Astrophysics Data System (ADS)
da Silva, Thaísa Leal; Agostini, Luciano Volcan; da Silva Cruz, Luis A.
2014-05-01
Intra prediction is a very important tool in current video coding standards. High-efficiency video coding (HEVC) intra prediction presents relevant gains in encoding efficiency when compared to previous standards, but with a very important increase in the computational complexity since 33 directional angular modes must be evaluated. Motivated by this high complexity, this article presents a complexity reduction algorithm developed to reduce the HEVC intra mode decision complexity targeting multiview videos. The proposed algorithm presents an efficient fast intra prediction compliant with singleview and multiview video encoding. This fast solution defines a reduced subset of intra directions according to the video texture and it exploits the relationship between prediction units (PUs) of neighbor depth levels of the coding tree. This fast intra coding procedure is used to develop an inter-view prediction method, which exploits the relationship between the intra mode directions of adjacent views to further accelerate the intra prediction process in multiview video encoding applications. When compared to HEVC simulcast, our method achieves a complexity reduction of up to 47.77%, at the cost of an average BD-PSNR loss of 0.08 dB.
High performance transcription factor-DNA docking with GPU computing
2012-01-01
Background Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications in a number of applications, such as structure-based prediction of transcription factor binding sites and rational drug design. Protein-DNA docking is very computational demanding due to the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that the docking quality depends on the coverage of the conformational sampling space. It is therefore desirable to accelerate the computation of the docking algorithm, not only to reduce computing time, but also to improve docking quality. Methods In an attempt to accelerate the sampling process and to improve the docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte-Carlo simulation and a simulated annealing method to search through the conformational space. Algorithmic techniques were developed to improve the computation efficiency and scalability on GPU-based high performance computing systems. Results The effectiveness of our approach is tested on a non-redundant set of 75 TF-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process and thereby improving the chance of finding near-native TF-DNA complex structures. This study also suggests that further improvement in protein-DNA docking research would require efforts from two integral aspects: improvement in computation efficiency and energy function design. Conclusions We present a high performance computing approach for improving the prediction accuracy of protein-DNA docking. The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first ad hoc effort of applying GPU or GPU clusters to the protein-DNA docking problem. PMID:22759575
CCOMP: An efficient algorithm for complex roots computation of determinantal equations
NASA Astrophysics Data System (ADS)
Zouros, Grigorios P.
2018-01-01
In this paper a free Python algorithm, entitled CCOMP (Complex roots COMPutation), is developed for the efficient computation of complex roots of determinantal equations inside a prescribed complex domain. The key to the method presented is the efficient determination of the candidate points inside the domain which, in their close neighborhood, a complex root may lie. Once these points are detected, the algorithm proceeds to a two-dimensional minimization problem with respect to the minimum modulus eigenvalue of the system matrix. In the core of CCOMP exist three sub-algorithms whose tasks are the efficient estimation of the minimum modulus eigenvalues of the system matrix inside the prescribed domain, the efficient computation of candidate points which guarantee the existence of minima, and finally, the computation of minima via bound constrained minimization algorithms. Theoretical results and heuristics support the development and the performance of the algorithm, which is discussed in detail. CCOMP supports general complex matrices, and its efficiency, applicability and validity is demonstrated to a variety of microwave applications.
Large Scale Frequent Pattern Mining using MPI One-Sided Model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vishnu, Abhinav; Agarwal, Khushbu
In this paper, we propose a work-stealing runtime --- Library for Work Stealing LibWS --- using MPI one-sided model for designing scalable FP-Growth --- {\\em de facto} frequent pattern mining algorithm --- on large scale systems. LibWS provides locality efficient and highly scalable work-stealing techniques for load balancing on a variety of data distributions. We also propose a novel communication algorithm for FP-growth data exchange phase, which reduces the communication complexity from state-of-the-art O(p) to O(f + p/f) for p processes and f frequent attributed-ids. FP-Growth is implemented using LibWS and evaluated on several work distributions and support counts. Anmore » experimental evaluation of the FP-Growth on LibWS using 4096 processes on an InfiniBand Cluster demonstrates excellent efficiency for several work distributions (87\\% efficiency for Power-law and 91% for Poisson). The proposed distributed FP-Tree merging algorithm provides 38x communication speedup on 4096 cores.« less
Efficient parallel resolution of the simplified transport equations in mixed-dual formulation
NASA Astrophysics Data System (ADS)
Barrault, M.; Lathuilière, B.; Ramet, P.; Roman, J.
2011-03-01
A reactivity computation consists of computing the highest eigenvalue of a generalized eigenvalue problem, for which an inverse power algorithm is commonly used. Very fine modelizations are difficult to treat for our sequential solver, based on the simplified transport equations, in terms of memory consumption and computational time. A first implementation of a Lagrangian based domain decomposition method brings to a poor parallel efficiency because of an increase in the power iterations [1]. In order to obtain a high parallel efficiency, we improve the parallelization scheme by changing the location of the loop over the subdomains in the overall algorithm and by benefiting from the characteristics of the Raviart-Thomas finite element. The new parallel algorithm still allows us to locally adapt the numerical scheme (mesh, finite element order). However, it can be significantly optimized for the matching grid case. The good behavior of the new parallelization scheme is demonstrated for the matching grid case on several hundreds of nodes for computations based on a pin-by-pin discretization.
NASA Technical Reports Server (NTRS)
Jothiprasad, Giridhar; Mavriplis, Dimitri J.; Caughey, David A.; Bushnell, Dennis M. (Technical Monitor)
2002-01-01
The efficiency gains obtained using higher-order implicit Runge-Kutta schemes as compared with the second-order accurate backward difference schemes for the unsteady Navier-Stokes equations are investigated. Three different algorithms for solving the nonlinear system of equations arising at each timestep are presented. The first algorithm (NMG) is a pseudo-time-stepping scheme which employs a non-linear full approximation storage (FAS) agglomeration multigrid method to accelerate convergence. The other two algorithms are based on Inexact Newton's methods. The linear system arising at each Newton step is solved using iterative/Krylov techniques and left preconditioning is used to accelerate convergence of the linear solvers. One of the methods (LMG) uses Richardson's iterative scheme for solving the linear system at each Newton step while the other (PGMRES) uses the Generalized Minimal Residual method. Results demonstrating the relative superiority of these Newton's methods based schemes are presented. Efficiency gains as high as 10 are obtained by combining the higher-order time integration schemes with the more efficient nonlinear solvers.
On the development of efficient algorithms for three dimensional fluid flow
NASA Technical Reports Server (NTRS)
Maccormack, R. W.
1988-01-01
The difficulties of constructing efficient algorithms for three-dimensional flow are discussed. Reasonable candidates are analyzed and tested, and most are found to have obvious shortcomings. Yet, there is promise that an efficient class of algorithms exist between the severely time-step sized-limited explicit or approximately factored algorithms and the computationally intensive direct inversion of large sparse matrices by Gaussian elimination.
NASA Technical Reports Server (NTRS)
Mareboyana, Manohar; Le Moigne-Stewart, Jacqueline; Bennett, Jerome
2016-01-01
In this paper, we demonstrate a simple algorithm that projects low resolution (LR) images differing in subpixel shifts on a high resolution (HR) also called super resolution (SR) grid. The algorithm is very effective in accuracy as well as time efficiency. A number of spatial interpolation techniques using nearest neighbor, inverse-distance weighted averages, Radial Basis Functions (RBF) etc. used in projection yield comparable results. For best accuracy of reconstructing SR image by a factor of two requires four LR images differing in four independent subpixel shifts. The algorithm has two steps: i) registration of low resolution images and (ii) shifting the low resolution images to align with reference image and projecting them on high resolution grid based on the shifts of each low resolution image using different interpolation techniques. Experiments are conducted by simulating low resolution images by subpixel shifts and subsampling of original high resolution image and the reconstructing the high resolution images from the simulated low resolution images. The results of accuracy of reconstruction are compared by using mean squared error measure between original high resolution image and reconstructed image. The algorithm was tested on remote sensing images and found to outperform previously proposed techniques such as Iterative Back Projection algorithm (IBP), Maximum Likelihood (ML), and Maximum a posterior (MAP) algorithms. The algorithm is robust and is not overly sensitive to the registration inaccuracies.
A 3D inversion for all-space magnetotelluric data with static shift correction
NASA Astrophysics Data System (ADS)
Zhang, Kun
2017-04-01
Base on the previous studies on the static shift correction and 3D inversion algorithms, we improve the NLCG 3D inversion method and propose a new static shift correction method which work in the inversion. The static shift correction method is based on the 3D theory and real data. The static shift can be detected by the quantitative analysis of apparent parameters (apparent resistivity and impedance phase) of MT in high frequency range, and completed correction with inversion. The method is an automatic processing technology of computer with 0 cost, and avoids the additional field work and indoor processing with good results. The 3D inversion algorithm is improved (Zhang et al., 2013) base on the NLCG method of Newman & Alumbaugh (2000) and Rodi & Mackie (2001). For the algorithm, we added the parallel structure, improved the computational efficiency, reduced the memory of computer and added the topographic and marine factors. So the 3D inversion could work in general PC with high efficiency and accuracy. And all the MT data of surface stations, seabed stations and underground stations can be used in the inversion algorithm.
Parameter-tolerant design of high contrast gratings
NASA Astrophysics Data System (ADS)
Chevallier, Christyves; Fressengeas, Nicolas; Jacquet, Joel; Almuneau, Guilhem; Laaroussi, Youness; Gauthier-Lafaye, Olivier; Cerutti, Laurent; Genty, Frédéric
2015-02-01
This work is devoted to the design of high contrast grating mirrors taking into account the technological constraints and tolerance of fabrication. First, a global optimization algorithm has been combined to a numerical analysis of grating structures (RCWA) to automatically design HCG mirrors. Then, the tolerances of the grating dimensions have been precisely studied to develop a robust optimization algorithm with which high contrast gratings, exhibiting not only a high efficiency but also large tolerance values, could be designed. Finally, several structures integrating previously designed HCGs has been simulated to validate and illustrate the interest of such gratings.
An algorithmic framework for multiobjective optimization.
Ganesan, T; Elamvazuthi, I; Shaari, Ku Zilati Ku; Vasant, P
2013-01-01
Multiobjective (MO) optimization is an emerging field which is increasingly being encountered in many fields globally. Various metaheuristic techniques such as differential evolution (DE), genetic algorithm (GA), gravitational search algorithm (GSA), and particle swarm optimization (PSO) have been used in conjunction with scalarization techniques such as weighted sum approach and the normal-boundary intersection (NBI) method to solve MO problems. Nevertheless, many challenges still arise especially when dealing with problems with multiple objectives (especially in cases more than two). In addition, problems with extensive computational overhead emerge when dealing with hybrid algorithms. This paper discusses these issues by proposing an alternative framework that utilizes algorithmic concepts related to the problem structure for generating efficient and effective algorithms. This paper proposes a framework to generate new high-performance algorithms with minimal computational overhead for MO optimization.
An Algorithmic Framework for Multiobjective Optimization
Ganesan, T.; Elamvazuthi, I.; Shaari, Ku Zilati Ku; Vasant, P.
2013-01-01
Multiobjective (MO) optimization is an emerging field which is increasingly being encountered in many fields globally. Various metaheuristic techniques such as differential evolution (DE), genetic algorithm (GA), gravitational search algorithm (GSA), and particle swarm optimization (PSO) have been used in conjunction with scalarization techniques such as weighted sum approach and the normal-boundary intersection (NBI) method to solve MO problems. Nevertheless, many challenges still arise especially when dealing with problems with multiple objectives (especially in cases more than two). In addition, problems with extensive computational overhead emerge when dealing with hybrid algorithms. This paper discusses these issues by proposing an alternative framework that utilizes algorithmic concepts related to the problem structure for generating efficient and effective algorithms. This paper proposes a framework to generate new high-performance algorithms with minimal computational overhead for MO optimization. PMID:24470795
A Strassen-Newton algorithm for high-speed parallelizable matrix inversion
NASA Technical Reports Server (NTRS)
Bailey, David H.; Ferguson, Helaman R. P.
1988-01-01
Techniques are described for computing matrix inverses by algorithms that are highly suited to massively parallel computation. The techniques are based on an algorithm suggested by Strassen (1969). Variations of this scheme use matrix Newton iterations and other methods to improve the numerical stability while at the same time preserving a very high level of parallelism. One-processor Cray-2 implementations of these schemes range from one that is up to 55 percent faster than a conventional library routine to one that is slower than a library routine but achieves excellent numerical stability. The problem of computing the solution to a single set of linear equations is discussed, and it is shown that this problem can also be solved efficiently using these techniques.
Deist, T M; Gorissen, B L
2016-02-07
High-dose-rate brachytherapy is a tumor treatment method where a highly radioactive source is brought in close proximity to the tumor. In this paper we develop a simulated annealing algorithm to optimize the dwell times at preselected dwell positions to maximize tumor coverage under dose-volume constraints on the organs at risk. Compared to existing algorithms, our algorithm has advantages in terms of speed and objective value and does not require an expensive general purpose solver. Its success mainly depends on exploiting the efficiency of matrix multiplication and a careful selection of the neighboring states. In this paper we outline its details and make an in-depth comparison with existing methods using real patient data.
An efficient quantum algorithm for spectral estimation
NASA Astrophysics Data System (ADS)
Steffens, Adrian; Rebentrost, Patrick; Marvian, Iman; Eisert, Jens; Lloyd, Seth
2017-03-01
We develop an efficient quantum implementation of an important signal processing algorithm for line spectral estimation: the matrix pencil method, which determines the frequencies and damping factors of signals consisting of finite sums of exponentially damped sinusoids. Our algorithm provides a quantum speedup in a natural regime where the sampling rate is much higher than the number of sinusoid components. Along the way, we develop techniques that are expected to be useful for other quantum algorithms as well—consecutive phase estimations to efficiently make products of asymmetric low rank matrices classically accessible and an alternative method to efficiently exponentiate non-Hermitian matrices. Our algorithm features an efficient quantum-classical division of labor: the time-critical steps are implemented in quantum superposition, while an interjacent step, requiring much fewer parameters, can operate classically. We show that frequencies and damping factors can be obtained in time logarithmic in the number of sampling points, exponentially faster than known classical algorithms.
Pattern-based integer sample motion search strategies in the context of HEVC
NASA Astrophysics Data System (ADS)
Maier, Georg; Bross, Benjamin; Grois, Dan; Marpe, Detlev; Schwarz, Heiko; Veltkamp, Remco C.; Wiegand, Thomas
2015-09-01
The H.265/MPEG-H High Efficiency Video Coding (HEVC) standard provides a significant increase in coding efficiency compared to its predecessor, the H.264/MPEG-4 Advanced Video Coding (AVC) standard, which however comes at the cost of a high computational burden for a compliant encoder. Motion estimation (ME), which is a part of the inter-picture prediction process, typically consumes a high amount of computational resources, while significantly increasing the coding efficiency. In spite of the fact that both H.265/MPEG-H HEVC and H.264/MPEG-4 AVC standards allow processing motion information on a fractional sample level, the motion search algorithms based on the integer sample level remain to be an integral part of ME. In this paper, a flexible integer sample ME framework is proposed, thereby allowing to trade off significant reduction of ME computation time versus coding efficiency penalty in terms of bit rate overhead. As a result, through extensive experimentation, an integer sample ME algorithm that provides a good trade-off is derived, incorporating a combination and optimization of known predictive, pattern-based and early termination techniques. The proposed ME framework is implemented on a basis of the HEVC Test Model (HM) reference software, further being compared to the state-of-the-art fast search algorithm, which is a native part of HM. It is observed that for high resolution sequences, the integer sample ME process can be speed-up by factors varying from 3.2 to 7.6, resulting in the bit-rate overhead of 1.5% and 0.6% for Random Access (RA) and Low Delay P (LDP) configurations, respectively. In addition, the similar speed-up is observed for sequences with mainly Computer-Generated Imagery (CGI) content while trading off the bit rate overhead of up to 5.2%.
Improved transition path sampling methods for simulation of rare events
NASA Astrophysics Data System (ADS)
Chopra, Manan; Malshe, Rohit; Reddy, Allam S.; de Pablo, J. J.
2008-04-01
The free energy surfaces of a wide variety of systems encountered in physics, chemistry, and biology are characterized by the existence of deep minima separated by numerous barriers. One of the central aims of recent research in computational chemistry and physics has been to determine how transitions occur between deep local minima on rugged free energy landscapes, and transition path sampling (TPS) Monte-Carlo methods have emerged as an effective means for numerical investigation of such transitions. Many of the shortcomings of TPS-like approaches generally stem from their high computational demands. Two new algorithms are presented in this work that improve the efficiency of TPS simulations. The first algorithm uses biased shooting moves to render the sampling of reactive trajectories more efficient. The second algorithm is shown to substantially improve the accuracy of the transition state ensemble by introducing a subset of local transition path simulations in the transition state. The system considered in this work consists of a two-dimensional rough energy surface that is representative of numerous systems encountered in applications. When taken together, these algorithms provide gains in efficiency of over two orders of magnitude when compared to traditional TPS simulations.
Statistical efficiency of adaptive algorithms.
Widrow, Bernard; Kamenetsky, Max
2003-01-01
The statistical efficiency of a learning algorithm applied to the adaptation of a given set of variable weights is defined as the ratio of the quality of the converged solution to the amount of data used in training the weights. Statistical efficiency is computed by averaging over an ensemble of learning experiences. A high quality solution is very close to optimal, while a low quality solution corresponds to noisy weights and less than optimal performance. In this work, two gradient descent adaptive algorithms are compared, the LMS algorithm and the LMS/Newton algorithm. LMS is simple and practical, and is used in many applications worldwide. LMS/Newton is based on Newton's method and the LMS algorithm. LMS/Newton is optimal in the least squares sense. It maximizes the quality of its adaptive solution while minimizing the use of training data. Many least squares adaptive algorithms have been devised over the years, but no other least squares algorithm can give better performance, on average, than LMS/Newton. LMS is easily implemented, but LMS/Newton, although of great mathematical interest, cannot be implemented in most practical applications. Because of its optimality, LMS/Newton serves as a benchmark for all least squares adaptive algorithms. The performances of LMS and LMS/Newton are compared, and it is found that under many circumstances, both algorithms provide equal performance. For example, when both algorithms are tested with statistically nonstationary input signals, their average performances are equal. When adapting with stationary input signals and with random initial conditions, their respective learning times are on average equal. However, under worst-case initial conditions, the learning time of LMS can be much greater than that of LMS/Newton, and this is the principal disadvantage of the LMS algorithm. But the strong points of LMS are ease of implementation and optimal performance under important practical conditions. For these reasons, the LMS algorithm has enjoyed very widespread application. It is used in almost every modem for channel equalization and echo cancelling. Furthermore, it is related to the famous backpropagation algorithm used for training neural networks.
Memetic algorithms for de novo motif-finding in biomedical sequences.
Bi, Chengpeng
2012-09-01
The objectives of this study are to design and implement a new memetic algorithm for de novo motif discovery, which is then applied to detect important signals hidden in various biomedical molecular sequences. In this paper, memetic algorithms are developed and tested in de novo motif-finding problems. Several strategies in the algorithm design are employed that are to not only efficiently explore the multiple sequence local alignment space, but also effectively uncover the molecular signals. As a result, there are a number of key features in the implementation of the memetic motif-finding algorithm (MaMotif), including a chromosome replacement operator, a chromosome alteration-aware local search operator, a truncated local search strategy, and a stochastic operation of local search imposed on individual learning. To test the new algorithm, we compare MaMotif with a few of other similar algorithms using simulated and experimental data including genomic DNA, primary microRNA sequences (let-7 family), and transmembrane protein sequences. The new memetic motif-finding algorithm is successfully implemented in C++, and exhaustively tested with various simulated and real biological sequences. In the simulation, it shows that MaMotif is the most time-efficient algorithm compared with others, that is, it runs 2 times faster than the expectation maximization (EM) method and 16 times faster than the genetic algorithm-based EM hybrid. In both simulated and experimental testing, results show that the new algorithm is compared favorably or superior to other algorithms. Notably, MaMotif is able to successfully discover the transcription factors' binding sites in the chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) data, correctly uncover the RNA splicing signals in gene expression, and precisely find the highly conserved helix motif in the transmembrane protein sequences, as well as rightly detect the palindromic segments in the primary microRNA sequences. The memetic motif-finding algorithm is effectively designed and implemented, and its applications demonstrate it is not only time-efficient, but also exhibits excellent performance while compared with other popular algorithms. Copyright © 2012 Elsevier B.V. All rights reserved.
Scalable Parallel Density-based Clustering and Applications
NASA Astrophysics Data System (ADS)
Patwary, Mostofa Ali
2014-04-01
Recently, density-based clustering algorithms (DBSCAN and OPTICS) have gotten significant attention of the scientific community due to their unique capability of discovering arbitrary shaped clusters and eliminating noise data. These algorithms have several applications, which require high performance computing, including finding halos and subhalos (clusters) from massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelization of these algorithms are extremely challenging as they exhibit inherent sequential data access order, unbalanced workload resulting in low parallel efficiency. To break the data access sequentiality and to achieve high parallelism, we develop new parallel algorithms, both for DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups up to 27.5 on 40 cores on shared memory architecture and speedups up to 5,765 using 8,192 cores on distributed memory architecture. In our experiments, we found that while achieving the scalability, our algorithms produce clustering results with comparable quality to the classical algorithms.
Sze, Sing-Hoi; Parrott, Jonathan J; Tarone, Aaron M
2017-12-06
While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing amount of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a large amount of RNA-Seq data are available, they are generally very memory-intensive and can only be used to construct small assemblies. We develop a divide-and-conquer strategy that allows these algorithms to be utilized, by subdividing a large RNA-Seq data set into small libraries. Each individual library is assembled independently by an existing algorithm, and a merging algorithm is developed to combine these assemblies by picking a subset of high quality transcripts to form a large transcriptome. When compared to existing algorithms that return a single assembly directly, this strategy achieves comparable or increased accuracy as memory-efficient algorithms that can be used to process a large amount of RNA-Seq data, and comparable or decreased accuracy as memory-intensive algorithms that can only be used to construct small assemblies. Our divide-and-conquer strategy allows memory-intensive de novo transcriptome assembly algorithms to be utilized to construct large assemblies.
Efficient error correction for next-generation sequencing of viral amplicons
2012-01-01
Background Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. Results In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Conclusions Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm PMID:22759430
Efficient error correction for next-generation sequencing of viral amplicons.
Skums, Pavel; Dimitrova, Zoya; Campo, David S; Vaughan, Gilberto; Rossi, Livia; Forbi, Joseph C; Yokosawa, Jonny; Zelikovsky, Alex; Khudyakov, Yury
2012-06-25
Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses.The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm.
NASA Astrophysics Data System (ADS)
Pei, Zongrui; Eisenbach, Markus
2017-06-01
Dislocations are among the most important defects in determining the mechanical properties of both conventional alloys and high-entropy alloys. The Peierls-Nabarro model supplies an efficient pathway to their geometries and mobility. The difficulty in solving the integro-differential Peierls-Nabarro equation is how to effectively avoid the local minima in the energy landscape of a dislocation core. Among the other methods to optimize the dislocation core structures, we choose the algorithm of Particle Swarm Optimization, an algorithm that simulates the social behaviors of organisms. By employing more particles (bigger swarm) and more iterative steps (allowing them to explore for longer time), the local minima can be effectively avoided. But this would require more computational cost. The advantage of this algorithm is that it is readily parallelized in modern high computing architecture. We demonstrate the performance of our parallelized algorithm scales linearly with the number of employed cores.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chacón, L., E-mail: chacon@lanl.gov; Chen, G.; Knoll, D.A.
We review the state of the art in the formulation, implementation, and performance of so-called high-order/low-order (HOLO) algorithms for challenging multiscale problems. HOLO algorithms attempt to couple one or several high-complexity physical models (the high-order model, HO) with low-complexity ones (the low-order model, LO). The primary goal of HOLO algorithms is to achieve nonlinear convergence between HO and LO components while minimizing memory footprint and managing the computational complexity in a practical manner. Key to the HOLO approach is the use of the LO representations to address temporal stiffness, effectively accelerating the convergence of the HO/LO coupled system. The HOLOmore » approach is broadly underpinned by the concept of nonlinear elimination, which enables segregation of the HO and LO components in ways that can effectively use heterogeneous architectures. The accuracy and efficiency benefits of HOLO algorithms are demonstrated with specific applications to radiation transport, gas dynamics, plasmas (both Eulerian and Lagrangian formulations), and ocean modeling. Across this broad application spectrum, HOLO algorithms achieve significant accuracy improvements at a fraction of the cost compared to conventional approaches. It follows that HOLO algorithms hold significant potential for high-fidelity system scale multiscale simulations leveraging exascale computing.« less
Information filtering via biased heat conduction.
Liu, Jian-Guo; Zhou, Tao; Guo, Qiang
2011-09-01
The process of heat conduction has recently found application in personalized recommendation [Zhou et al., Proc. Natl. Acad. Sci. USA 107, 4511 (2010)], which is of high diversity but low accuracy. By decreasing the temperatures of small-degree objects, we present an improved algorithm, called biased heat conduction, which could simultaneously enhance the accuracy and diversity. Extensive experimental analyses demonstrate that the accuracy on MovieLens, Netflix, and Delicious datasets could be improved by 43.5%, 55.4% and 19.2%, respectively, compared with the standard heat conduction algorithm and also the diversity is increased or approximately unchanged. Further statistical analyses suggest that the present algorithm could simultaneously identify users' mainstream and special tastes, resulting in better performance than the standard heat conduction algorithm. This work provides a creditable way for highly efficient information filtering.
Image analysis of multiple moving wood pieces in real time
NASA Astrophysics Data System (ADS)
Wang, Weixing
2006-02-01
This paper presents algorithms for image processing and image analysis of wood piece materials. The algorithms were designed for auto-detection of wood piece materials on a moving conveyor belt or a truck. When wood objects on moving, the hard task is to trace the contours of the objects in n optimal way. To make the algorithms work efficiently in the plant, a flexible online system was designed and developed, which mainly consists of image acquisition, image processing, object delineation and analysis. A number of newly-developed algorithms can delineate wood objects with high accuracy and high speed, and in the wood piece analysis part, each wood piece can be characterized by a number of visual parameters which can also be used for constructing experimental models directly in the system.
Data-driven gradient algorithm for high-precision quantum control
NASA Astrophysics Data System (ADS)
Wu, Re-Bing; Chu, Bing; Owens, David H.; Rabitz, Herschel
2018-04-01
In the quest to achieve scalable quantum information processing technologies, gradient-based optimal control algorithms (e.g., grape) are broadly used for implementing high-precision quantum gates, but their performance is often hindered by deterministic or random errors in the system model and the control electronics. In this paper, we show that grape can be taught to be more effective by jointly learning from the design model and the experimental data obtained from process tomography. The resulting data-driven gradient optimization algorithm (d-grape) can in principle correct all deterministic gate errors, with a mild efficiency loss. The d-grape algorithm may become more powerful with broadband controls that involve a large number of control parameters, while other algorithms usually slow down due to the increased size of the search space. These advantages are demonstrated by simulating the implementation of a two-qubit controlled-not gate.
Wen, Ying; Hou, Lili; He, Lianghua; Peterson, Bradley S; Xu, Dongrong
2015-05-01
Spatial normalization plays a key role in voxel-based analyses of brain images. We propose a highly accurate algorithm for high-dimensional spatial normalization of brain images based on the technique of symmetric optical flow. We first construct a three dimension optical model with the consistency assumption of intensity and consistency of the gradient of intensity under a constraint of discontinuity-preserving spatio-temporal smoothness. Then, an efficient inverse consistency optical flow is proposed with aims of higher registration accuracy, where the flow is naturally symmetric. By employing a hierarchical strategy ranging from coarse to fine scales of resolution and a method of Euler-Lagrange numerical analysis, our algorithm is capable of registering brain images data. Experiments using both simulated and real datasets demonstrated that the accuracy of our algorithm is not only better than that of those traditional optical flow algorithms, but also comparable to other registration methods used extensively in the medical imaging community. Moreover, our registration algorithm is fully automated, requiring a very limited number of parameters and no manual intervention. Copyright © 2015 Elsevier Inc. All rights reserved.
An Incremental High-Utility Mining Algorithm with Transaction Insertion
Gan, Wensheng; Zhang, Binbin
2015-01-01
Association-rule mining is commonly used to discover useful and meaningful patterns from a very large database. It only considers the occurrence frequencies of items to reveal the relationships among itemsets. Traditional association-rule mining is, however, not suitable in real-world applications since the purchased items from a customer may have various factors, such as profit or quantity. High-utility mining was designed to solve the limitations of association-rule mining by considering both the quantity and profit measures. Most algorithms of high-utility mining are designed to handle the static database. Fewer researches handle the dynamic high-utility mining with transaction insertion, thus requiring the computations of database rescan and combination explosion of pattern-growth mechanism. In this paper, an efficient incremental algorithm with transaction insertion is designed to reduce computations without candidate generation based on the utility-list structures. The enumeration tree and the relationships between 2-itemsets are also adopted in the proposed algorithm to speed up the computations. Several experiments are conducted to show the performance of the proposed algorithm in terms of runtime, memory consumption, and number of generated patterns. PMID:25811038
Two-dimensional priority-based dynamic resource allocation algorithm for QoS in WDM/TDM PON networks
NASA Astrophysics Data System (ADS)
Sun, Yixin; Liu, Bo; Zhang, Lijia; Xin, Xiangjun; Zhang, Qi; Rao, Lan
2018-01-01
Wavelength division multiplexing/time division multiplexing (WDM/TDM) passive optical networks (PON) is being viewed as a promising solution for delivering multiple services and applications. The hybrid WDM / TDM PON uses the wavelength and bandwidth allocation strategy to control the distribution of the wavelength channels in the uplink direction, so that it can ensure the high bandwidth requirements of multiple Optical Network Units (ONUs) while improving the wavelength resource utilization. Through the investigation of the presented dynamic bandwidth allocation algorithms, these algorithms can't satisfy the requirements of different levels of service very well while adapting to the structural characteristics of mixed WDM / TDM PON system. This paper introduces a novel wavelength and bandwidth allocation algorithm to efficiently utilize the bandwidth and support QoS (Quality of Service) guarantees in WDM/TDM PON. Two priority based polling subcycles are introduced in order to increase system efficiency and improve system performance. The fixed priority polling subcycle and dynamic priority polling subcycle follow different principles to implement wavelength and bandwidth allocation according to the priority of different levels of service. A simulation was conducted to study the performance of the priority based polling in dynamic resource allocation algorithm in WDM/TDM PON. The results show that the performance of delay-sensitive services is greatly improved without degrading QoS guarantees for other services. Compared with the traditional dynamic bandwidth allocation algorithms, this algorithm can meet bandwidth needs of different priority traffic class, achieve low loss rate performance, and ensure real-time of high priority traffic class in terms of overall traffic on the network.
Lin, Jingjing; Jing, Honglei
2016-01-01
Artificial immune system is one of the most recently introduced intelligence methods which was inspired by biological immune system. Most immune system inspired algorithms are based on the clonal selection principle, known as clonal selection algorithms (CSAs). When coping with complex optimization problems with the characteristics of multimodality, high dimension, rotation, and composition, the traditional CSAs often suffer from the premature convergence and unsatisfied accuracy. To address these concerning issues, a recombination operator inspired by the biological combinatorial recombination is proposed at first. The recombination operator could generate the promising candidate solution to enhance search ability of the CSA by fusing the information from random chosen parents. Furthermore, a modified hypermutation operator is introduced to construct more promising and efficient candidate solutions. A set of 16 common used benchmark functions are adopted to test the effectiveness and efficiency of the recombination and hypermutation operators. The comparisons with classic CSA, CSA with recombination operator (RCSA), and CSA with recombination and modified hypermutation operator (RHCSA) demonstrate that the proposed algorithm significantly improves the performance of classic CSA. Moreover, comparison with the state-of-the-art algorithms shows that the proposed algorithm is quite competitive. PMID:27698662
Biclustering Protein Complex Interactions with a Biclique FindingAlgorithm
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ding, Chris; Zhang, Anne Ya; Holbrook, Stephen
2006-12-01
Biclustering has many applications in text mining, web clickstream mining, and bioinformatics. When data entries are binary, the tightest biclusters become bicliques. We propose a flexible and highly efficient algorithm to compute bicliques. We first generalize the Motzkin-Straus formalism for computing the maximal clique from L{sub 1} constraint to L{sub p} constraint, which enables us to provide a generalized Motzkin-Straus formalism for computing maximal-edge bicliques. By adjusting parameters, the algorithm can favor biclusters with more rows less columns, or vice verse, thus increasing the flexibility of the targeted biclusters. We then propose an algorithm to solve the generalized Motzkin-Straus optimizationmore » problem. The algorithm is provably convergent and has a computational complexity of O(|E|) where |E| is the number of edges. It relies on a matrix vector multiplication and runs efficiently on most current computer architectures. Using this algorithm, we bicluster the yeast protein complex interaction network. We find that biclustering protein complexes at the protein level does not clearly reflect the functional linkage among protein complexes in many cases, while biclustering at protein domain level can reveal many underlying linkages. We show several new biologically significant results.« less
Hybrid dose calculation: a dose calculation algorithm for microbeam radiation therapy
NASA Astrophysics Data System (ADS)
Donzelli, Mattia; Bräuer-Krisch, Elke; Oelfke, Uwe; Wilkens, Jan J.; Bartzsch, Stefan
2018-02-01
Microbeam radiation therapy (MRT) is still a preclinical approach in radiation oncology that uses planar micrometre wide beamlets with extremely high peak doses, separated by a few hundred micrometre wide low dose regions. Abundant preclinical evidence demonstrates that MRT spares normal tissue more effectively than conventional radiation therapy, at equivalent tumour control. In order to launch first clinical trials, accurate and efficient dose calculation methods are an inevitable prerequisite. In this work a hybrid dose calculation approach is presented that is based on a combination of Monte Carlo and kernel based dose calculation. In various examples the performance of the algorithm is compared to purely Monte Carlo and purely kernel based dose calculations. The accuracy of the developed algorithm is comparable to conventional pure Monte Carlo calculations. In particular for inhomogeneous materials the hybrid dose calculation algorithm out-performs purely convolution based dose calculation approaches. It is demonstrated that the hybrid algorithm can efficiently calculate even complicated pencil beam and cross firing beam geometries. The required calculation times are substantially lower than for pure Monte Carlo calculations.
A Hybrid Shared-Memory Parallel Max-Tree Algorithm for Extreme Dynamic-Range Images.
Moschini, Ugo; Meijster, Arnold; Wilkinson, Michael H F
2018-03-01
Max-trees, or component trees, are graph structures that represent the connected components of an image in a hierarchical way. Nowadays, many application fields rely on images with high-dynamic range or floating point values. Efficient sequential algorithms exist to build trees and compute attributes for images of any bit depth. However, we show that the current parallel algorithms perform poorly already with integers at bit depths higher than 16 bits per pixel. We propose a parallel method combining the two worlds of flooding and merging max-tree algorithms. First, a pilot max-tree of a quantized version of the image is built in parallel using a flooding method. Later, this structure is used in a parallel leaf-to-root approach to compute efficiently the final max-tree and to drive the merging of the sub-trees computed by the threads. We present an analysis of the performance both on simulated and actual 2D images and 3D volumes. Execution times are about better than the fastest sequential algorithm and speed-up goes up to on 64 threads.
Hall, Gunnsteinn; Liang, Wenxuan; Li, Xingde
2017-10-01
Collagen fiber alignment derived from second harmonic generation (SHG) microscopy images can be important for disease diagnostics. Image processing algorithms are needed to robustly quantify the alignment in images with high sensitivity and reliability. Fourier transform (FT) magnitude, 2D power spectrum, and image autocorrelation have previously been used to extract fiber information from images by assuming a certain mathematical model (e.g. Gaussian distribution of the fiber-related parameters) and fitting. The fitting process is slow and fails to converge when the data is not Gaussian. Herein we present an efficient constant-time deterministic algorithm which characterizes the symmetricity of the FT magnitude image in terms of a single parameter, named the fiber alignment anisotropy R ranging from 0 (randomized fibers) to 1 (perfect alignment). This represents an important improvement of the technology and may bring us one step closer to utilizing the technology for various applications in real time. In addition, we present a digital image phantom-based framework for characterizing and validating the algorithm, as well as assessing the robustness of the algorithm against different perturbations.
Hu, Shaoxing; Xu, Shike; Wang, Duhu; Zhang, Aiwu
2015-11-11
Aiming at addressing the problem of high computational cost of the traditional Kalman filter in SINS/GPS, a practical optimization algorithm with offline-derivation and parallel processing methods based on the numerical characteristics of the system is presented in this paper. The algorithm exploits the sparseness and/or symmetry of matrices to simplify the computational procedure. Thus plenty of invalid operations can be avoided by offline derivation using a block matrix technique. For enhanced efficiency, a new parallel computational mechanism is established by subdividing and restructuring calculation processes after analyzing the extracted "useful" data. As a result, the algorithm saves about 90% of the CPU processing time and 66% of the memory usage needed in a classical Kalman filter. Meanwhile, the method as a numerical approach needs no precise-loss transformation/approximation of system modules and the accuracy suffers little in comparison with the filter before computational optimization. Furthermore, since no complicated matrix theories are needed, the algorithm can be easily transplanted into other modified filters as a secondary optimization method to achieve further efficiency.
EMILiO: a fast algorithm for genome-scale strain design.
Yang, Laurence; Cluett, William R; Mahadevan, Radhakrishnan
2011-05-01
Systems-level design of cell metabolism is becoming increasingly important for renewable production of fuels, chemicals, and drugs. Computational models are improving in the accuracy and scope of predictions, but are also growing in complexity. Consequently, efficient and scalable algorithms are increasingly important for strain design. Previous algorithms helped to consolidate the utility of computational modeling in this field. To meet intensifying demands for high-performance strains, both the number and variety of genetic manipulations involved in strain construction are increasing. Existing algorithms have experienced combinatorial increases in computational complexity when applied toward the design of such complex strains. Here, we present EMILiO, a new algorithm that increases the scope of strain design to include reactions with individually optimized fluxes. Unlike existing approaches that would experience an explosion in complexity to solve this problem, we efficiently generated numerous alternate strain designs producing succinate, l-glutamate and l-serine. This was enabled by successive linear programming, a technique new to the area of computational strain design. Copyright © 2011 Elsevier Inc. All rights reserved.
Steffensen, Jon Lund; Dufault-Thompson, Keith; Zhang, Ying
2018-01-01
The metabolism of individual organisms and biological communities can be viewed as a network of metabolites connected to each other through chemical reactions. In metabolic networks, chemical reactions transform reactants into products, thereby transferring elements between these metabolites. Knowledge of how elements are transferred through reactant/product pairs allows for the identification of primary compound connections through a metabolic network. However, such information is not readily available and is often challenging to obtain for large reaction databases or genome-scale metabolic models. In this study, a new algorithm was developed for automatically predicting the element-transferring reactant/product pairs using the limited information available in the standard representation of metabolic networks. The algorithm demonstrated high efficiency in analyzing large datasets and provided accurate predictions when benchmarked with manually curated data. Applying the algorithm to the visualization of metabolic networks highlighted pathways of primary reactant/product connections and provided an organized view of element-transferring biochemical transformations. The algorithm was implemented as a new function in the open source software package PSAMM in the release v0.30 (https://zhanglab.github.io/psamm/).
Online particle detection with Neural Networks based on topological calorimetry information
NASA Astrophysics Data System (ADS)
Ciodaro, T.; Deva, D.; de Seixas, J. M.; Damazio, D.
2012-06-01
This paper presents the latest results from the Ringer algorithm, which is based on artificial neural networks for the electron identification at the online filtering system of the ATLAS particle detector, in the context of the LHC experiment at CERN. The algorithm performs topological feature extraction using the ATLAS calorimetry information (energy measurements). The extracted information is presented to a neural network classifier. Studies showed that the Ringer algorithm achieves high detection efficiency, while keeping the false alarm rate low. Optimizations, guided by detailed analysis, reduced the algorithm execution time by 59%. Also, the total memory necessary to store the Ringer algorithm information represents less than 6.2 percent of the total filtering system amount.
Airport Flight Departure Delay Model on Improved BN Structure Learning
NASA Astrophysics Data System (ADS)
Cao, Weidong; Fang, Xiangnong
An high score prior genetic simulated annealing Bayesian network structure learning algorithm (HSPGSA) by combining genetic algorithm(GA) with simulated annealing algorithm(SAA) is developed. The new algorithm provides not only with strong global search capability of GA, but also with strong local hill climb search capability of SAA. The structure with the highest score is prior selected. In the mean time, structures with lower score are also could be choice. It can avoid efficiently prematurity problem by higher score individual wrong direct growing population. Algorithm is applied to flight departure delays analysis in a large hub airport. Based on the flight data a BN model is created. Experiments show that parameters learning can reflect departure delay.
NASA's Experience with UV Remote Using SBUV and TOMS Instruments
NASA Technical Reports Server (NTRS)
Bhartia, P. K.
1999-01-01
This paper will discuss key features of the NASA algorithm that has been used to produce several highly popular geophysical products from the Solar Backscatter Ultraviolet (SBUV) and Total Ozone Mapping Spectrometer (TOMS) series of instruments. Since these instruments have a limited number of wavelengths, many innovative algorithmic approaches have been developed over the years to derive maximum information from these sensors. We will use Global Ozone Monitoring Experiment (GOME) data to test the assumptions made in these algorithms and show what additional information is contained in the GOME hyperspectral data. At NASA we are using this information to improve the SBUV and TOMS algorithms, as well as to develop more efficient algorithms to process GOME data.
Estimating the size of the solution space of metabolic networks
Braunstein, Alfredo; Mulet, Roberto; Pagnani, Andrea
2008-01-01
Background Cellular metabolism is one of the most investigated system of biological interactions. While the topological nature of individual reactions and pathways in the network is quite well understood there is still a lack of comprehension regarding the global functional behavior of the system. In the last few years flux-balance analysis (FBA) has been the most successful and widely used technique for studying metabolism at system level. This method strongly relies on the hypothesis that the organism maximizes an objective function. However only under very specific biological conditions (e.g. maximization of biomass for E. coli in reach nutrient medium) the cell seems to obey such optimization law. A more refined analysis not assuming extremization remains an elusive task for large metabolic systems due to algorithmic limitations. Results In this work we propose a novel algorithmic strategy that provides an efficient characterization of the whole set of stable fluxes compatible with the metabolic constraints. Using a technique derived from the fields of statistical physics and information theory we designed a message-passing algorithm to estimate the size of the affine space containing all possible steady-state flux distributions of metabolic networks. The algorithm, based on the well known Bethe approximation, can be used to approximately compute the volume of a non full-dimensional convex polytope in high dimensions. We first compare the accuracy of the predictions with an exact algorithm on small random metabolic networks. We also verify that the predictions of the algorithm match closely those of Monte Carlo based methods in the case of the Red Blood Cell metabolic network. Then we test the effect of gene knock-outs on the size of the solution space in the case of E. coli central metabolism. Finally we analyze the statistical properties of the average fluxes of the reactions in the E. coli metabolic network. Conclusion We propose a novel efficient distributed algorithmic strategy to estimate the size and shape of the affine space of a non full-dimensional convex polytope in high dimensions. The method is shown to obtain, quantitatively and qualitatively compatible results with the ones of standard algorithms (where this comparison is possible) being still efficient on the analysis of large biological systems, where exact deterministic methods experience an explosion in algorithmic time. The algorithm we propose can be considered as an alternative to Monte Carlo sampling methods. PMID:18489757
An efficient algorithm for global periodic orbits generation near irregular-shaped asteroids
NASA Astrophysics Data System (ADS)
Shang, Haibin; Wu, Xiaoyu; Ren, Yuan; Shan, Jinjun
2017-07-01
Periodic orbits (POs) play an important role in understanding dynamical behaviors around natural celestial bodies. In this study, an efficient algorithm was presented to generate the global POs around irregular-shaped uniformly rotating asteroids. The algorithm was performed in three steps, namely global search, local refinement, and model continuation. First, a mascon model with a low number of particles and optimized mass distribution was constructed to remodel the exterior gravitational potential of the asteroid. Using this model, a multi-start differential evolution enhanced with a deflection strategy with strong global exploration and bypassing abilities was adopted. This algorithm can be regarded as a search engine to find multiple globally optimal regions in which potential POs were located. This was followed by applying a differential correction to locally refine global search solutions and generate the accurate POs in the mascon model in which an analytical Jacobian matrix was derived to improve convergence. Finally, the concept of numerical model continuation was introduced and used to convert the POs from the mascon model into a high-fidelity polyhedron model by sequentially correcting the initial states. The efficiency of the proposed algorithm was substantiated by computing the global POs around an elongated shoe-shaped asteroid 433 Eros. Various global POs with different topological structures in the configuration space were successfully located. Specifically, the proposed algorithm was generic and could be conveniently extended to explore periodic motions in other gravitational systems.
Tensor completion for estimating missing values in visual data.
Liu, Ji; Musialski, Przemyslaw; Wonka, Peter; Ye, Jieping
2013-01-01
In this paper, we propose an algorithm to estimate missing values in tensors of visual data. The values can be missing due to problems in the acquisition process or because the user manually identified unwanted outliers. Our algorithm works even with a small amount of samples and it can propagate structure to fill larger missing regions. Our methodology is built on recent studies about matrix completion using the matrix trace norm. The contribution of our paper is to extend the matrix case to the tensor case by proposing the first definition of the trace norm for tensors and then by building a working algorithm. First, we propose a definition for the tensor trace norm that generalizes the established definition of the matrix trace norm. Second, similarly to matrix completion, the tensor completion is formulated as a convex optimization problem. Unfortunately, the straightforward problem extension is significantly harder to solve than the matrix case because of the dependency among multiple constraints. To tackle this problem, we developed three algorithms: simple low rank tensor completion (SiLRTC), fast low rank tensor completion (FaLRTC), and high accuracy low rank tensor completion (HaLRTC). The SiLRTC algorithm is simple to implement and employs a relaxation technique to separate the dependent relationships and uses the block coordinate descent (BCD) method to achieve a globally optimal solution; the FaLRTC algorithm utilizes a smoothing scheme to transform the original nonsmooth problem into a smooth one and can be used to solve a general tensor trace norm minimization problem; the HaLRTC algorithm applies the alternating direction method of multipliers (ADMMs) to our problem. Our experiments show potential applications of our algorithms and the quantitative evaluation indicates that our methods are more accurate and robust than heuristic approaches. The efficiency comparison indicates that FaLTRC and HaLRTC are more efficient than SiLRTC and between FaLRTC an- HaLRTC the former is more efficient to obtain a low accuracy solution and the latter is preferred if a high-accuracy solution is desired.
NASA Technical Reports Server (NTRS)
Watson, Brian; Kamat, M. P.
1990-01-01
Element-by-element preconditioned conjugate gradient (EBE-PCG) algorithms have been advocated for use in parallel/vector processing environments as being superior to the conventional LDL(exp T) decomposition algorithm for single load cases. Although there may be some advantages in using such algorithms for a single load case, when it comes to situations involving multiple load cases, the LDL(exp T) decomposition algorithm would appear to be decidedly more cost-effective. The authors have outlined an EBE-PCG algorithm suitable for multiple load cases and compared its effectiveness to the highly efficient LDL(exp T) decomposition scheme. The proposed algorithm offers almost no advantages over the LDL(exp T) algorithm for the linear problems investigated on the Alliant FX/8. However, there may be some merit in the algorithm in solving nonlinear problems with load incrementation, but that remains to be investigated.
NASA Astrophysics Data System (ADS)
Tsang, Sik-Ho; Chan, Yui-Lam; Siu, Wan-Chi
2017-01-01
Weighted prediction (WP) is an efficient video coding tool that was introduced since the establishment of the H.264/AVC video coding standard, for compensating the temporal illumination change in motion estimation and compensation. WP parameters, including a multiplicative weight and an additive offset for each reference frame, are required to be estimated and transmitted to the decoder by slice header. These parameters cause extra bits in the coded video bitstream. High efficiency video coding (HEVC) provides WP parameter prediction to reduce the overhead. Therefore, WP parameter prediction is crucial to research works or applications, which are related to WP. Prior art has been suggested to further improve the WP parameter prediction by implicit prediction of image characteristics and derivation of parameters. By exploiting both temporal and interlayer redundancies, we propose three WP parameter prediction algorithms, enhanced implicit WP parameter, enhanced direct WP parameter derivation, and interlayer WP parameter, to further improve the coding efficiency of HEVC. Results show that our proposed algorithms can achieve up to 5.83% and 5.23% bitrate reduction compared to the conventional scalable HEVC in the base layer for SNR scalability and 2× spatial scalability, respectively.
Edge-Based Efficient Search over Encrypted Data Mobile Cloud Storage
Liu, Fang; Cai, Zhiping; Xiao, Nong; Zhao, Ziming
2018-01-01
Smart sensor-equipped mobile devices sense, collect, and process data generated by the edge network to achieve intelligent control, but such mobile devices usually have limited storage and computing resources. Mobile cloud storage provides a promising solution owing to its rich storage resources, great accessibility, and low cost. But it also brings a risk of information leakage. The encryption of sensitive data is the basic step to resist the risk. However, deploying a high complexity encryption and decryption algorithm on mobile devices will greatly increase the burden of terminal operation and the difficulty to implement the necessary privacy protection algorithm. In this paper, we propose ENSURE (EfficieNt and SecURE), an efficient and secure encrypted search architecture over mobile cloud storage. ENSURE is inspired by edge computing. It allows mobile devices to offload the computation intensive task onto the edge server to achieve a high efficiency. Besides, to protect data security, it reduces the information acquisition of untrusted cloud by hiding the relevance between query keyword and search results from the cloud. Experiments on a real data set show that ENSURE reduces the computation time by 15% to 49% and saves the energy consumption by 38% to 69% per query. PMID:29652810
Edge-Based Efficient Search over Encrypted Data Mobile Cloud Storage.
Guo, Yeting; Liu, Fang; Cai, Zhiping; Xiao, Nong; Zhao, Ziming
2018-04-13
Smart sensor-equipped mobile devices sense, collect, and process data generated by the edge network to achieve intelligent control, but such mobile devices usually have limited storage and computing resources. Mobile cloud storage provides a promising solution owing to its rich storage resources, great accessibility, and low cost. But it also brings a risk of information leakage. The encryption of sensitive data is the basic step to resist the risk. However, deploying a high complexity encryption and decryption algorithm on mobile devices will greatly increase the burden of terminal operation and the difficulty to implement the necessary privacy protection algorithm. In this paper, we propose ENSURE (EfficieNt and SecURE), an efficient and secure encrypted search architecture over mobile cloud storage. ENSURE is inspired by edge computing. It allows mobile devices to offload the computation intensive task onto the edge server to achieve a high efficiency. Besides, to protect data security, it reduces the information acquisition of untrusted cloud by hiding the relevance between query keyword and search results from the cloud. Experiments on a real data set show that ENSURE reduces the computation time by 15% to 49% and saves the energy consumption by 38% to 69% per query.
Optimization design of wind turbine drive train based on Matlab genetic algorithm toolbox
NASA Astrophysics Data System (ADS)
Li, R. N.; Liu, X.; Liu, S. J.
2013-12-01
In order to ensure the high efficiency of the whole flexible drive train of the front-end speed adjusting wind turbine, the working principle of the main part of the drive train is analyzed. As critical parameters, rotating speed ratios of three planetary gear trains are selected as the research subject. The mathematical model of the torque converter speed ratio is established based on these three critical variable quantity, and the effect of key parameters on the efficiency of hydraulic mechanical transmission is analyzed. Based on the torque balance and the energy balance, refer to hydraulic mechanical transmission characteristics, the transmission efficiency expression of the whole drive train is established. The fitness function and constraint functions are established respectively based on the drive train transmission efficiency and the torque converter rotating speed ratio range. And the optimization calculation is carried out by using MATLAB genetic algorithm toolbox. The optimization method and results provide an optimization program for exact match of wind turbine rotor, gearbox, hydraulic mechanical transmission, hydraulic torque converter and synchronous generator, ensure that the drive train work with a high efficiency, and give a reference for the selection of the torque converter and hydraulic mechanical transmission.
Efficient algorithms for single-axis attitude estimation
NASA Technical Reports Server (NTRS)
Shuster, M. D.
1981-01-01
The computationally efficient algorithms determine attitude from the measurement of art lengths and dihedral angles. The dependence of these algorithms on the solution of trigonometric equations was reduced. Both single time and batch estimators are presented along with the covariance analysis of each algorithm.
Ultrafast treatment plan optimization for volumetric modulated arc therapy (VMAT)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Men Chunhua; Romeijn, H. Edwin; Jia Xun
2010-11-15
Purpose: To develop a novel aperture-based algorithm for volumetric modulated arc therapy (VMAT) treatment plan optimization with high quality and high efficiency. Methods: The VMAT optimization problem is formulated as a large-scale convex programming problem solved by a column generation approach. The authors consider a cost function consisting two terms, the first enforcing a desired dose distribution and the second guaranteeing a smooth dose rate variation between successive gantry angles. A gantry rotation is discretized into 180 beam angles and for each beam angle, only one MLC aperture is allowed. The apertures are generated one by one in a sequentialmore » way. At each iteration of the column generation method, a deliverable MLC aperture is generated for one of the unoccupied beam angles by solving a subproblem with the consideration of MLC mechanic constraints. A subsequent master problem is then solved to determine the dose rate at all currently generated apertures by minimizing the cost function. When all 180 beam angles are occupied, the optimization completes, yielding a set of deliverable apertures and associated dose rates that produce a high quality plan. Results: The algorithm was preliminarily tested on five prostate and five head-and-neck clinical cases, each with one full gantry rotation without any couch/collimator rotations. High quality VMAT plans have been generated for all ten cases with extremely high efficiency. It takes only 5-8 min on CPU (MATLAB code on an Intel Xeon 2.27 GHz CPU) and 18-31 s on GPU (CUDA code on an NVIDIA Tesla C1060 GPU card) to generate such plans. Conclusions: The authors have developed an aperture-based VMAT optimization algorithm which can generate clinically deliverable high quality treatment plans at very high efficiency.« less
Ultrafast treatment plan optimization for volumetric modulated arc therapy (VMAT).
Men, Chunhua; Romeijn, H Edwin; Jia, Xun; Jiang, Steve B
2010-11-01
To develop a novel aperture-based algorithm for volumetric modulated are therapy (VMAT) treatment plan optimization with high quality and high efficiency. The VMAT optimization problem is formulated as a large-scale convex programming problem solved by a column generation approach. The authors consider a cost function consisting two terms, the first enforcing a desired dose distribution and the second guaranteeing a smooth dose rate variation between successive gantry angles. A gantry rotation is discretized into 180 beam angles and for each beam angle, only one MLC aperture is allowed. The apertures are generated one by one in a sequential way. At each iteration of the column generation method, a deliverable MLC aperture is generated for one of the unoccupied beam angles by solving a subproblem with the consideration of MLC mechanic constraints. A subsequent master problem is then solved to determine the dose rate at all currently generated apertures by minimizing the cost function. When all 180 beam angles are occupied, the optimization completes, yielding a set of deliverable apertures and associated dose rates that produce a high quality plan. The algorithm was preliminarily tested on five prostate and five head-and-neck clinical cases, each with one full gantry rotation without any couch/collimator rotations. High quality VMAT plans have been generated for all ten cases with extremely high efficiency. It takes only 5-8 min on CPU (MATLAB code on an Intel Xeon 2.27 GHz CPU) and 18-31 s on GPU (CUDA code on an NVIDIA Tesla C1060 GPU card) to generate such plans. The authors have developed an aperture-based VMAT optimization algorithm which can generate clinically deliverable high quality treatment plans at very high efficiency.
Group implicit concurrent algorithms in nonlinear structural dynamics
NASA Technical Reports Server (NTRS)
Ortiz, M.; Sotelino, E. D.
1989-01-01
During the 70's and 80's, considerable effort was devoted to developing efficient and reliable time stepping procedures for transient structural analysis. Mathematically, the equations governing this type of problems are generally stiff, i.e., they exhibit a wide spectrum in the linear range. The algorithms best suited to this type of applications are those which accurately integrate the low frequency content of the response without necessitating the resolution of the high frequency modes. This means that the algorithms must be unconditionally stable, which in turn rules out explicit integration. The most exciting possibility in the algorithms development area in recent years has been the advent of parallel computers with multiprocessing capabilities. So, this work is mainly concerned with the development of parallel algorithms in the area of structural dynamics. A primary objective is to devise unconditionally stable and accurate time stepping procedures which lend themselves to an efficient implementation in concurrent machines. Some features of the new computer architecture are summarized. A brief survey of current efforts in the area is presented. A new class of concurrent procedures, or Group Implicit algorithms is introduced and analyzed. The numerical simulation shows that GI algorithms hold considerable promise for application in coarse grain as well as medium grain parallel computers.
NASA Astrophysics Data System (ADS)
Wu, Lianglong; Fu, Xiquan; Guo, Xing
2013-03-01
In this paper, we propose a modified adaptive algorithm (MAA) of dealing with the high chirp to efficiently simulate the propagation of chirped pulses along an optical fiber for the propagation distance shorter than the "temporal focal length". The basis of the MAA is that the chirp term of initial pulse is treated as the rapidly varying part by means of the idea of the slowly varying envelope approximation (SVEA). Numerical simulations show that the performance of the MAA is validated, and that the proposed method can decrease the number of sampling points by orders of magnitude. In addition, the computational efficiency of the MAA compared with the time-domain beam propagation method (BPM) can be enhanced with the increase of the chirp of initial pulse.
SAGE: The Self-Adaptive Grid Code. 3
NASA Technical Reports Server (NTRS)
Davies, Carol B.; Venkatapathy, Ethiraj
1999-01-01
The multi-dimensional self-adaptive grid code, SAGE, is an important tool in the field of computational fluid dynamics (CFD). It provides an efficient method to improve the accuracy of flow solutions while simultaneously reducing computer processing time. Briefly, SAGE enhances an initial computational grid by redistributing the mesh points into more appropriate locations. The movement of these points is driven by an equal-error-distribution algorithm that utilizes the relationship between high flow gradients and excessive solution errors. The method also provides a balance between clustering points in the high gradient regions and maintaining the smoothness and continuity of the adapted grid, The latest version, Version 3, includes the ability to change the boundaries of a given grid to more efficiently enclose flow structures and provides alternative redistribution algorithms.
NASA Astrophysics Data System (ADS)
Ziegler, Benjamin; Rauhut, Guntram
2016-03-01
The transformation of multi-dimensional potential energy surfaces (PESs) from a grid-based multimode representation to an analytical one is a standard procedure in quantum chemical programs. Within the framework of linear least squares fitting, a simple and highly efficient algorithm is presented, which relies on a direct product representation of the PES and a repeated use of Kronecker products. It shows the same scalings in computational cost and memory requirements as the potfit approach. In comparison to customary linear least squares fitting algorithms, this corresponds to a speed-up and memory saving by several orders of magnitude. Different fitting bases are tested, namely, polynomials, B-splines, and distributed Gaussians. Benchmark calculations are provided for the PESs of a set of small molecules.
Ziegler, Benjamin; Rauhut, Guntram
2016-03-21
The transformation of multi-dimensional potential energy surfaces (PESs) from a grid-based multimode representation to an analytical one is a standard procedure in quantum chemical programs. Within the framework of linear least squares fitting, a simple and highly efficient algorithm is presented, which relies on a direct product representation of the PES and a repeated use of Kronecker products. It shows the same scalings in computational cost and memory requirements as the potfit approach. In comparison to customary linear least squares fitting algorithms, this corresponds to a speed-up and memory saving by several orders of magnitude. Different fitting bases are tested, namely, polynomials, B-splines, and distributed Gaussians. Benchmark calculations are provided for the PESs of a set of small molecules.
High rate concatenated coding systems using bandwidth efficient trellis inner codes
NASA Technical Reports Server (NTRS)
Deng, Robert H.; Costello, Daniel J., Jr.
1989-01-01
High-rate concatenated coding systems with bandwidth-efficient trellis inner codes and Reed-Solomon (RS) outer codes are investigated for application in high-speed satellite communication systems. Two concatenated coding schemes are proposed. In one the inner code is decoded with soft-decision Viterbi decoding, and the outer RS code performs error-correction-only decoding (decoding without side information). In the other, the inner code is decoded with a modified Viterbi algorithm, which produces reliability information along with the decoded output. In this algorithm, path metrics are used to estimate the entire information sequence, whereas branch metrics are used to provide reliability information on the decoded sequence. This information is used to erase unreliable bits in the decoded output. An errors-and-erasures RS decoder is then used for the outer code. The two schemes have been proposed for high-speed data communication on NASA satellite channels. The rates considered are at least double those used in current NASA systems, and the results indicate that high system reliability can still be achieved.
Scalable Nearest Neighbor Algorithms for High Dimensional Data.
Muja, Marius; Lowe, David G
2014-11-01
For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular data set. In order to scale to very large data sets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbor matching framework that can be used with any of the algorithms described in the paper. All this research has been released as an open source library called fast library for approximate nearest neighbors (FLANN), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbor matching.
NASA Astrophysics Data System (ADS)
Becker, Matthew Rand
I present a new algorithm, CALCLENS, for efficiently computing weak gravitational lensing shear signals from large N-body light cone simulations over a curved sky. This new algorithm properly accounts for the sky curvature and boundary conditions, is able to produce redshift- dependent shear signals including corrections to the Born approximation by using multiple- plane ray tracing, and properly computes the lensed images of source galaxies in the light cone. The key feature of this algorithm is a new, computationally efficient Poisson solver for the sphere that combines spherical harmonic transform and multigrid methods. As a result, large areas of sky (~10,000 square degrees) can be ray traced efficiently at high-resolution using only a few hundred cores. Using this new algorithm and curved-sky calculations that only use a slower but more accurate spherical harmonic transform Poisson solver, I study the convergence, shear E-mode, shear B-mode and rotation mode power spectra. Employing full-sky E/B-mode decompositions, I confirm that the numerically computed shear B-mode and rotation mode power spectra are equal at high accuracy ( ≲ 1%) as expected from perturbation theory up to second order. Coupled with realistic galaxy populations placed in large N-body light cone simulations, this new algorithm is ideally suited for the construction of synthetic weak lensing shear catalogs to be used to test for systematic effects in data analysis procedures for upcoming large-area sky surveys. The implementation presented in this work, written in C and employing widely available software libraries to maintain portability, is publicly available at http://code.google.com/p/calclens.
NASA Astrophysics Data System (ADS)
Becker, Matthew R.
2013-10-01
I present a new algorithm, Curved-sky grAvitational Lensing for Cosmological Light conE simulatioNS (CALCLENS), for efficiently computing weak gravitational lensing shear signals from large N-body light cone simulations over a curved sky. This new algorithm properly accounts for the sky curvature and boundary conditions, is able to produce redshift-dependent shear signals including corrections to the Born approximation by using multiple-plane ray tracing and properly computes the lensed images of source galaxies in the light cone. The key feature of this algorithm is a new, computationally efficient Poisson solver for the sphere that combines spherical harmonic transform and multigrid methods. As a result, large areas of sky (˜10 000 square degrees) can be ray traced efficiently at high resolution using only a few hundred cores. Using this new algorithm and curved-sky calculations that only use a slower but more accurate spherical harmonic transform Poisson solver, I study the convergence, shear E-mode, shear B-mode and rotation mode power spectra. Employing full-sky E/B-mode decompositions, I confirm that the numerically computed shear B-mode and rotation mode power spectra are equal at high accuracy (≲1 per cent) as expected from perturbation theory up to second order. Coupled with realistic galaxy populations placed in large N-body light cone simulations, this new algorithm is ideally suited for the construction of synthetic weak lensing shear catalogues to be used to test for systematic effects in data analysis procedures for upcoming large-area sky surveys. The implementation presented in this work, written in C and employing widely available software libraries to maintain portability, is publicly available at http://code.google.com/p/calclens.
NASA Astrophysics Data System (ADS)
Doha, E.; Bhrawy, A.
2006-06-01
It is well known that spectral methods (tau, Galerkin, collocation) have a condition number of ( is the number of retained modes of polynomial approximations). This paper presents some efficient spectral algorithms, which have a condition number of , based on the Jacobi?Galerkin methods of second-order elliptic equations in one and two space variables. The key to the efficiency of these algorithms is to construct appropriate base functions, which lead to systems with specially structured matrices that can be efficiently inverted. The complexities of the algorithms are a small multiple of operations for a -dimensional domain with unknowns, while the convergence rates of the algorithms are exponentials with smooth solutions.
Spectral compression algorithms for the analysis of very large multivariate images
Keenan, Michael R.
2007-10-16
A method for spectrally compressing data sets enables the efficient analysis of very large multivariate images. The spectral compression algorithm uses a factored representation of the data that can be obtained from Principal Components Analysis or other factorization technique. Furthermore, a block algorithm can be used for performing common operations more efficiently. An image analysis can be performed on the factored representation of the data, using only the most significant factors. The spectral compression algorithm can be combined with a spatial compression algorithm to provide further computational efficiencies.
A scalable parallel algorithm for multiple objective linear programs
NASA Technical Reports Server (NTRS)
Wiecek, Malgorzata M.; Zhang, Hong
1994-01-01
This paper presents an ADBASE-based parallel algorithm for solving multiple objective linear programs (MOLP's). Job balance, speedup and scalability are of primary interest in evaluating efficiency of the new algorithm. Implementation results on Intel iPSC/2 and Paragon multiprocessors show that the algorithm significantly speeds up the process of solving MOLP's, which is understood as generating all or some efficient extreme points and unbounded efficient edges. The algorithm gives specially good results for large and very large problems. Motivation and justification for solving such large MOLP's are also included.
A new precoding scheme for spectral efficient optical OFDM systems
NASA Astrophysics Data System (ADS)
Hardan, Saad Mshhain; Bayat, Oguz; Abdulkafi, Ayad Atiyah
2018-07-01
Achieving high spectral efficiency is the key requirement of 5G and optical wireless communication systems and has recently attracted much attention, aiming to satisfy the ever increasing demand for high data rates in communications systems. In this paper, we propose a new precoding/decoding algorithm for spectral efficient optical orthogonal frequency division multiplexing (OFDM) scheme based visible light communication (VLC) systems. The proposed coded modulated optical (CMO) based OFDM system can be applied for both single input single output (SISO) and multiple input multiple-output (MIMO) architectures. Firstly, the real OFDM time domain signal is obtained through invoking the precoding/decoding algorithm without the Hermitian symmetry. After that, the positive signal is achieved either by adding a DC-bias or by using the spatial multiplexing technique. The proposed CMO-OFDM scheme efficiently improves the spectral efficiency of the VLC system as it does not require the Hermitian symmetry constraint to yield real signals. A comparison of the performance improvement of the proposed scheme with other OFDM approaches is also presented in this work. Simulation results show that the proposed CMO-OFDM scheme can not only enhance the spectral efficiency of OFDM-based VLC systems but also improve bit error rate (BER) performance compared with other optical OFDM schemes.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Atanassov, E.; Dimitrov, D., E-mail: d.slavov@bas.bg, E-mail: emanouil@parallel.bas.bg, E-mail: gurov@bas.bg; Gurov, T.
2015-10-28
The recent developments in the area of high-performance computing are driven not only by the desire for ever higher performance but also by the rising costs of electricity. The use of various types of accelerators like GPUs, Intel Xeon Phi has become mainstream and many algorithms and applications have been ported to make use of them where available. In Financial Mathematics the question of optimal use of computational resources should also take into account the limitations on space, because in many use cases the servers are deployed close to the exchanges. In this work we evaluate various algorithms for optionmore » pricing that we have implemented for different target architectures in terms of their energy and space efficiency. Since it has been established that low-discrepancy sequences may be better than pseudorandom numbers for these types of algorithms, we also test the Sobol and Halton sequences. We present the raw results, the computed metrics and conclusions from our tests.« less
Using Deep Learning Algorithm to Enhance Image-review Software for Surveillance Cameras
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cui, Yonggang; Thomas, Maikael A.
We propose the development of proven deep learning algorithms to flag objects and events of interest in Next Generation Surveillance System (NGSS) surveillance to make IAEA image review more efficient. Video surveillance is one of the core monitoring technologies used by the IAEA Department of Safeguards when implementing safeguards at nuclear facilities worldwide. The current image review software GARS has limited automated functions, such as scene-change detection, black image detection and missing scene analysis, but struggles with highly cluttered backgrounds. A cutting-edge algorithm to be developed in this project will enable efficient and effective searches in images and video streamsmore » by identifying and tracking safeguards relevant objects and detect anomalies in their vicinity. In this project, we will develop the algorithm, test it with the IAEA surveillance cameras and data sets collected at simulated nuclear facilities at BNL and SNL, and implement it in a software program for potential integration into the IAEA’s IRAP (Integrated Review and Analysis Program).« less
Silva, Adão; Gameiro, Atílio
2014-01-01
We present in this work a low-complexity algorithm to solve the sum rate maximization problem in multiuser MIMO broadcast channels with downlink beamforming. Our approach decouples the user selection problem from the resource allocation problem and its main goal is to create a set of quasiorthogonal users. The proposed algorithm exploits physical metrics of the wireless channels that can be easily computed in such a way that a null space projection power can be approximated efficiently. Based on the derived metrics we present a mathematical model that describes the dynamics of the user selection process which renders the user selection problem into an integer linear program. Numerical results show that our approach is highly efficient to form groups of quasiorthogonal users when compared to previously proposed algorithms in the literature. Our user selection algorithm achieves a large portion of the optimum user selection sum rate (90%) for a moderate number of active users. PMID:24574928
A Methodology for the Hybridization Based in Active Components: The Case of cGA and Scatter Search.
Villagra, Andrea; Alba, Enrique; Leguizamón, Guillermo
2016-01-01
This work presents the results of a new methodology for hybridizing metaheuristics. By first locating the active components (parts) of one algorithm and then inserting them into second one, we can build efficient and accurate optimization, search, and learning algorithms. This gives a concrete way of constructing new techniques that contrasts the spread ad hoc way of hybridizing. In this paper, the enhanced algorithm is a Cellular Genetic Algorithm (cGA) which has been successfully used in the past to find solutions to such hard optimization problems. In order to extend and corroborate the use of active components as an emerging hybridization methodology, we propose here the use of active components taken from Scatter Search (SS) to improve cGA. The results obtained over a varied set of benchmarks are highly satisfactory in efficacy and efficiency when compared with a standard cGA. Moreover, the proposed hybrid approach (i.e., cGA+SS) has shown encouraging results with regard to earlier applications of our methodology.
Orthogonalizing EM: A design-based least squares algorithm.
Xiong, Shifeng; Dai, Bin; Huling, Jared; Qian, Peter Z G
We introduce an efficient iterative algorithm, intended for various least squares problems, based on a design of experiments perspective. The algorithm, called orthogonalizing EM (OEM), works for ordinary least squares and can be easily extended to penalized least squares. The main idea of the procedure is to orthogonalize a design matrix by adding new rows and then solve the original problem by embedding the augmented design in a missing data framework. We establish several attractive theoretical properties concerning OEM. For the ordinary least squares with a singular regression matrix, an OEM sequence converges to the Moore-Penrose generalized inverse-based least squares estimator. For ordinary and penalized least squares with various penalties, it converges to a point having grouping coherence for fully aliased regression matrices. Convergence and the convergence rate of the algorithm are examined. Finally, we demonstrate that OEM is highly efficient for large-scale least squares and penalized least squares problems, and is considerably faster than competing methods when n is much larger than p . Supplementary materials for this article are available online.
NASA Astrophysics Data System (ADS)
Atanassov, E.; Dimitrov, D.; Gurov, T.
2015-10-01
The recent developments in the area of high-performance computing are driven not only by the desire for ever higher performance but also by the rising costs of electricity. The use of various types of accelerators like GPUs, Intel Xeon Phi has become mainstream and many algorithms and applications have been ported to make use of them where available. In Financial Mathematics the question of optimal use of computational resources should also take into account the limitations on space, because in many use cases the servers are deployed close to the exchanges. In this work we evaluate various algorithms for option pricing that we have implemented for different target architectures in terms of their energy and space efficiency. Since it has been established that low-discrepancy sequences may be better than pseudorandom numbers for these types of algorithms, we also test the Sobol and Halton sequences. We present the raw results, the computed metrics and conclusions from our tests.
Gao, Zheng; Gui, Ping
2012-07-01
In this paper, we present a digital predistortion technique to improve the linearity and power efficiency of a high-voltage class-AB power amplifier (PA) for ultrasound transmitters. The system is composed of a digital-to-analog converter (DAC), an analog-to-digital converter (ADC), and a field-programmable gate array (FPGA) in which the digital predistortion (DPD) algorithm is implemented. The DPD algorithm updates the error, which is the difference between the ideal signal and the attenuated distorted output signal, in the look-up table (LUT) memory during each cycle of a sinusoidal signal using the least-mean-square (LMS) algorithm. On the next signal cycle, the error data are used to equalize the signal with negative harmonic components to cancel the amplifier's nonlinear response. The algorithm also includes a linear interpolation method applied to the windowed sinusoidal signals for the B-mode and Doppler modes. The measurement test bench uses an arbitrary function generator as the DAC to generate the input signal, an oscilloscope as the ADC to capture the output waveform, and software to implement the DPD algorithm. The measurement results show that the proposed system is able to reduce the second-order harmonic distortion (HD2) by 20 dB and the third-order harmonic distortion (HD3) by 14.5 dB, while at the same time improving the power efficiency by 18%.
Efficient conjugate gradient algorithms for computation of the manipulator forward dynamics
NASA Technical Reports Server (NTRS)
Fijany, Amir; Scheid, Robert E.
1989-01-01
The applicability of conjugate gradient algorithms for computation of the manipulator forward dynamics is investigated. The redundancies in the previously proposed conjugate gradient algorithm are analyzed. A new version is developed which, by avoiding these redundancies, achieves a significantly greater efficiency. A preconditioned conjugate gradient algorithm is also presented. A diagonal matrix whose elements are the diagonal elements of the inertia matrix is proposed as the preconditioner. In order to increase the computational efficiency, an algorithm is developed which exploits the synergism between the computation of the diagonal elements of the inertia matrix and that required by the conjugate gradient algorithm.
An Extended Kalman Filter-Based Attitude Tracking Algorithm for Star Sensors
Li, Jian; Wei, Xinguo; Zhang, Guangjun
2017-01-01
Efficiency and reliability are key issues when a star sensor operates in tracking mode. In the case of high attitude dynamics, the performance of existing attitude tracking algorithms degenerates rapidly. In this paper an extended Kalman filtering-based attitude tracking algorithm is presented. The star sensor is modeled as a nonlinear stochastic system with the state estimate providing the three degree-of-freedom attitude quaternion and angular velocity. The star positions in the star image are predicted and measured to estimate the optimal attitude. Furthermore, all the cataloged stars observed in the sensor field-of-view according the predicted image motion are accessed using a catalog partition table to speed up the tracking, called star mapping. Software simulation and night-sky experiment are performed to validate the efficiency and reliability of the proposed method. PMID:28825684
An Extended Kalman Filter-Based Attitude Tracking Algorithm for Star Sensors.
Li, Jian; Wei, Xinguo; Zhang, Guangjun
2017-08-21
Efficiency and reliability are key issues when a star sensor operates in tracking mode. In the case of high attitude dynamics, the performance of existing attitude tracking algorithms degenerates rapidly. In this paper an extended Kalman filtering-based attitude tracking algorithm is presented. The star sensor is modeled as a nonlinear stochastic system with the state estimate providing the three degree-of-freedom attitude quaternion and angular velocity. The star positions in the star image are predicted and measured to estimate the optimal attitude. Furthermore, all the cataloged stars observed in the sensor field-of-view according the predicted image motion are accessed using a catalog partition table to speed up the tracking, called star mapping. Software simulation and night-sky experiment are performed to validate the efficiency and reliability of the proposed method.
The algorithm of fast image stitching based on multi-feature extraction
NASA Astrophysics Data System (ADS)
Yang, Chunde; Wu, Ge; Shi, Jing
2018-05-01
This paper proposed an improved image registration method combining Hu-based invariant moment contour information and feature points detection, aiming to solve the problems in traditional image stitching algorithm, such as time-consuming feature points extraction process, redundant invalid information overload and inefficiency. First, use the neighborhood of pixels to extract the contour information, employing the Hu invariant moment as similarity measure to extract SIFT feature points in those similar regions. Then replace the Euclidean distance with Hellinger kernel function to improve the initial matching efficiency and get less mismatching points, further, estimate affine transformation matrix between the images. Finally, local color mapping method is adopted to solve uneven exposure, using the improved multiresolution fusion algorithm to fuse the mosaic images and realize seamless stitching. Experimental results confirm high accuracy and efficiency of method proposed in this paper.
Redundant interferometric calibration as a complex optimization problem
NASA Astrophysics Data System (ADS)
Grobler, T. L.; Bernardi, G.; Kenyon, J. S.; Parsons, A. R.; Smirnov, O. M.
2018-05-01
Observations of the redshifted 21 cm line from the epoch of reionization have recently motivated the construction of low-frequency radio arrays with highly redundant configurations. These configurations provide an alternative calibration strategy - `redundant calibration' - and boost sensitivity on specific spatial scales. In this paper, we formulate calibration of redundant interferometric arrays as a complex optimization problem. We solve this optimization problem via the Levenberg-Marquardt algorithm. This calibration approach is more robust to initial conditions than current algorithms and, by leveraging an approximate matrix inversion, allows for further optimization and an efficient implementation (`redundant STEFCAL'). We also investigated using the preconditioned conjugate gradient method as an alternative to the approximate matrix inverse, but found that its computational performance is not competitive with respect to `redundant STEFCAL'. The efficient implementation of this new algorithm is made publicly available.
Multicore and GPU algorithms for Nussinov RNA folding
2014-01-01
Background One segment of a RNA sequence might be paired with another segment of the same RNA sequence due to the force of hydrogen bonds. This two-dimensional structure is called the RNA sequence's secondary structure. Several algorithms have been proposed to predict an RNA sequence's secondary structure. These algorithms are referred to as RNA folding algorithms. Results We develop cache efficient, multicore, and GPU algorithms for RNA folding using Nussinov's algorithm. Conclusions Our cache efficient algorithm provides a speedup between 1.6 and 3.0 relative to a naive straightforward single core code. The multicore version of the cache efficient single core algorithm provides a speedup, relative to the naive single core algorithm, between 7.5 and 14.0 on a 6 core hyperthreaded CPU. Our GPU algorithm for the NVIDIA C2050 is up to 1582 times as fast as the naive single core algorithm and between 5.1 and 11.2 times as fast as the fastest previously known GPU algorithm for Nussinov RNA folding. PMID:25082539
Staff | Computational Science | NREL
develops and leads laboratory-wide efforts in high-performance computing and energy-efficient data centers Professional IV-High Perf Computing Jim.Albin@nrel.gov 303-275-4069 Ananthan, Shreyas Senior Scientist - High -Performance Algorithms and Modeling Shreyas.Ananthan@nrel.gov 303-275-4807 Bendl, Kurt IT Professional IV-High
Li, Jun-qing; Pan, Quan-ke; Mao, Kun
2014-01-01
A hybrid algorithm which combines particle swarm optimization (PSO) and iterated local search (ILS) is proposed for solving the hybrid flowshop scheduling (HFS) problem with preventive maintenance (PM) activities. In the proposed algorithm, different crossover operators and mutation operators are investigated. In addition, an efficient multiple insert mutation operator is developed for enhancing the searching ability of the algorithm. Furthermore, an ILS-based local search procedure is embedded in the algorithm to improve the exploitation ability of the proposed algorithm. The detailed experimental parameter for the canonical PSO is tuning. The proposed algorithm is tested on the variation of 77 Carlier and Néron's benchmark problems. Detailed comparisons with the present efficient algorithms, including hGA, ILS, PSO, and IG, verify the efficiency and effectiveness of the proposed algorithm. PMID:24883414
A density based algorithm to detect cavities and holes from planar points
NASA Astrophysics Data System (ADS)
Zhu, Jie; Sun, Yizhong; Pang, Yueyong
2017-12-01
Delaunay-based shape reconstruction algorithms are widely used in approximating the shape from planar points. However, these algorithms cannot ensure the optimality of varied reconstructed cavity boundaries and hole boundaries. This inadequate reconstruction can be primarily attributed to the lack of efficient mathematic formulation for the two structures (hole and cavity). In this paper, we develop an efficient algorithm for generating cavities and holes from planar points. The algorithm yields the final boundary based on an iterative removal of the Delaunay triangulation. Our algorithm is mainly divided into two steps, namely, rough and refined shape reconstructions. The rough shape reconstruction performed by the algorithm is controlled by a relative parameter. Based on the rough result, the refined shape reconstruction mainly aims to detect holes and pure cavities. Cavity and hole are conceptualized as a structure with a low-density region surrounded by the high-density region. With this structure, cavity and hole are characterized by a mathematic formulation called as compactness of point formed by the length variation of the edges incident to point in Delaunay triangulation. The boundaries of cavity and hole are then found by locating a shape gradient change in compactness of point set. The experimental comparison with other shape reconstruction approaches shows that the proposed algorithm is able to accurately yield the boundaries of cavity and hole with varying point set densities and distributions.
Multigrid treatment of implicit continuum diffusion
NASA Astrophysics Data System (ADS)
Francisquez, Manaure; Zhu, Ben; Rogers, Barrett
2017-10-01
Implicit treatment of diffusive terms of various differential orders common in continuum mechanics modeling, such as computational fluid dynamics, is investigated with spectral and multigrid algorithms in non-periodic 2D domains. In doubly periodic time dependent problems these terms can be efficiently and implicitly handled by spectral methods, but in non-periodic systems solved with distributed memory parallel computing and 2D domain decomposition, this efficiency is lost for large numbers of processors. We built and present here a multigrid algorithm for these types of problems which outperforms a spectral solution that employs the highly optimized FFTW library. This multigrid algorithm is not only suitable for high performance computing but may also be able to efficiently treat implicit diffusion of arbitrary order by introducing auxiliary equations of lower order. We test these solvers for fourth and sixth order diffusion with idealized harmonic test functions as well as a turbulent 2D magnetohydrodynamic simulation. It is also shown that an anisotropic operator without cross-terms can improve model accuracy and speed, and we examine the impact that the various diffusion operators have on the energy, the enstrophy, and the qualitative aspect of a simulation. This work was supported by DOE-SC-0010508. This research used resources of the National Energy Research Scientific Computing Center (NERSC).
Maurer, S A; Kussmann, J; Ochsenfeld, C
2014-08-07
We present a low-prefactor, cubically scaling scaled-opposite-spin second-order Møller-Plesset perturbation theory (SOS-MP2) method which is highly suitable for massively parallel architectures like graphics processing units (GPU). The scaling is reduced from O(N⁵) to O(N³) by a reformulation of the MP2-expression in the atomic orbital basis via Laplace transformation and the resolution-of-the-identity (RI) approximation of the integrals in combination with efficient sparse algebra for the 3-center integral transformation. In contrast to previous works that employ GPUs for post Hartree-Fock calculations, we do not simply employ GPU-based linear algebra libraries to accelerate the conventional algorithm. Instead, our reformulation allows to replace the rate-determining contraction step with a modified J-engine algorithm, that has been proven to be highly efficient on GPUs. Thus, our SOS-MP2 scheme enables us to treat large molecular systems in an accurate and efficient manner on a single GPU-server.
Computationally efficient multibody simulations
NASA Technical Reports Server (NTRS)
Ramakrishnan, Jayant; Kumar, Manoj
1994-01-01
Computationally efficient approaches to the solution of the dynamics of multibody systems are presented in this work. The computational efficiency is derived from both the algorithmic and implementational standpoint. Order(n) approaches provide a new formulation of the equations of motion eliminating the assembly and numerical inversion of a system mass matrix as required by conventional algorithms. Computational efficiency is also gained in the implementation phase by the symbolic processing and parallel implementation of these equations. Comparison of this algorithm with existing multibody simulation programs illustrates the increased computational efficiency.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bai, T; UT Southwestern Medical Center, Dallas, TX; Yan, H
2014-06-15
Purpose: To develop a 3D dictionary learning based statistical reconstruction algorithm on graphic processing units (GPU), to improve the quality of low-dose cone beam CT (CBCT) imaging with high efficiency. Methods: A 3D dictionary containing 256 small volumes (atoms) of 3x3x3 voxels was trained from a high quality volume image. During reconstruction, we utilized a Cholesky decomposition based orthogonal matching pursuit algorithm to find a sparse representation on this dictionary basis of each patch in the reconstructed image, in order to regularize the image quality. To accelerate the time-consuming sparse coding in the 3D case, we implemented our algorithm inmore » a parallel fashion by taking advantage of the tremendous computational power of GPU. Evaluations are performed based on a head-neck patient case. FDK reconstruction with full dataset of 364 projections is used as the reference. We compared the proposed 3D dictionary learning based method with a tight frame (TF) based one using a subset data of 121 projections. The image qualities under different resolutions in z-direction, with or without statistical weighting are also studied. Results: Compared to the TF-based CBCT reconstruction, our experiments indicated that 3D dictionary learning based CBCT reconstruction is able to recover finer structures, to remove more streaking artifacts, and is less susceptible to blocky artifacts. It is also observed that statistical reconstruction approach is sensitive to inconsistency between the forward and backward projection operations in parallel computing. Using high a spatial resolution along z direction helps improving the algorithm robustness. Conclusion: 3D dictionary learning based CBCT reconstruction algorithm is able to sense the structural information while suppressing noise, and hence to achieve high quality reconstruction. The GPU realization of the whole algorithm offers a significant efficiency enhancement, making this algorithm more feasible for potential clinical application. A high zresolution is preferred to stabilize statistical iterative reconstruction. This work was supported in part by NIH(1R01CA154747-01), NSFC((No. 61172163), Research Fund for the Doctoral Program of Higher Education of China (No. 20110201110011), China Scholarship Council.« less
Multiple quay cranes scheduling for double cycling in container terminals
Chu, Yanling; Zhang, Xiaoju; Yang, Zhongzhen
2017-01-01
Double cycling is an efficient tool to increase the efficiency of quay crane (QC) in container terminals. In this paper, an optimization model for double cycling is developed to optimize the operation sequence of multiple QCs. The objective is to minimize the makespan of the ship handling operation considering the ship balance constraint. To solve the model, an algorithm based on Lagrangian relaxation is designed. Finally, we compare the efficiency of the Lagrangian relaxation based heuristic with the branch-and-bound method and a genetic algorithm using instances of different sizes. The results of numerical experiments indicate that the proposed model can effectively reduce the unloading and loading times of QCs. The effects of the ship balance constraint are more notable when the number of QCs is high. PMID:28692699
Multiple quay cranes scheduling for double cycling in container terminals.
Chu, Yanling; Zhang, Xiaoju; Yang, Zhongzhen
2017-01-01
Double cycling is an efficient tool to increase the efficiency of quay crane (QC) in container terminals. In this paper, an optimization model for double cycling is developed to optimize the operation sequence of multiple QCs. The objective is to minimize the makespan of the ship handling operation considering the ship balance constraint. To solve the model, an algorithm based on Lagrangian relaxation is designed. Finally, we compare the efficiency of the Lagrangian relaxation based heuristic with the branch-and-bound method and a genetic algorithm using instances of different sizes. The results of numerical experiments indicate that the proposed model can effectively reduce the unloading and loading times of QCs. The effects of the ship balance constraint are more notable when the number of QCs is high.
2D-RBUC for efficient parallel compression of residuals
NASA Astrophysics Data System (ADS)
Đurđević, Đorđe M.; Tartalja, Igor I.
2018-02-01
In this paper, we present a method for lossless compression of residuals with an efficient SIMD parallel decompression. The residuals originate from lossy or near lossless compression of height fields, which are commonly used to represent models of terrains. The algorithm is founded on the existing RBUC method for compression of non-uniform data sources. We have adapted the method to capture 2D spatial locality of height fields, and developed the data decompression algorithm for modern GPU architectures already present even in home computers. In combination with the point-level SIMD-parallel lossless/lossy high field compression method HFPaC, characterized by fast progressive decompression and seamlessly reconstructed surface, the newly proposed method trades off small efficiency degradation for a non negligible compression ratio (measured up to 91%) benefit.
Data decomposition method for parallel polygon rasterization considering load balancing
NASA Astrophysics Data System (ADS)
Zhou, Chen; Chen, Zhenjie; Liu, Yongxue; Li, Feixue; Cheng, Liang; Zhu, A.-xing; Li, Manchun
2015-12-01
It is essential to adopt parallel computing technology to rapidly rasterize massive polygon data. In parallel rasterization, it is difficult to design an effective data decomposition method. Conventional methods ignore load balancing of polygon complexity in parallel rasterization and thus fail to achieve high parallel efficiency. In this paper, a novel data decomposition method based on polygon complexity (DMPC) is proposed. First, four factors that possibly affect the rasterization efficiency were investigated. Then, a metric represented by the boundary number and raster pixel number in the minimum bounding rectangle was developed to calculate the complexity of each polygon. Using this metric, polygons were rationally allocated according to the polygon complexity, and each process could achieve balanced loads of polygon complexity. To validate the efficiency of DMPC, it was used to parallelize different polygon rasterization algorithms and tested on different datasets. Experimental results showed that DMPC could effectively parallelize polygon rasterization algorithms. Furthermore, the implemented parallel algorithms with DMPC could achieve good speedup ratios of at least 15.69 and generally outperformed conventional decomposition methods in terms of parallel efficiency and load balancing. In addition, the results showed that DMPC exhibited consistently better performance for different spatial distributions of polygons.
Multiresolution molecular mechanics: Implementation and efficiency
NASA Astrophysics Data System (ADS)
Biyikli, Emre; To, Albert C.
2017-01-01
Atomistic/continuum coupling methods combine accurate atomistic methods and efficient continuum methods to simulate the behavior of highly ordered crystalline systems. Coupled methods utilize the advantages of both approaches to simulate systems at a lower computational cost, while retaining the accuracy associated with atomistic methods. Many concurrent atomistic/continuum coupling methods have been proposed in the past; however, their true computational efficiency has not been demonstrated. The present work presents an efficient implementation of a concurrent coupling method called the Multiresolution Molecular Mechanics (MMM) for serial, parallel, and adaptive analysis. First, we present the features of the software implemented along with the associated technologies. The scalability of the software implementation is demonstrated, and the competing effects of multiscale modeling and parallelization are discussed. Then, the algorithms contributing to the efficiency of the software are presented. These include algorithms for eliminating latent ghost atoms from calculations and measurement-based dynamic balancing of parallel workload. The efficiency improvements made by these algorithms are demonstrated by benchmark tests. The efficiency of the software is found to be on par with LAMMPS, a state-of-the-art Molecular Dynamics (MD) simulation code, when performing full atomistic simulations. Speed-up of the MMM method is shown to be directly proportional to the reduction of the number of the atoms visited in force computation. Finally, an adaptive MMM analysis on a nanoindentation problem, containing over a million atoms, is performed, yielding an improvement of 6.3-8.5 times in efficiency, over the full atomistic MD method. For the first time, the efficiency of a concurrent atomistic/continuum coupling method is comprehensively investigated and demonstrated.
Xiao, Feng; Kong, Lingjiang; Chen, Jian
2017-06-01
A rapid-search algorithm to improve the beam-steering efficiency for a liquid crystal optical phased array was proposed and experimentally demonstrated in this paper. This proposed algorithm, in which the value of steering efficiency is taken as the objective function and the controlling voltage codes are considered as the optimization variables, consisted of a detection stage and a construction stage. It optimized the steering efficiency in the detection stage and adjusted its search direction adaptively in the construction stage to avoid getting caught in a wrong search space. Simulations had been conducted to compare the proposed algorithm with the widely used pattern-search algorithm using criteria of convergence rate and optimized efficiency. Beam-steering optimization experiments had been performed to verify the validity of the proposed method.
Modified GMDH-NN algorithm and its application for global sensitivity analysis
NASA Astrophysics Data System (ADS)
Song, Shufang; Wang, Lu
2017-11-01
Global sensitivity analysis (GSA) is a very useful tool to evaluate the influence of input variables in the whole distribution range. Sobol' method is the most commonly used among variance-based methods, which are efficient and popular GSA techniques. High dimensional model representation (HDMR) is a popular way to compute Sobol' indices, however, its drawbacks cannot be ignored. We show that modified GMDH-NN algorithm can calculate coefficients of metamodel efficiently, so this paper aims at combining it with HDMR and proposes GMDH-HDMR method. The new method shows higher precision and faster convergent rate. Several numerical and engineering examples are used to confirm its advantages.
A Fast parallel tridiagonal algorithm for a class of CFD applications
NASA Technical Reports Server (NTRS)
Moitra, Stuti; Sun, Xian-He
1996-01-01
The parallel diagonal dominant (PDD) algorithm is an efficient tridiagonal solver. This paper presents for study a variation of the PDD algorithm, the reduced PDD algorithm. The new algorithm maintains the minimum communication provided by the PDD algorithm, but has a reduced operation count. The PDD algorithm also has a smaller operation count than the conventional sequential algorithm for many applications. Accuracy analysis is provided for the reduced PDD algorithm for symmetric Toeplitz tridiagonal (STT) systems. Implementation results on Langley's Intel Paragon and IBM SP2 show that both the PDD and reduced PDD algorithms are efficient and scalable.
A high performance load balance strategy for real-time multicore systems.
Cho, Keng-Mao; Tsai, Chun-Wei; Chiu, Yi-Shiuan; Yang, Chu-Sing
2014-01-01
Finding ways to distribute workloads to each processor core and efficiently reduce power consumption is of vital importance, especially for real-time systems. In this paper, a novel scheduling algorithm is proposed for real-time multicore systems to balance the computation loads and save power. The developed algorithm simultaneously considers multiple criteria, a novel factor, and task deadline, and is called power and deadline-aware multicore scheduling (PDAMS). Experiment results show that the proposed algorithm can greatly reduce energy consumption by up to 54.2% and the deadline times missed, as compared to the other scheduling algorithms outlined in this paper.
NASA Astrophysics Data System (ADS)
Mohamed, Najihah; Lutfi Amri Ramli, Ahmad; Majid, Ahmad Abd; Piah, Abd Rahni Mt
2017-09-01
A metaheuristic algorithm, called Harmony Search is quite highly applied in optimizing parameters in many areas. HS is a derivative-free real parameter optimization algorithm, and draws an inspiration from the musical improvisation process of searching for a perfect state of harmony. Propose in this paper Modified Harmony Search for solving optimization problems, which employs a concept from genetic algorithm method and particle swarm optimization for generating new solution vectors that enhances the performance of HS algorithm. The performances of MHS and HS are investigated on ten benchmark optimization problems in order to make a comparison to reflect the efficiency of the MHS in terms of final accuracy, convergence speed and robustness.
NASA Astrophysics Data System (ADS)
Kaveh, A.; Zolghadr, A.
2017-08-01
Structural optimization with frequency constraints is seen as a challenging problem because it is associated with highly nonlinear, discontinuous and non-convex search spaces consisting of several local optima. Therefore, competent optimization algorithms are essential for addressing these problems. In this article, a newly developed metaheuristic method called the cyclical parthenogenesis algorithm (CPA) is used for layout optimization of truss structures subjected to frequency constraints. CPA is a nature-inspired, population-based metaheuristic algorithm, which imitates the reproductive and social behaviour of some animal species such as aphids, which alternate between sexual and asexual reproduction. The efficiency of the CPA is validated using four numerical examples.
A High Performance Load Balance Strategy for Real-Time Multicore Systems
Cho, Keng-Mao; Tsai, Chun-Wei; Chiu, Yi-Shiuan; Yang, Chu-Sing
2014-01-01
Finding ways to distribute workloads to each processor core and efficiently reduce power consumption is of vital importance, especially for real-time systems. In this paper, a novel scheduling algorithm is proposed for real-time multicore systems to balance the computation loads and save power. The developed algorithm simultaneously considers multiple criteria, a novel factor, and task deadline, and is called power and deadline-aware multicore scheduling (PDAMS). Experiment results show that the proposed algorithm can greatly reduce energy consumption by up to 54.2% and the deadline times missed, as compared to the other scheduling algorithms outlined in this paper. PMID:24955382
A proximity algorithm accelerated by Gauss-Seidel iterations for L1/TV denoising models
NASA Astrophysics Data System (ADS)
Li, Qia; Micchelli, Charles A.; Shen, Lixin; Xu, Yuesheng
2012-09-01
Our goal in this paper is to improve the computational performance of the proximity algorithms for the L1/TV denoising model. This leads us to a new characterization of all solutions to the L1/TV model via fixed-point equations expressed in terms of the proximity operators. Based upon this observation we develop an algorithm for solving the model and establish its convergence. Furthermore, we demonstrate that the proposed algorithm can be accelerated through the use of the componentwise Gauss-Seidel iteration so that the CPU time consumed is significantly reduced. Numerical experiments using the proposed algorithm for impulsive noise removal are included, with a comparison to three recently developed algorithms. The numerical results show that while the proposed algorithm enjoys a high quality of the restored images, as the other three known algorithms do, it performs significantly better in terms of computational efficiency measured in the CPU time consumed.
A Benders based rolling horizon algorithm for a dynamic facility location problem
Marufuzzaman,, Mohammad; Gedik, Ridvan; Roni, Mohammad S.
2016-06-28
This study presents a well-known capacitated dynamic facility location problem (DFLP) that satisfies the customer demand at a minimum cost by determining the time period for opening, closing, or retaining an existing facility in a given location. To solve this challenging NP-hard problem, this paper develops a unique hybrid solution algorithm that combines a rolling horizon algorithm with an accelerated Benders decomposition algorithm. Extensive computational experiments are performed on benchmark test instances to evaluate the hybrid algorithm’s efficiency and robustness in solving the DFLP problem. Computational results indicate that the hybrid Benders based rolling horizon algorithm consistently offers high qualitymore » feasible solutions in a much shorter computational time period than the standalone rolling horizon and accelerated Benders decomposition algorithms in the experimental range.« less
Fast object detection algorithm based on HOG and CNN
NASA Astrophysics Data System (ADS)
Lu, Tongwei; Wang, Dandan; Zhang, Yanduo
2018-04-01
In the field of computer vision, object classification and object detection are widely used in many fields. The traditional object detection have two main problems:one is that sliding window of the regional selection strategy is high time complexity and have window redundancy. And the other one is that Robustness of the feature is not well. In order to solve those problems, Regional Proposal Network (RPN) is used to select candidate regions instead of selective search algorithm. Compared with traditional algorithms and selective search algorithms, RPN has higher efficiency and accuracy. We combine HOG feature and convolution neural network (CNN) to extract features. And we use SVM to classify. For TorontoNet, our algorithm's mAP is 1.6 percentage points higher. For OxfordNet, our algorithm's mAP is 1.3 percentage higher.
Production scheduling and rescheduling with genetic algorithms.
Bierwirth, C; Mattfeld, D C
1999-01-01
A general model for job shop scheduling is described which applies to static, dynamic and non-deterministic production environments. Next, a Genetic Algorithm is presented which solves the job shop scheduling problem. This algorithm is tested in a dynamic environment under different workload situations. Thereby, a highly efficient decoding procedure is proposed which strongly improves the quality of schedules. Finally, this technique is tested for scheduling and rescheduling in a non-deterministic environment. It is shown by experiment that conventional methods of production control are clearly outperformed at reasonable run-time costs.
Evolutionary algorithm for optimization of nonimaging Fresnel lens geometry.
Yamada, N; Nishikawa, T
2010-06-21
In this study, an evolutionary algorithm (EA), which consists of genetic and immune algorithms, is introduced to design the optical geometry of a nonimaging Fresnel lens; this lens generates the uniform flux concentration required for a photovoltaic cell. Herein, a design procedure that incorporates a ray-tracing technique in the EA is described, and the validity of the design is demonstrated. The results show that the EA automatically generated a unique geometry of the Fresnel lens; the use of this geometry resulted in better uniform flux concentration with high optical efficiency.
Pei, Jiquan; Han, Steve; Liao, Haijun; Li, Tao
2014-01-22
A highly efficient and simple-to-implement Monte Carlo algorithm is proposed for the evaluation of the Rényi entanglement entropy (REE) of the quantum dimer model (QDM) at the Rokhsar-Kivelson (RK) point. It makes possible the evaluation of REE at the RK point to the thermodynamic limit for a general QDM. We apply the algorithm to a QDM defined on the triangular and the square lattice in two dimensions and the simple and the face centered cubic (fcc) lattice in three dimensions. We find the REE on all these lattices follows perfect linear scaling in the thermodynamic limit, apart from an even-odd oscillation in the case of the square lattice. We also evaluate the topological entanglement entropy (TEE) with both a subtraction and an extrapolation procedure. We find the QDMs on both the triangular and the fcc lattice exhibit robust Z2 topological order. The expected TEE of ln2 is clearly demonstrated in both cases. Our large scale simulation also proves the recently proposed extrapolation procedure in cylindrical geometry to be a highly reliable way to extract the TEE of a topologically ordered system.
NASA Astrophysics Data System (ADS)
Villar, Xabier; Piso, Daniel; Bruguera, Javier D.
2014-02-01
This paper presents an FPGA implementation of an algorithm, previously published, for the the reconstruction of cosmic rays' trajectories and the determination of the time of arrival and velocity of the particles. The accuracy and precision issues of the algorithm have been analyzed to propose a suitable implementation. Thus, a 32-bit fixed-point format has been used for the representation of the data values. Moreover, the dependencies among the different operations have been taken into account to obtain a highly parallel and efficient hardware implementation. The final hardware architecture requires 18 cycles to process every particle, and has been exhaustively simulated to validate all the design decisions. The architecture has been mapped over different commercial FPGAs, with a frequency of operation ranging from 300 MHz to 1.3 GHz, depending on the FPGA being used. Consequently, the number of particle trajectories processed per second is between 16 million and 72 million. The high number of particle trajectories calculated per second shows that the proposed FPGA implementation might be used also in high rate environments such as those found in particle and nuclear physics experiments.
Denni Algorithm An Enhanced Of SMS (Scan, Move and Sort) Algorithm
NASA Astrophysics Data System (ADS)
Aprilsyah Lubis, Denni; Salim Sitompul, Opim; Marwan; Tulus; Andri Budiman, M.
2017-12-01
Sorting has been a profound area for the algorithmic researchers, and many resources are invested to suggest a more working sorting algorithm. For this purpose many existing sorting algorithms were observed in terms of the efficiency of the algorithmic complexity. Efficient sorting is important to optimize the use of other algorithms that require sorted lists to work correctly. Sorting has been considered as a fundamental problem in the study of algorithms that due to many reasons namely, the necessary to sort information is inherent in many applications, algorithms often use sorting as a key subroutine, in algorithm design there are many essential techniques represented in the body of sorting algorithms, and many engineering issues come to the fore when implementing sorting algorithms., Many algorithms are very well known for sorting the unordered lists, and one of the well-known algorithms that make the process of sorting to be more economical and efficient is SMS (Scan, Move and Sort) algorithm, an enhancement of Quicksort invented Rami Mansi in 2010. This paper presents a new sorting algorithm called Denni-algorithm. The Denni algorithm is considered as an enhancement on the SMS algorithm in average, and worst cases. The Denni algorithm is compared with the SMS algorithm and the results were promising.
High-dimensional cluster analysis with the Masked EM Algorithm
Kadir, Shabnam N.; Goodman, Dan F. M.; Harris, Kenneth D.
2014-01-01
Cluster analysis faces two problems in high dimensions: first, the “curse of dimensionality” that can lead to overfitting and poor generalization performance; and second, the sheer time taken for conventional algorithms to process large amounts of high-dimensional data. We describe a solution to these problems, designed for the application of “spike sorting” for next-generation high channel-count neural probes. In this problem, only a small subset of features provide information about the cluster member-ship of any one data vector, but this informative feature subset is not the same for all data points, rendering classical feature selection ineffective. We introduce a “Masked EM” algorithm that allows accurate and time-efficient clustering of up to millions of points in thousands of dimensions. We demonstrate its applicability to synthetic data, and to real-world high-channel-count spike sorting data. PMID:25149694
NASA Astrophysics Data System (ADS)
Wen, Xianfei; Enqvist, Andreas
2017-09-01
Cs2LiYCl6:Ce3+ (CLYC) detectors have demonstrated the capability to simultaneously detect γ-rays and thermal and fast neutrons with medium energy resolution, reasonable detection efficiency, and substantially high pulse shape discrimination performance. A disadvantage of CLYC detectors is the long scintillation decay times, which causes pulse pile-up at moderate input count rate. Pulse processing algorithms were developed based on triangular and trapezoidal filters to discriminate between neutrons and γ-rays at high count rate. The algorithms were first tested using low-rate data. They exhibit a pulse-shape discrimination performance comparable to that of the charge comparison method, at low rate. Then, they were evaluated at high count rate. Neutrons and γ-rays were adequately identified with high throughput at rates of up to 375 kcps. The algorithm developed using the triangular filter exhibits discrimination capability marginally higher than that of the trapezoidal filter based algorithm irrespective of low or high rate. The algorithms exhibit low computational complexity and are executable on an FPGA in real-time. They are also suitable for application to other radiation detectors whose pulses are piled-up at high rate owing to long scintillation decay times.
Optimal Fungal Space Searching Algorithms.
Asenova, Elitsa; Lin, Hsin-Yu; Fu, Eileen; Nicolau, Dan V; Nicolau, Dan V
2016-10-01
Previous experiments have shown that fungi use an efficient natural algorithm for searching the space available for their growth in micro-confined networks, e.g., mazes. This natural "master" algorithm, which comprises two "slave" sub-algorithms, i.e., collision-induced branching and directional memory, has been shown to be more efficient than alternatives, with one, or the other, or both sub-algorithms turned off. In contrast, the present contribution compares the performance of the fungal natural algorithm against several standard artificial homologues. It was found that the space-searching fungal algorithm consistently outperforms uninformed algorithms, such as Depth-First-Search (DFS). Furthermore, while the natural algorithm is inferior to informed ones, such as A*, this under-performance does not importantly increase with the increase of the size of the maze. These findings suggest that a systematic effort of harvesting the natural space searching algorithms used by microorganisms is warranted and possibly overdue. These natural algorithms, if efficient, can be reverse-engineered for graph and tree search strategies.
Big data privacy protection model based on multi-level trusted system
NASA Astrophysics Data System (ADS)
Zhang, Nan; Liu, Zehua; Han, Hongfeng
2018-05-01
This paper introduces and inherit the multi-level trusted system model that solves the Trojan virus by encrypting the privacy of user data, and achieve the principle: "not to read the high priority hierarchy, not to write the hierarchy with low priority". Thus ensuring that the low-priority data privacy leak does not affect the disclosure of high-priority data privacy. This paper inherits the multi-level trustworthy system model of Trojan horse and divides seven different risk levels. The priority level 1˜7 represent the low to high value of user data privacy, and realize seven kinds of encryption with different execution efficiency Algorithm, the higher the priority, the greater the value of user data privacy, at the expense of efficiency under the premise of choosing a more encrypted encryption algorithm to ensure data security. For enterprises, the price point is determined by the unit equipment users to decide the length of time. The higher the risk sub-group algorithm, the longer the encryption time. The model assumes that users prefer the lower priority encryption algorithm to ensure efficiency. This paper proposes a privacy cost model for each of the seven risk subgroups. Among them, the higher the privacy cost, the higher the priority of the risk sub-group, the higher the price the user needs to pay to ensure the privacy of the data. Furthermore, by introducing the existing pricing model of economics and the human traffic model proposed by this paper and fluctuating with the market demand, this paper improves the price of unit products when the market demand is low. On the other hand, when the market demand increases, the profit of the enterprise will be guaranteed under the guidance of the government by reducing the price per unit of product. Then, this paper introduces the dynamic factors of consumers' mood and age to optimize. At the same time, seven algorithms are selected from symmetric and asymmetric encryption algorithms to define the enterprise costs at different levels. Therefore, the proposed model solves the continuous influence caused by cascading events and ensures that the disclosure of low-level data privacy of users does not affect the high-level data privacy, thus greatly improving the safety of the private information of user.
NASA Astrophysics Data System (ADS)
Rastogi, Richa; Srivastava, Abhishek; Khonde, Kiran; Sirasala, Kirannmayi M.; Londhe, Ashutosh; Chavhan, Hitesh
2015-07-01
This paper presents an efficient parallel 3D Kirchhoff depth migration algorithm suitable for current class of multicore architecture. The fundamental Kirchhoff depth migration algorithm exhibits inherent parallelism however, when it comes to 3D data migration, as the data size increases the resource requirement of the algorithm also increases. This challenges its practical implementation even on current generation high performance computing systems. Therefore a smart parallelization approach is essential to handle 3D data for migration. The most compute intensive part of Kirchhoff depth migration algorithm is the calculation of traveltime tables due to its resource requirements such as memory/storage and I/O. In the current research work, we target this area and develop a competent parallel algorithm for post and prestack 3D Kirchhoff depth migration, using hybrid MPI+OpenMP programming techniques. We introduce a concept of flexi-depth iterations while depth migrating data in parallel imaging space, using optimized traveltime table computations. This concept provides flexibility to the algorithm by migrating data in a number of depth iterations, which depends upon the available node memory and the size of data to be migrated during runtime. Furthermore, it minimizes the requirements of storage, I/O and inter-node communication, thus making it advantageous over the conventional parallelization approaches. The developed parallel algorithm is demonstrated and analysed on Yuva II, a PARAM series of supercomputers. Optimization, performance and scalability experiment results along with the migration outcome show the effectiveness of the parallel algorithm.
Information filtering via biased heat conduction
NASA Astrophysics Data System (ADS)
Liu, Jian-Guo; Zhou, Tao; Guo, Qiang
2011-09-01
The process of heat conduction has recently found application in personalized recommendation [Zhou , Proc. Natl. Acad. Sci. USA PNASA60027-842410.1073/pnas.1000488107107, 4511 (2010)], which is of high diversity but low accuracy. By decreasing the temperatures of small-degree objects, we present an improved algorithm, called biased heat conduction, which could simultaneously enhance the accuracy and diversity. Extensive experimental analyses demonstrate that the accuracy on MovieLens, Netflix, and Delicious datasets could be improved by 43.5%, 55.4% and 19.2%, respectively, compared with the standard heat conduction algorithm and also the diversity is increased or approximately unchanged. Further statistical analyses suggest that the present algorithm could simultaneously identify users' mainstream and special tastes, resulting in better performance than the standard heat conduction algorithm. This work provides a creditable way for highly efficient information filtering.
Gradient Optimization for Analytic conTrols - GOAT
NASA Astrophysics Data System (ADS)
Assémat, Elie; Machnes, Shai; Tannor, David; Wilhelm-Mauch, Frank
Quantum optimal control becomes a necessary step in a number of studies in the quantum realm. Recent experimental advances showed that superconducting qubits can be controlled with an impressive accuracy. However, most of the standard optimal control algorithms are not designed to manage such high accuracy. To tackle this issue, a novel quantum optimal control algorithm have been introduced: the Gradient Optimization for Analytic conTrols (GOAT). It avoids the piecewise constant approximation of the control pulse used by standard algorithms. This allows an efficient implementation of very high accuracy optimization. It also includes a novel method to compute the gradient that provides many advantages, e.g. the absence of backpropagation or the natural route to optimize the robustness of the control pulses. This talk will present the GOAT algorithm and a few applications to transmons systems.
Spin-the-bottle Sort and Annealing Sort: Oblivious Sorting via Round-robin Random Comparisons
Goodrich, Michael T.
2013-01-01
We study sorting algorithms based on randomized round-robin comparisons. Specifically, we study Spin-the-bottle sort, where comparisons are unrestricted, and Annealing sort, where comparisons are restricted to a distance bounded by a temperature parameter. Both algorithms are simple, randomized, data-oblivious sorting algorithms, which are useful in privacy-preserving computations, but, as we show, Annealing sort is much more efficient. We show that there is an input permutation that causes Spin-the-bottle sort to require Ω(n2 log n) expected time in order to succeed, and that in O(n2 log n) time this algorithm succeeds with high probability for any input. We also show there is a specification of Annealing sort that runs in O(n log n) time and succeeds with very high probability. PMID:24550575
Least reliable bits coding (LRBC) for high data rate satellite communications
NASA Technical Reports Server (NTRS)
Vanderaar, Mark; Budinger, James; Wagner, Paul
1992-01-01
LRBC, a bandwidth efficient multilevel/multistage block-coded modulation technique, is analyzed. LRBC uses simple multilevel component codes that provide increased error protection on increasingly unreliable modulated bits in order to maintain an overall high code rate that increases spectral efficiency. Soft-decision multistage decoding is used to make decisions on unprotected bits through corrections made on more protected bits. Analytical expressions and tight performance bounds are used to show that LRBC can achieve increased spectral efficiency and maintain equivalent or better power efficiency compared to that of BPSK. The relative simplicity of Galois field algebra vs the Viterbi algorithm and the availability of high-speed commercial VLSI for block codes indicates that LRBC using block codes is a desirable method for high data rate implementations.
Spatial compression algorithm for the analysis of very large multivariate images
Keenan, Michael R [Albuquerque, NM
2008-07-15
A method for spatially compressing data sets enables the efficient analysis of very large multivariate images. The spatial compression algorithms use a wavelet transformation to map an image into a compressed image containing a smaller number of pixels that retain the original image's information content. Image analysis can then be performed on a compressed data matrix consisting of a reduced number of significant wavelet coefficients. Furthermore, a block algorithm can be used for performing common operations more efficiently. The spatial compression algorithms can be combined with spectral compression algorithms to provide further computational efficiencies.
Efficient FFT Algorithm for Psychoacoustic Model of the MPEG-4 AAC
NASA Astrophysics Data System (ADS)
Lee, Jae-Seong; Lee, Chang-Joon; Park, Young-Cheol; Youn, Dae-Hee
This paper proposes an efficient FFT algorithm for the Psycho-Acoustic Model (PAM) of MPEG-4 AAC. The proposed algorithm synthesizes FFT coefficients using MDCT and MDST coefficients through circular convolution. The complexity of the MDCT and MDST coefficients is approximately half of the original FFT. We also design a new PAM based on the proposed FFT algorithm, which has 15% lower computational complexity than the original PAM without degradation of sound quality. Subjective as well as objective test results are presented to confirm the efficiency of the proposed FFT computation algorithm and the PAM.
Multimedia transmission in MC-CDMA using adaptive subcarrier power allocation and CFO compensation
NASA Astrophysics Data System (ADS)
Chitra, S.; Kumaratharan, N.
2018-02-01
Multicarrier code division multiple access (MC-CDMA) system is one of the most effective techniques in fourth-generation (4G) wireless technology, due to its high data rate, high spectral efficiency and resistance to multipath fading. However, MC-CDMA systems are greatly deteriorated by carrier frequency offset (CFO) which is due to Doppler shift and oscillator instabilities. It leads to loss of orthogonality among the subcarriers and causes intercarrier interference (ICI). Water filling algorithm (WFA) is an efficient resource allocation algorithm to solve the power utilisation problems among the subcarriers in time-dispersive channels. The conventional WFA fails to consider the effect of CFO. To perform subcarrier power allocation with reduced CFO and to improve the capacity of MC-CDMA system, residual CFO compensated adaptive subcarrier power allocation algorithm is proposed in this paper. The proposed technique allocates power only to subcarriers with high channel to noise power ratio. The performance of the proposed method is evaluated using random binary data and image as source inputs. Simulation results depict that the bit error rate performance and ICI reduction capability of the proposed modified WFA offered superior performance in both power allocation and image compression for high-quality multimedia transmission in the presence of CFO and imperfect channel state information conditions.
NASA Astrophysics Data System (ADS)
Barbu, Alina L.; Laurent-Varin, Julien; Perosanz, Felix; Mercier, Flavien; Marty, Jean-Charles
2018-01-01
The implementation into the GINS CNES geodetic software of a more efficient filter was needed to satisfy the users who wanted to compute high-rate GNSS PPP solutions. We selected the SRI approach and a QR factorization technique including an innovative algorithm which optimizes the matrix reduction step. A full description of this algorithm is given for future users. The new capacities of the software have been tested using a set of 1 Hz data from the Japanese GEONET network including the Mw 9.0 2011 Tohoku earthquake. Station coordinates solution agreed at a sub-decimeter level with previous publications as well as with solutions we computed with the National Resource Canada software. An additional benefit from the implementation of the SRI filter is the capability to estimate high-rate tropospheric parameters too. As the CPU time to estimate a 1 Hz kinematic solution from 1 h of data is now less than 1 min we could produced series of coordinates for the full 1300 stations of the Japanese network. The corresponding movie shows the impressive co-seismic deformation as well as the wave propagation along the island. The processing was straightforward using a cluster of PCs which illustrates the new potentiality of the GINS software for massive network high rate PPP processing.
A New Algorithm Using the Non-Dominated Tree to Improve Non-Dominated Sorting.
Gustavsson, Patrik; Syberfeldt, Anna
2018-01-01
Non-dominated sorting is a technique often used in evolutionary algorithms to determine the quality of solutions in a population. The most common algorithm is the Fast Non-dominated Sort (FNS). This algorithm, however, has the drawback that its performance deteriorates when the population size grows. The same drawback applies also to other non-dominating sorting algorithms such as the Efficient Non-dominated Sort with Binary Strategy (ENS-BS). An algorithm suggested to overcome this drawback is the Divide-and-Conquer Non-dominated Sort (DCNS) which works well on a limited number of objectives but deteriorates when the number of objectives grows. This article presents a new, more efficient algorithm called the Efficient Non-dominated Sort with Non-Dominated Tree (ENS-NDT). ENS-NDT is an extension of the ENS-BS algorithm and uses a novel Non-Dominated Tree (NDTree) to speed up the non-dominated sorting. ENS-NDT is able to handle large population sizes and a large number of objectives more efficiently than existing algorithms for non-dominated sorting. In the article, it is shown that with ENS-NDT the runtime of multi-objective optimization algorithms such as the Non-Dominated Sorting Genetic Algorithm II (NSGA-II) can be substantially reduced.
Incoherent beam combining based on the momentum SPGD algorithm
NASA Astrophysics Data System (ADS)
Yang, Guoqing; Liu, Lisheng; Jiang, Zhenhua; Guo, Jin; Wang, Tingfeng
2018-05-01
Incoherent beam combining (ICBC) technology is one of the most promising ways to achieve high-energy, near-diffraction laser output. In this paper, the momentum method is proposed as a modification of the stochastic parallel gradient descent (SPGD) algorithm. The momentum method can improve the speed of convergence of the combining system efficiently. The analytical method is employed to interpret the principle of the momentum method. Furthermore, the proposed algorithm is testified through simulations as well as experiments. The results of the simulations and the experiments show that the proposed algorithm not only accelerates the speed of the iteration, but also keeps the stability of the combining process. Therefore the feasibility of the proposed algorithm in the beam combining system is testified.
Special Issue on a Fault Tolerant Network on Chip Architecture
NASA Astrophysics Data System (ADS)
Janidarmian, Majid; Tinati, Melika; Khademzadeh, Ahmad; Ghavibazou, Maryam; Fekr, Atena Roshan
2010-06-01
In this paper a fast and efficient spare switch selection algorithm is presented in a reliable NoC architecture based on specific application mapped onto mesh topology called FERNA. Based on ring concept used in FERNA, this algorithm achieves best results equivalent to exhaustive algorithm with much less run time improving two parameters. Inputs of FERNA algorithm for response time of the system and extra communication cost minimization are derived from simulation of high transaction level using SystemC TLM and mathematical formulation, respectively. The results demonstrate that improvement of above mentioned parameters lead to advance whole system reliability that is analytically calculated. Mapping algorithm has been also investigated as an effective issue on extra bandwidth requirement and system reliability.
Finding minimum spanning trees more efficiently for tile-based phase unwrapping
NASA Astrophysics Data System (ADS)
Sawaf, Firas; Tatam, Ralph P.
2006-06-01
The tile-based phase unwrapping method employs an algorithm for finding the minimum spanning tree (MST) in each tile. We first examine the properties of a tile's representation from a graph theory viewpoint, observing that it is possible to make use of a more efficient class of MST algorithms. We then describe a novel linear time algorithm which reduces the size of the MST problem by half at the least, and solves it completely at best. We also show how this algorithm can be applied to a tile using a sliding window technique. Finally, we show how the reduction algorithm can be combined with any other standard MST algorithm to achieve a more efficient hybrid, using Prim's algorithm for empirical comparison and noting that the reduction algorithm takes only 0.1% of the time taken by the overall hybrid.
Frequent statistics of link-layer bit stream data based on AC-IM algorithm
NASA Astrophysics Data System (ADS)
Cao, Chenghong; Lei, Yingke; Xu, Yiming
2017-08-01
At present, there are many relevant researches on data processing using classical pattern matching and its improved algorithm, but few researches on statistical data of link-layer bit stream. This paper adopts a frequent statistical method of link-layer bit stream data based on AC-IM algorithm for classical multi-pattern matching algorithms such as AC algorithm has high computational complexity, low efficiency and it cannot be applied to binary bit stream data. The method's maximum jump distance of the mode tree is length of the shortest mode string plus 3 in case of no missing? In this paper, theoretical analysis is made on the principle of algorithm construction firstly, and then the experimental results show that the algorithm can adapt to the binary bit stream data environment and extract the frequent sequence more accurately, the effect is obvious. Meanwhile, comparing with the classical AC algorithm and other improved algorithms, AC-IM algorithm has a greater maximum jump distance and less time-consuming.
NASA Astrophysics Data System (ADS)
Pan, Bing; Wang, Bo
2017-10-01
Digital volume correlation (DVC) is a powerful technique for quantifying interior deformation within solid opaque materials and biological tissues. In the last two decades, great efforts have been made to improve the accuracy and efficiency of the DVC algorithm. However, there is still a lack of a flexible, robust and accurate version that can be efficiently implemented in personal computers with limited RAM. This paper proposes an advanced DVC method that can realize accurate full-field internal deformation measurement applicable to high-resolution volume images with up to billions of voxels. Specifically, a novel layer-wise reliability-guided displacement tracking strategy combined with dynamic data management is presented to guide the DVC computation from slice to slice. The displacements at specified calculation points in each layer are computed using the advanced 3D inverse-compositional Gauss-Newton algorithm with the complete initial guess of the deformation vector accurately predicted from the computed calculation points. Since only limited slices of interest in the reference and deformed volume images rather than the whole volume images are required, the DVC calculation can thus be efficiently implemented on personal computers. The flexibility, accuracy and efficiency of the presented DVC approach are demonstrated by analyzing computer-simulated and experimentally obtained high-resolution volume images.
IDEAL: Images Across Domains, Experiments, Algorithms and Learning
NASA Astrophysics Data System (ADS)
Ushizima, Daniela M.; Bale, Hrishikesh A.; Bethel, E. Wes; Ercius, Peter; Helms, Brett A.; Krishnan, Harinarayan; Grinberg, Lea T.; Haranczyk, Maciej; Macdowell, Alastair A.; Odziomek, Katarzyna; Parkinson, Dilworth Y.; Perciano, Talita; Ritchie, Robert O.; Yang, Chao
2016-11-01
Research across science domains is increasingly reliant on image-centric data. Software tools are in high demand to uncover relevant, but hidden, information in digital images, such as those coming from faster next generation high-throughput imaging platforms. The challenge is to analyze the data torrent generated by the advanced instruments efficiently, and provide insights such as measurements for decision-making. In this paper, we overview work performed by an interdisciplinary team of computational and materials scientists, aimed at designing software applications and coordinating research efforts connecting (1) emerging algorithms for dealing with large and complex datasets; (2) data analysis methods with emphasis in pattern recognition and machine learning; and (3) advances in evolving computer architectures. Engineering tools around these efforts accelerate the analyses of image-based recordings, improve reusability and reproducibility, scale scientific procedures by reducing time between experiments, increase efficiency, and open opportunities for more users of the imaging facilities. This paper describes our algorithms and software tools, showing results across image scales, demonstrating how our framework plays a role in improving image understanding for quality control of existent materials and discovery of new compounds.
NASA Technical Reports Server (NTRS)
Lin, Shu; Fossorier, Marc
1998-01-01
The Viterbi algorithm is indeed a very simple and efficient method of implementing the maximum likelihood decoding. However, if we take advantage of the structural properties in a trellis section, other efficient trellis-based decoding algorithms can be devised. Recently, an efficient trellis-based recursive maximum likelihood decoding (RMLD) algorithm for linear block codes has been proposed. This algorithm is more efficient than the conventional Viterbi algorithm in both computation and hardware requirements. Most importantly, the implementation of this algorithm does not require the construction of the entire code trellis, only some special one-section trellises of relatively small state and branch complexities are needed for constructing path (or branch) metric tables recursively. At the end, there is only one table which contains only the most likely code-word and its metric for a given received sequence r = (r(sub 1), r(sub 2),...,r(sub n)). This algorithm basically uses the divide and conquer strategy. Furthermore, it allows parallel/pipeline processing of received sequences to speed up decoding.
Zhang, Dashan; Guo, Jie; Lei, Xiujun; Zhu, Changan
2016-04-22
The development of image sensor and optics enables the application of vision-based techniques to the non-contact dynamic vibration analysis of large-scale structures. As an emerging technology, a vision-based approach allows for remote measuring and does not bring any additional mass to the measuring object compared with traditional contact measurements. In this study, a high-speed vision-based sensor system is developed to extract structure vibration signals in real time. A fast motion extraction algorithm is required for this system because the maximum sampling frequency of the charge-coupled device (CCD) sensor can reach up to 1000 Hz. Two efficient subpixel level motion extraction algorithms, namely the modified Taylor approximation refinement algorithm and the localization refinement algorithm, are integrated into the proposed vision sensor. Quantitative analysis shows that both of the two modified algorithms are at least five times faster than conventional upsampled cross-correlation approaches and achieve satisfactory error performance. The practicability of the developed sensor is evaluated by an experiment in a laboratory environment and a field test. Experimental results indicate that the developed high-speed vision-based sensor system can extract accurate dynamic structure vibration signals by tracking either artificial targets or natural features.
Implicit methods for the Navier-Stokes equations
NASA Technical Reports Server (NTRS)
Yoon, S.; Kwak, D.
1990-01-01
Numerical solutions of the Navier-Stokes equations using explicit schemes can be obtained at the expense of efficiency. Conventional implicit methods which often achieve fast convergence rates suffer high cost per iteration. A new implicit scheme based on lower-upper factorization and symmetric Gauss-Seidel relaxation offers very low cost per iteration as well as fast convergence. High efficiency is achieved by accomplishing the complete vectorizability of the algorithm on oblique planes of sweep in three dimensions.
Lehr, M E; Plisky, P J; Butler, R J; Fink, M L; Kiesel, K B; Underwood, F B
2013-08-01
In athletics, efficient screening tools are sought to curb the rising number of noncontact injuries and associated health care costs. The authors hypothesized that an injury prediction algorithm that incorporates movement screening performance, demographic information, and injury history can accurately categorize risk of noncontact lower extremity (LE) injury. One hundred eighty-three collegiate athletes were screened during the preseason. The test scores and demographic information were entered into an injury prediction algorithm that weighted the evidence-based risk factors. Athletes were then prospectively followed for noncontact LE injury. Subsequent analysis collapsed the groupings into two risk categories: Low (normal and slight) and High (moderate and substantial). Using these groups and noncontact LE injuries, relative risk (RR), sensitivity, specificity, and likelihood ratios were calculated. Forty-two subjects sustained a noncontact LE injury over the course of the study. Athletes identified as High Risk (n = 63) were at a greater risk of noncontact LE injury (27/63) during the season [RR: 3.4 95% confidence interval 2.0 to 6.0]. These results suggest that an injury prediction algorithm composed of performance on efficient, low-cost, field-ready tests can help identify individuals at elevated risk of noncontact LE injury. © 2013 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Spectral unmixing of urban land cover using a generic library approach
NASA Astrophysics Data System (ADS)
Degerickx, Jeroen; Lordache, Marian-Daniel; Okujeni, Akpona; Hermy, Martin; van der Linden, Sebastian; Somers, Ben
2016-10-01
Remote sensing based land cover classification in urban areas generally requires the use of subpixel classification algorithms to take into account the high spatial heterogeneity. These spectral unmixing techniques often rely on spectral libraries, i.e. collections of pure material spectra (endmembers, EM), which ideally cover the large EM variability typically present in urban scenes. Despite the advent of several (semi-) automated EM detection algorithms, the collection of such image-specific libraries remains a tedious and time-consuming task. As an alternative, we suggest the use of a generic urban EM library, containing material spectra under varying conditions, acquired from different locations and sensors. This approach requires an efficient EM selection technique, capable of only selecting those spectra relevant for a specific image. In this paper, we evaluate and compare the potential of different existing library pruning algorithms (Iterative Endmember Selection and MUSIC) using simulated hyperspectral (APEX) data of the Brussels metropolitan area. In addition, we develop a new hybrid EM selection method which is shown to be highly efficient in dealing with both imagespecific and generic libraries, subsequently yielding more robust land cover classification results compared to existing methods. Future research will include further optimization of the proposed algorithm and additional tests on both simulated and real hyperspectral data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Biyikli, Emre; To, Albert C., E-mail: albertto@pitt.edu
Atomistic/continuum coupling methods combine accurate atomistic methods and efficient continuum methods to simulate the behavior of highly ordered crystalline systems. Coupled methods utilize the advantages of both approaches to simulate systems at a lower computational cost, while retaining the accuracy associated with atomistic methods. Many concurrent atomistic/continuum coupling methods have been proposed in the past; however, their true computational efficiency has not been demonstrated. The present work presents an efficient implementation of a concurrent coupling method called the Multiresolution Molecular Mechanics (MMM) for serial, parallel, and adaptive analysis. First, we present the features of the software implemented along with themore » associated technologies. The scalability of the software implementation is demonstrated, and the competing effects of multiscale modeling and parallelization are discussed. Then, the algorithms contributing to the efficiency of the software are presented. These include algorithms for eliminating latent ghost atoms from calculations and measurement-based dynamic balancing of parallel workload. The efficiency improvements made by these algorithms are demonstrated by benchmark tests. The efficiency of the software is found to be on par with LAMMPS, a state-of-the-art Molecular Dynamics (MD) simulation code, when performing full atomistic simulations. Speed-up of the MMM method is shown to be directly proportional to the reduction of the number of the atoms visited in force computation. Finally, an adaptive MMM analysis on a nanoindentation problem, containing over a million atoms, is performed, yielding an improvement of 6.3–8.5 times in efficiency, over the full atomistic MD method. For the first time, the efficiency of a concurrent atomistic/continuum coupling method is comprehensively investigated and demonstrated.« less
Machine learning of molecular properties: Locality and active learning
NASA Astrophysics Data System (ADS)
Gubaev, Konstantin; Podryabinkin, Evgeny V.; Shapeev, Alexander V.
2018-06-01
In recent years, the machine learning techniques have shown great potent1ial in various problems from a multitude of disciplines, including materials design and drug discovery. The high computational speed on the one hand and the accuracy comparable to that of density functional theory on another hand make machine learning algorithms efficient for high-throughput screening through chemical and configurational space. However, the machine learning algorithms available in the literature require large training datasets to reach the chemical accuracy and also show large errors for the so-called outliers—the out-of-sample molecules, not well-represented in the training set. In the present paper, we propose a new machine learning algorithm for predicting molecular properties that addresses these two issues: it is based on a local model of interatomic interactions providing high accuracy when trained on relatively small training sets and an active learning algorithm of optimally choosing the training set that significantly reduces the errors for the outliers. We compare our model to the other state-of-the-art algorithms from the literature on the widely used benchmark tests.
An efficient CU partition algorithm for HEVC based on improved Sobel operator
NASA Astrophysics Data System (ADS)
Sun, Xuebin; Chen, Xiaodong; Xu, Yong; Sun, Gang; Yang, Yunsheng
2018-04-01
As the latest video coding standard, High Efficiency Video Coding (HEVC) achieves over 50% bit rate reduction with similar video quality compared with previous standards H.264/AVC. However, the higher compression efficiency is attained at the cost of significantly increasing computational load. In order to reduce the complexity, this paper proposes a fast coding unit (CU) partition technique to speed up the process. To detect the edge features of each CU, a more accurate improved Sobel filtering is developed and performed By analyzing the textural features of CU, an early CU splitting termination is proposed to decide whether a CU should be decomposed into four lower-dimensions CUs or not. Compared with the reference software HM16.7, experimental results indicate the proposed algorithm can lessen the encoding time up to 44.09% on average, with a negligible bit rate increase of 0.24%, and quality losses lower 0.03 dB, respectively. In addition, the proposed algorithm gets a better trade-off between complexity and rate-distortion among the other proposed works.
Competitive Swarm Optimizer Based Gateway Deployment Algorithm in Cyber-Physical Systems.
Huang, Shuqiang; Tao, Ming
2017-01-22
Wireless sensor network topology optimization is a highly important issue, and topology control through node selection can improve the efficiency of data forwarding, while saving energy and prolonging lifetime of the network. To address the problem of connecting a wireless sensor network to the Internet in cyber-physical systems, here we propose a geometric gateway deployment based on a competitive swarm optimizer algorithm. The particle swarm optimization (PSO) algorithm has a continuous search feature in the solution space, which makes it suitable for finding the geometric center of gateway deployment; however, its search mechanism is limited to the individual optimum (pbest) and the population optimum (gbest); thus, it easily falls into local optima. In order to improve the particle search mechanism and enhance the search efficiency of the algorithm, we introduce a new competitive swarm optimizer (CSO) algorithm. The CSO search algorithm is based on an inter-particle competition mechanism and can effectively avoid trapping of the population falling into a local optimum. With the improvement of an adaptive opposition-based search and its ability to dynamically parameter adjustments, this algorithm can maintain the diversity of the entire swarm to solve geometric K -center gateway deployment problems. The simulation results show that this CSO algorithm has a good global explorative ability as well as convergence speed and can improve the network quality of service (QoS) level of cyber-physical systems by obtaining a minimum network coverage radius. We also find that the CSO algorithm is more stable, robust and effective in solving the problem of geometric gateway deployment as compared to the PSO or Kmedoids algorithms.
Computationally efficient algorithm for high sampling-frequency operation of active noise control
NASA Astrophysics Data System (ADS)
Rout, Nirmal Kumar; Das, Debi Prasad; Panda, Ganapati
2015-05-01
In high sampling-frequency operation of active noise control (ANC) system the length of the secondary path estimate and the ANC filter are very long. This increases the computational complexity of the conventional filtered-x least mean square (FXLMS) algorithm. To reduce the computational complexity of long order ANC system using FXLMS algorithm, frequency domain block ANC algorithms have been proposed in past. These full block frequency domain ANC algorithms are associated with some disadvantages such as large block delay, quantization error due to computation of large size transforms and implementation difficulties in existing low-end DSP hardware. To overcome these shortcomings, the partitioned block ANC algorithm is newly proposed where the long length filters in ANC are divided into a number of equal partitions and suitably assembled to perform the FXLMS algorithm in the frequency domain. The complexity of this proposed frequency domain partitioned block FXLMS (FPBFXLMS) algorithm is quite reduced compared to the conventional FXLMS algorithm. It is further reduced by merging one fast Fourier transform (FFT)-inverse fast Fourier transform (IFFT) combination to derive the reduced structure FPBFXLMS (RFPBFXLMS) algorithm. Computational complexity analysis for different orders of filter and partition size are presented. Systematic computer simulations are carried out for both the proposed partitioned block ANC algorithms to show its accuracy compared to the time domain FXLMS algorithm.
NASA Astrophysics Data System (ADS)
Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.
2016-09-01
Inverse modeling seeks model parameters given a set of observations. However, for practical problems because the number of measurements is often large and the model parameters are also numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace such that the dimensionality of the problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved using these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2-D and a random hydraulic conductivity field in 3-D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ˜101 to ˜102 in a multicore computational environment. Therefore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate to large-scale problems.
NASA Technical Reports Server (NTRS)
Carroll, Chester C.; Youngblood, John N.; Saha, Aindam
1987-01-01
Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processing elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.
Wang, Xingmei; Hao, Wenqian; Li, Qiming
2017-12-18
This paper proposes an adaptive cultural algorithm with improved quantum-behaved particle swarm optimization (ACA-IQPSO) to detect the underwater sonar image. In the population space, to improve searching ability of particles, iterative times and the fitness value of particles are regarded as factors to adaptively adjust the contraction-expansion coefficient of the quantum-behaved particle swarm optimization algorithm (QPSO). The improved quantum-behaved particle swarm optimization algorithm (IQPSO) can make particles adjust their behaviours according to their quality. In the belief space, a new update strategy is adopted to update cultural individuals according to the idea of the update strategy in shuffled frog leaping algorithm (SFLA). Moreover, to enhance the utilization of information in the population space and belief space, accept function and influence function are redesigned in the new communication protocol. The experimental results show that ACA-IQPSO can obtain good clustering centres according to the grey distribution information of underwater sonar images, and accurately complete underwater objects detection. Compared with other algorithms, the proposed ACA-IQPSO has good effectiveness, excellent adaptability, a powerful searching ability and high convergence efficiency. Meanwhile, the experimental results of the benchmark functions can further demonstrate that the proposed ACA-IQPSO has better searching ability, convergence efficiency and stability.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Carroll, C.C.; Youngblood, J.N.; Saha, A.
1987-12-01
Improvements and advances in the development of computer architecture now provide innovative technology for the recasting of traditional sequential solutions into high-performance, low-cost, parallel system to increase system performance. Research conducted in development of specialized computer architecture for the algorithmic execution of an avionics system, guidance and control problem in real time is described. A comprehensive treatment of both the hardware and software structures of a customized computer which performs real-time computation of guidance commands with updated estimates of target motion and time-to-go is presented. An optimal, real-time allocation algorithm was developed which maps the algorithmic tasks onto the processingmore » elements. This allocation is based on the critical path analysis. The final stage is the design and development of the hardware structures suitable for the efficient execution of the allocated task graph. The processing element is designed for rapid execution of the allocated tasks. Fault tolerance is a key feature of the overall architecture. Parallel numerical integration techniques, tasks definitions, and allocation algorithms are discussed. The parallel implementation is analytically verified and the experimental results are presented. The design of the data-driven computer architecture, customized for the execution of the particular algorithm, is discussed.« less
An Example-Based Super-Resolution Algorithm for Selfie Images
William, Jino Hans; Venkateswaran, N.; Narayanan, Srinath; Ramachandran, Sandeep
2016-01-01
A selfie is typically a self-portrait captured using the front camera of a smartphone. Most state-of-the-art smartphones are equipped with a high-resolution (HR) rear camera and a low-resolution (LR) front camera. As selfies are captured by front camera with limited pixel resolution, the fine details in it are explicitly missed. This paper aims to improve the resolution of selfies by exploiting the fine details in HR images captured by rear camera using an example-based super-resolution (SR) algorithm. HR images captured by rear camera carry significant fine details and are used as an exemplar to train an optimal matrix-value regression (MVR) operator. The MVR operator serves as an image-pair priori which learns the correspondence between the LR-HR patch-pairs and is effectively used to super-resolve LR selfie images. The proposed MVR algorithm avoids vectorization of image patch-pairs and preserves image-level information during both learning and recovering process. The proposed algorithm is evaluated for its efficiency and effectiveness both qualitatively and quantitatively with other state-of-the-art SR algorithms. The results validate that the proposed algorithm is efficient as it requires less than 3 seconds to super-resolve LR selfie and is effective as it preserves sharp details without introducing any counterfeit fine details. PMID:27064500
An Energy-Efficient Underground Localization System Based on Heterogeneous Wireless Networks
Yuan, Yazhou; Chen, Cailian; Guan, Xinping; Yang, Qiuling
2015-01-01
A precision positioning system with energy efficiency is of great necessity for guaranteeing personnel safety in underground mines. The location information of the miners' should be transmitted to the control center timely and reliably; therefore, a heterogeneous network with the backbone based on high speed Industrial Ethernet is deployed. Since the mobile wireless nodes are working in an irregular tunnel, a specific wireless propagation model cannot fit all situations. In this paper, an underground localization system is designed to enable the adaptation to kinds of harsh tunnel environments, but also to reduce the energy consumption and thus prolong the lifetime of the network. Three key techniques are developed and implemented to improve the system performance, including a step counting algorithm with accelerometers, a power control algorithm and an adaptive packets scheduling scheme. The simulation study and experimental results show the effectiveness of the proposed algorithms and the implementation. PMID:26016918
Further optimization of SeDDaRA blind image deconvolution algorithm and its DSP implementation
NASA Astrophysics Data System (ADS)
Wen, Bo; Zhang, Qiheng; Zhang, Jianlin
2011-11-01
Efficient algorithm for blind image deconvolution and its high-speed implementation is of great value in practice. Further optimization of SeDDaRA is developed, from algorithm structure to numerical calculation methods. The main optimization covers that, the structure's modularization for good implementation feasibility, reducing the data computation and dependency of 2D-FFT/IFFT, and acceleration of power operation by segmented look-up table. Then the Fast SeDDaRA is proposed and specialized for low complexity. As the final implementation, a hardware system of image restoration is conducted by using the multi-DSP parallel processing. Experimental results show that, the processing time and memory demand of Fast SeDDaRA decreases 50% at least; the data throughput of image restoration system is over 7.8Msps. The optimization is proved efficient and feasible, and the Fast SeDDaRA is able to support the real-time application.
NASA Technical Reports Server (NTRS)
Cain, Michael D.
1999-01-01
The goal of this thesis is to develop an efficient and robust locally preconditioned semi-coarsening multigrid algorithm for the two-dimensional Navier-Stokes equations. This thesis examines the performance of the multigrid algorithm with local preconditioning for an upwind-discretization of the Navier-Stokes equations. A block Jacobi iterative scheme is used because of its high frequency error mode damping ability. At low Mach numbers, the performance of a flux preconditioner is investigated. The flux preconditioner utilizes a new limiting technique based on local information that was developed by Siu. Full-coarsening and-semi-coarsening are examined as well as the multigrid V-cycle and full multigrid. The numerical tests were performed on a NACA 0012 airfoil at a range of Mach numbers. The tests show that semi-coarsening with flux preconditioning is the most efficient and robust combination of coarsening strategy, and iterative scheme - especially at low Mach numbers.
Spectral identification of topological domains
Chen, Jie; Hero, Alfred O.; Rajapakse, Indika
2016-01-01
Motivation: Topological domains have been proposed as the backbone of interphase chromosome structure. They are regions of high local contact frequency separated by sharp boundaries. Genes within a domain often have correlated transcription. In this paper, we present a computational efficient spectral algorithm to identify topological domains from chromosome conformation data (Hi-C data). We consider the genome as a weighted graph with vertices defined by loci on a chromosome and the edge weights given by interaction frequency between two loci. Laplacian-based graph segmentation is then applied iteratively to obtain the domains at the given compactness level. Comparison with algorithms in the literature shows the advantage of the proposed strategy. Results: An efficient algorithm is presented to identify topological domains from the Hi-C matrix. Availability and Implementation: The Matlab source code and illustrative examples are available at http://bionetworks.ccmb.med.umich.edu/ Contact: indikar@med.umich.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153657
LSAH: a fast and efficient local surface feature for point cloud registration
NASA Astrophysics Data System (ADS)
Lu, Rongrong; Zhu, Feng; Wu, Qingxiao; Kong, Yanzi
2018-04-01
Point cloud registration is a fundamental task in high level three dimensional applications. Noise, uneven point density and varying point cloud resolutions are the three main challenges for point cloud registration. In this paper, we design a robust and compact local surface descriptor called Local Surface Angles Histogram (LSAH) and propose an effectively coarse to fine algorithm for point cloud registration. The LSAH descriptor is formed by concatenating five normalized sub-histograms into one histogram. The five sub-histograms are created by accumulating a different type of angle from a local surface patch respectively. The experimental results show that our LSAH is more robust to uneven point density and point cloud resolutions than four state-of-the-art local descriptors in terms of feature matching. Moreover, we tested our LSAH based coarse to fine algorithm for point cloud registration. The experimental results demonstrate that our algorithm is robust and efficient as well.
Zero-block mode decision algorithm for H.264/AVC.
Lee, Yu-Ming; Lin, Yinyi
2009-03-01
In the previous paper , we proposed a zero-block intermode decision algorithm for H.264 video coding based upon the number of zero-blocks of 4 x 4 DCT coefficients between the current macroblock and the co-located macroblock. The proposed algorithm can achieve significant improvement in computation, but the computation performance is limited for high bit-rate coding. To improve computation efficiency, in this paper, we suggest an enhanced zero-block decision algorithm, which uses an early zero-block detection method to compute the number of zero-blocks instead of direct DCT and quantization (DCT/Q) calculation and incorporates two adequate decision methods into semi-stationary and nonstationary regions of a video sequence. In addition, the zero-block decision algorithm is also applied to the intramode prediction in the P frame. The enhanced zero-block decision algorithm brings out a reduction of average 27% of total encoding time compared to the zero-block decision algorithm.
Real-time image dehazing using local adaptive neighborhoods and dark-channel-prior
NASA Astrophysics Data System (ADS)
Valderrama, Jesus A.; Díaz-Ramírez, Víctor H.; Kober, Vitaly; Hernandez, Enrique
2015-09-01
A real-time algorithm for single image dehazing is presented. The algorithm is based on calculation of local neighborhoods of a hazed image inside a moving window. The local neighborhoods are constructed by computing rank-order statistics. Next the dark-channel-prior approach is applied to the local neighborhoods to estimate the transmission function of the scene. By using the suggested approach there is no need for applying a refining algorithm to the estimated transmission such as the soft matting algorithm. To achieve high-rate signal processing the proposed algorithm is implemented exploiting massive parallelism on a graphics processing unit (GPU). Computer simulation results are carried out to test the performance of the proposed algorithm in terms of dehazing efficiency and speed of processing. These tests are performed using several synthetic and real images. The obtained results are analyzed and compared with those obtained with existing dehazing algorithms.
Ouyang, Qin; Zhao, Jiewen; Chen, Quansheng
2015-01-01
The non-sugar solids (NSS) content is one of the most important nutrition indicators of Chinese rice wine. This study proposed a rapid method for the measurement of NSS content in Chinese rice wine using near infrared (NIR) spectroscopy. We also systemically studied the efficient spectral variables selection algorithms that have to go through modeling. A new algorithm of synergy interval partial least square with competitive adaptive reweighted sampling (Si-CARS-PLS) was proposed for modeling. The performance of the final model was back-evaluated using root mean square error of calibration (RMSEC) and correlation coefficient (Rc) in calibration set and similarly tested by mean square error of prediction (RMSEP) and correlation coefficient (Rp) in prediction set. The optimum model by Si-CARS-PLS algorithm was achieved when 7 PLS factors and 18 variables were included, and the results were as follows: Rc=0.95 and RMSEC=1.12 in the calibration set, Rp=0.95 and RMSEP=1.22 in the prediction set. In addition, Si-CARS-PLS algorithm showed its superiority when compared with the commonly used algorithms in multivariate calibration. This work demonstrated that NIR spectroscopy technique combined with a suitable multivariate calibration algorithm has a high potential in rapid measurement of NSS content in Chinese rice wine. Copyright © 2015 Elsevier B.V. All rights reserved.
Pronk, Sander; Pouya, Iman; Lundborg, Magnus; Rotskoff, Grant; Wesén, Björn; Kasson, Peter M; Lindahl, Erik
2015-06-09
Computational chemistry and other simulation fields are critically dependent on computing resources, but few problems scale efficiently to the hundreds of thousands of processors available in current supercomputers-particularly for molecular dynamics. This has turned into a bottleneck as new hardware generations primarily provide more processing units rather than making individual units much faster, which simulation applications are addressing by increasingly focusing on sampling with algorithms such as free-energy perturbation, Markov state modeling, metadynamics, or milestoning. All these rely on combining results from multiple simulations into a single observation. They are potentially powerful approaches that aim to predict experimental observables directly, but this comes at the expense of added complexity in selecting sampling strategies and keeping track of dozens to thousands of simulations and their dependencies. Here, we describe how the distributed execution framework Copernicus allows the expression of such algorithms in generic workflows: dataflow programs. Because dataflow algorithms explicitly state dependencies of each constituent part, algorithms only need to be described on conceptual level, after which the execution is maximally parallel. The fully automated execution facilitates the optimization of these algorithms with adaptive sampling, where undersampled regions are automatically detected and targeted without user intervention. We show how several such algorithms can be formulated for computational chemistry problems, and how they are executed efficiently with many loosely coupled simulations using either distributed or parallel resources with Copernicus.
Efficient terrestrial laser scan segmentation exploiting data structure
NASA Astrophysics Data System (ADS)
Mahmoudabadi, Hamid; Olsen, Michael J.; Todorovic, Sinisa
2016-09-01
New technologies such as lidar enable the rapid collection of massive datasets to model a 3D scene as a point cloud. However, while hardware technology continues to advance, processing 3D point clouds into informative models remains complex and time consuming. A common approach to increase processing efficiently is to segment the point cloud into smaller sections. This paper proposes a novel approach for point cloud segmentation using computer vision algorithms to analyze panoramic representations of individual laser scans. These panoramas can be quickly created using an inherent neighborhood structure that is established during the scanning process, which scans at fixed angular increments in a cylindrical or spherical coordinate system. In the proposed approach, a selected image segmentation algorithm is applied on several input layers exploiting this angular structure including laser intensity, range, normal vectors, and color information. These segments are then mapped back to the 3D point cloud so that modeling can be completed more efficiently. This approach does not depend on pre-defined mathematical models and consequently setting parameters for them. Unlike common geometrical point cloud segmentation methods, the proposed method employs the colorimetric and intensity data as another source of information. The proposed algorithm is demonstrated on several datasets encompassing variety of scenes and objects. Results show a very high perceptual (visual) level of segmentation and thereby the feasibility of the proposed algorithm. The proposed method is also more efficient compared to Random Sample Consensus (RANSAC), which is a common approach for point cloud segmentation.
Energy-Efficient Deadline-Aware Data-Gathering Scheme Using Multiple Mobile Data Collectors.
Dasgupta, Rumpa; Yoon, Seokhoon
2017-04-01
In wireless sensor networks, the data collected by sensors are usually forwarded to the sink through multi-hop forwarding. However, multi-hop forwarding can be inefficient due to the energy hole problem and high communications overhead. Moreover, when the monitored area is large and the number of sensors is small, sensors cannot send the data via multi-hop forwarding due to the lack of network connectivity. In order to address those problems of multi-hop forwarding, in this paper, we consider a data collection scheme that uses mobile data collectors (MDCs), which visit sensors and collect data from them. Due to the recent breakthroughs in wireless power transfer technology, MDCs can also be used to recharge the sensors to keep them from draining their energy. In MDC-based data-gathering schemes, a big challenge is how to find the MDCs' traveling paths in a balanced way, such that their energy consumption is minimized and the packet-delay constraint is satisfied. Therefore, in this paper, we aim at finding the MDCs' paths, taking energy efficiency and delay constraints into account. We first define an optimization problem, named the delay-constrained energy minimization (DCEM) problem, to find the paths for MDCs. An integer linear programming problem is formulated to find the optimal solution. We also propose a two-phase path-selection algorithm to efficiently solve the DCEM problem. Simulations are performed to compare the performance of the proposed algorithms with two heuristics algorithms for the vehicle routing problem under various scenarios. The simulation results show that the proposed algorithms can outperform existing algorithms in terms of energy efficiency and packet delay.
Energy-Efficient Deadline-Aware Data-Gathering Scheme Using Multiple Mobile Data Collectors
Dasgupta, Rumpa; Yoon, Seokhoon
2017-01-01
In wireless sensor networks, the data collected by sensors are usually forwarded to the sink through multi-hop forwarding. However, multi-hop forwarding can be inefficient due to the energy hole problem and high communications overhead. Moreover, when the monitored area is large and the number of sensors is small, sensors cannot send the data via multi-hop forwarding due to the lack of network connectivity. In order to address those problems of multi-hop forwarding, in this paper, we consider a data collection scheme that uses mobile data collectors (MDCs), which visit sensors and collect data from them. Due to the recent breakthroughs in wireless power transfer technology, MDCs can also be used to recharge the sensors to keep them from draining their energy. In MDC-based data-gathering schemes, a big challenge is how to find the MDCs’ traveling paths in a balanced way, such that their energy consumption is minimized and the packet-delay constraint is satisfied. Therefore, in this paper, we aim at finding the MDCs’ paths, taking energy efficiency and delay constraints into account. We first define an optimization problem, named the delay-constrained energy minimization (DCEM) problem, to find the paths for MDCs. An integer linear programming problem is formulated to find the optimal solution. We also propose a two-phase path-selection algorithm to efficiently solve the DCEM problem. Simulations are performed to compare the performance of the proposed algorithms with two heuristics algorithms for the vehicle routing problem under various scenarios. The simulation results show that the proposed algorithms can outperform existing algorithms in terms of energy efficiency and packet delay. PMID:28368300
Parallel discontinuous Galerkin FEM for computing hyperbolic conservation law on unstructured grids
NASA Astrophysics Data System (ADS)
Ma, Xinrong; Duan, Zhijian
2018-04-01
High-order resolution Discontinuous Galerkin finite element methods (DGFEM) has been known as a good method for solving Euler equations and Navier-Stokes equations on unstructured grid, but it costs too much computational resources. An efficient parallel algorithm was presented for solving the compressible Euler equations. Moreover, the multigrid strategy based on three-stage three-order TVD Runge-Kutta scheme was used in order to improve the computational efficiency of DGFEM and accelerate the convergence of the solution of unsteady compressible Euler equations. In order to make each processor maintain load balancing, the domain decomposition method was employed. Numerical experiment performed for the inviscid transonic flow fluid problems around NACA0012 airfoil and M6 wing. The results indicated that our parallel algorithm can improve acceleration and efficiency significantly, which is suitable for calculating the complex flow fluid.
Massively parallel algorithms for real-time wavefront control of a dense adaptive optics system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fijany, A.; Milman, M.; Redding, D.
1994-12-31
In this paper massively parallel algorithms and architectures for real-time wavefront control of a dense adaptive optic system (SELENE) are presented. The authors have already shown that the computation of a near optimal control algorithm for SELENE can be reduced to the solution of a discrete Poisson equation on a regular domain. Although, this represents an optimal computation, due the large size of the system and the high sampling rate requirement, the implementation of this control algorithm poses a computationally challenging problem since it demands a sustained computational throughput of the order of 10 GFlops. They develop a novel algorithm,more » designated as Fast Invariant Imbedding algorithm, which offers a massive degree of parallelism with simple communication and synchronization requirements. Due to these features, this algorithm is significantly more efficient than other Fast Poisson Solvers for implementation on massively parallel architectures. The authors also discuss two massively parallel, algorithmically specialized, architectures for low-cost and optimal implementation of the Fast Invariant Imbedding algorithm.« less
Phase-unwrapping algorithm by a rounding-least-squares approach
NASA Astrophysics Data System (ADS)
Juarez-Salazar, Rigoberto; Robledo-Sanchez, Carlos; Guerrero-Sanchez, Fermin
2014-02-01
A simple and efficient phase-unwrapping algorithm based on a rounding procedure and a global least-squares minimization is proposed. Instead of processing the gradient of the wrapped phase, this algorithm operates over the gradient of the phase jumps by a robust and noniterative scheme. Thus, the residue-spreading and over-smoothing effects are reduced. The algorithm's performance is compared with four well-known phase-unwrapping methods: minimum cost network flow (MCNF), fast Fourier transform (FFT), quality-guided, and branch-cut. A computer simulation and experimental results show that the proposed algorithm reaches a high-accuracy level than the MCNF method by a low-computing time similar to the FFT phase-unwrapping method. Moreover, since the proposed algorithm is simple, fast, and user-free, it could be used in metrological interferometric and fringe-projection automatic real-time applications.
An efficient parallel algorithm for matrix-vector multiplication
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hendrickson, B.; Leland, R.; Plimpton, S.
The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/[radical]p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in themore » well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.« less
Quick Vegas: Improving Performance of TCP Vegas for High Bandwidth-Delay Product Networks
NASA Astrophysics Data System (ADS)
Chan, Yi-Cheng; Lin, Chia-Liang; Ho, Cheng-Yuan
An important issue in designing a TCP congestion control algorithm is that it should allow the protocol to quickly adjust the end-to-end communication rate to the bandwidth on the bottleneck link. However, the TCP congestion control may function poorly in high bandwidth-delay product networks because of its slow response with large congestion windows. In this paper, we propose an enhanced version of TCP Vegas called Quick Vegas, in which we present an efficient congestion window control algorithm for a TCP source. Our algorithm improves the slow-start and congestion avoidance techniques of original Vegas. Simulation results show that Quick Vegas significantly improves the performance of connections as well as remaining fair when the bandwidth-delay product increases.
Instantaneous Coastline Extraction from LIDAR Point Cloud and High Resolution Remote Sensing Imagery
NASA Astrophysics Data System (ADS)
Li, Y.; Zhoing, L.; Lai, Z.; Gan, Z.
2018-04-01
A new method was proposed for instantaneous waterline extraction in this paper, which combines point cloud geometry features and image spectral characteristics of the coastal zone. The proposed method consists of follow steps: Mean Shift algorithm is used to segment the coastal zone of high resolution remote sensing images into small regions containing semantic information;Region features are extracted by integrating LiDAR data and the surface area of the image; initial waterlines are extracted by α-shape algorithm; a region growing algorithm with is taking into coastline refinement, with a growth rule integrating the intensity and topography of LiDAR data; moothing the coastline. Experiments are conducted to demonstrate the efficiency of the proposed method.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vecharynski, Eugene; Brabec, Jiri; Shao, Meiyue
We present two efficient iterative algorithms for solving the linear response eigen- value problem arising from the time dependent density functional theory. Although the matrix to be diagonalized is nonsymmetric, it has a special structure that can be exploited to save both memory and floating point operations. In particular, the nonsymmetric eigenvalue problem can be transformed into a product eigenvalue problem that is self-adjoint with respect to a K-inner product. This product eigenvalue problem can be solved efficiently by a modified Davidson algorithm and a modified locally optimal block preconditioned conjugate gradient (LOBPCG) algorithm that make use of the K-innermore » product. The solution of the product eigenvalue problem yields one component of the eigenvector associated with the original eigenvalue problem. However, the other component of the eigenvector can be easily recovered in a postprocessing procedure. Therefore, the algorithms we present here are more efficient than existing algorithms that try to approximate both components of the eigenvectors simultaneously. The efficiency of the new algorithms is demonstrated by numerical examples.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hong, Youngjoon, E-mail: hongy@uic.edu; Nicholls, David P., E-mail: davidn@uic.edu
The accurate numerical simulation of linear waves interacting with periodic layered media is a crucial capability in engineering applications. In this contribution we study the stable and high-order accurate numerical simulation of the interaction of linear, time-harmonic waves with a periodic, triply layered medium with irregular interfaces. In contrast with volumetric approaches, High-Order Perturbation of Surfaces (HOPS) algorithms are inexpensive interfacial methods which rapidly and recursively estimate scattering returns by perturbation of the interface shape. In comparison with Boundary Integral/Element Methods, the stable HOPS algorithm we describe here does not require specialized quadrature rules, periodization strategies, or the solution ofmore » dense non-symmetric positive definite linear systems. In addition, the algorithm is provably stable as opposed to other classical HOPS approaches. With numerical experiments we show the remarkable efficiency, fidelity, and accuracy one can achieve with an implementation of this algorithm.« less
Pei, Zongrui; Max-Planck-Inst. fur Eisenforschung, Duseldorf; Eisenbach, Markus
2017-02-06
Dislocations are among the most important defects in determining the mechanical properties of both conventional alloys and high-entropy alloys. The Peierls-Nabarro model supplies an efficient pathway to their geometries and mobility. The difficulty in solving the integro-differential Peierls-Nabarro equation is how to effectively avoid the local minima in the energy landscape of a dislocation core. Among the other methods to optimize the dislocation core structures, we choose the algorithm of Particle Swarm Optimization, an algorithm that simulates the social behaviors of organisms. By employing more particles (bigger swarm) and more iterative steps (allowing them to explore for longer time), themore » local minima can be effectively avoided. But this would require more computational cost. The advantage of this algorithm is that it is readily parallelized in modern high computing architecture. We demonstrate the performance of our parallelized algorithm scales linearly with the number of employed cores.« less
HPC-NMF: A High-Performance Parallel Algorithm for Nonnegative Matrix Factorization
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kannan, Ramakrishnan; Sukumar, Sreenivas R.; Ballard, Grey M.
NMF is a useful tool for many applications in different domains such as topic modeling in text mining, background separation in video analysis, and community detection in social networks. Despite its popularity in the data mining community, there is a lack of efficient distributed algorithms to solve the problem for big data sets. We propose a high-performance distributed-memory parallel algorithm that computes the factorization by iteratively solving alternating non-negative least squares (NLS) subproblems formore » $$\\WW$$ and $$\\HH$$. It maintains the data and factor matrices in memory (distributed across processors), uses MPI for interprocessor communication, and, in the dense case, provably minimizes communication costs (under mild assumptions). As opposed to previous implementation, our algorithm is also flexible: It performs well for both dense and sparse matrices, and allows the user to choose any one of the multiple algorithms for solving the updates to low rank factors $$\\WW$$ and $$\\HH$$ within the alternating iterations.« less
Delayed Slater determinant update algorithms for high efficiency quantum Monte Carlo
McDaniel, Tyler; D’Azevedo, Ed F.; Li, Ying Wai; ...
2017-11-07
Within ab initio Quantum Monte Carlo simulations, the leading numerical cost for large systems is the computation of the values of the Slater determinants in the trial wavefunction. Each Monte Carlo step requires finding the determinant of a dense matrix. This is most commonly iteratively evaluated using a rank-1 Sherman-Morrison updating scheme to avoid repeated explicit calculation of the inverse. The overall computational cost is therefore formally cubic in the number of electrons or matrix size. To improve the numerical efficiency of this procedure, we propose a novel multiple rank delayed update scheme. This strategy enables probability evaluation with applicationmore » of accepted moves to the matrices delayed until after a predetermined number of moves, K. The accepted events are then applied to the matrices en bloc with enhanced arithmetic intensity and computational efficiency via matrix-matrix operations instead of matrix-vector operations. Here this procedure does not change the underlying Monte Carlo sampling or its statistical efficiency. For calculations on large systems and algorithms such as diffusion Monte Carlo where the acceptance ratio is high, order of magnitude improvements in the update time can be obtained on both multi- core CPUs and GPUs.« less
Delayed Slater determinant update algorithms for high efficiency quantum Monte Carlo
DOE Office of Scientific and Technical Information (OSTI.GOV)
McDaniel, Tyler; D’Azevedo, Ed F.; Li, Ying Wai
Within ab initio Quantum Monte Carlo simulations, the leading numerical cost for large systems is the computation of the values of the Slater determinants in the trial wavefunction. Each Monte Carlo step requires finding the determinant of a dense matrix. This is most commonly iteratively evaluated using a rank-1 Sherman-Morrison updating scheme to avoid repeated explicit calculation of the inverse. The overall computational cost is therefore formally cubic in the number of electrons or matrix size. To improve the numerical efficiency of this procedure, we propose a novel multiple rank delayed update scheme. This strategy enables probability evaluation with applicationmore » of accepted moves to the matrices delayed until after a predetermined number of moves, K. The accepted events are then applied to the matrices en bloc with enhanced arithmetic intensity and computational efficiency via matrix-matrix operations instead of matrix-vector operations. Here this procedure does not change the underlying Monte Carlo sampling or its statistical efficiency. For calculations on large systems and algorithms such as diffusion Monte Carlo where the acceptance ratio is high, order of magnitude improvements in the update time can be obtained on both multi- core CPUs and GPUs.« less
Delayed Slater determinant update algorithms for high efficiency quantum Monte Carlo.
McDaniel, T; D'Azevedo, E F; Li, Y W; Wong, K; Kent, P R C
2017-11-07
Within ab initio Quantum Monte Carlo simulations, the leading numerical cost for large systems is the computation of the values of the Slater determinants in the trial wavefunction. Each Monte Carlo step requires finding the determinant of a dense matrix. This is most commonly iteratively evaluated using a rank-1 Sherman-Morrison updating scheme to avoid repeated explicit calculation of the inverse. The overall computational cost is, therefore, formally cubic in the number of electrons or matrix size. To improve the numerical efficiency of this procedure, we propose a novel multiple rank delayed update scheme. This strategy enables probability evaluation with an application of accepted moves to the matrices delayed until after a predetermined number of moves, K. The accepted events are then applied to the matrices en bloc with enhanced arithmetic intensity and computational efficiency via matrix-matrix operations instead of matrix-vector operations. This procedure does not change the underlying Monte Carlo sampling or its statistical efficiency. For calculations on large systems and algorithms such as diffusion Monte Carlo, where the acceptance ratio is high, order of magnitude improvements in the update time can be obtained on both multi-core central processing units and graphical processing units.
Delayed Slater determinant update algorithms for high efficiency quantum Monte Carlo
NASA Astrophysics Data System (ADS)
McDaniel, T.; D'Azevedo, E. F.; Li, Y. W.; Wong, K.; Kent, P. R. C.
2017-11-01
Within ab initio Quantum Monte Carlo simulations, the leading numerical cost for large systems is the computation of the values of the Slater determinants in the trial wavefunction. Each Monte Carlo step requires finding the determinant of a dense matrix. This is most commonly iteratively evaluated using a rank-1 Sherman-Morrison updating scheme to avoid repeated explicit calculation of the inverse. The overall computational cost is, therefore, formally cubic in the number of electrons or matrix size. To improve the numerical efficiency of this procedure, we propose a novel multiple rank delayed update scheme. This strategy enables probability evaluation with an application of accepted moves to the matrices delayed until after a predetermined number of moves, K. The accepted events are then applied to the matrices en bloc with enhanced arithmetic intensity and computational efficiency via matrix-matrix operations instead of matrix-vector operations. This procedure does not change the underlying Monte Carlo sampling or its statistical efficiency. For calculations on large systems and algorithms such as diffusion Monte Carlo, where the acceptance ratio is high, order of magnitude improvements in the update time can be obtained on both multi-core central processing units and graphical processing units.
Unmanned Aerial Vehicles for Alien Plant Species Detection and Monitoring
NASA Astrophysics Data System (ADS)
Dvořák, P.; Müllerová, J.; Bartaloš, T.; Brůna, J.
2015-08-01
Invasive species spread rapidly and their eradication is difficult. New methods enabling fast and efficient monitoring are urgently needed for their successful control. Remote sensing can improve early detection of invading plants and make their management more efficient and less expensive. In an ongoing project in the Czech Republic, we aim at developing innovative methods of mapping invasive plant species (semi-automatic detection algorithms) by using purposely designed unmanned aircraft (UAV). We examine possibilities for detection of two tree and two herb invasive species. Our aim is to establish fast, repeatable and efficient computer-assisted method of timely monitoring, reducing the costs of extensive field campaigns. For finding the best detection algorithm we test various classification approaches (object-, pixel-based and hybrid). Thanks to its flexibility and low cost, UAV enables assessing the effect of phenological stage and spatial resolution, and is most suitable for monitoring the efficiency of eradication efforts. However, several challenges exist in UAV application, such as geometrical and radiometric distortions, high amount of data to be processed and legal constrains for the UAV flight missions over urban areas (often highly invaded). The newly proposed UAV approach shall serve invasive species researchers, management practitioners and policy makers.
Spatio-temporal colour correction of strongly degraded movies
NASA Astrophysics Data System (ADS)
Islam, A. B. M. Tariqul; Farup, Ivar
2011-01-01
The archives of motion pictures represent an important part of precious cultural heritage. Unfortunately, these cinematography collections are vulnerable to different distortions such as colour fading which is beyond the capability of photochemical restoration process. Spatial colour algorithms-Retinex and ACE provide helpful tool in restoring strongly degraded colour films but, there are some challenges associated with these algorithms. We present an automatic colour correction technique for digital colour restoration of strongly degraded movie material. The method is based upon the existing STRESS algorithm. In order to cope with the problem of highly correlated colour channels, we implemented a preprocessing step in which saturation enhancement is performed in a PCA space. Spatial colour algorithms tend to emphasize all details in the images, including dust and scratches. Surprisingly, we found that the presence of these defects does not affect the behaviour of the colour correction algorithm. Although the STRESS algorithm is already in itself more efficient than traditional spatial colour algorithms, it is still computationally expensive. To speed it up further, we went beyond the spatial domain of the frames and extended the algorithm to the temporal domain. This way, we were able to achieve an 80 percent reduction of the computational time compared to processing every single frame individually. We performed two user experiments and found that the visual quality of the resulting frames was significantly better than with existing methods. Thus, our method outperforms the existing ones in terms of both visual quality and computational efficiency.
High performance reconciliation for continuous-variable quantum key distribution with LDPC code
NASA Astrophysics Data System (ADS)
Lin, Dakai; Huang, Duan; Huang, Peng; Peng, Jinye; Zeng, Guihua
2015-03-01
Reconciliation is a significant procedure in a continuous-variable quantum key distribution (CV-QKD) system. It is employed to extract secure secret key from the resulted string through quantum channel between two users. However, the efficiency and the speed of previous reconciliation algorithms are low. These problems limit the secure communication distance and the secure key rate of CV-QKD systems. In this paper, we proposed a high-speed reconciliation algorithm through employing a well-structured decoding scheme based on low density parity-check (LDPC) code. The complexity of the proposed algorithm is reduced obviously. By using a graphics processing unit (GPU) device, our method may reach a reconciliation speed of 25 Mb/s for a CV-QKD system, which is currently the highest level and paves the way to high-speed CV-QKD.
A multiresolution approach to iterative reconstruction algorithms in X-ray computed tomography.
De Witte, Yoni; Vlassenbroeck, Jelle; Van Hoorebeke, Luc
2010-09-01
In computed tomography, the application of iterative reconstruction methods in practical situations is impeded by their high computational demands. Especially in high resolution X-ray computed tomography, where reconstruction volumes contain a high number of volume elements (several giga voxels), this computational burden prevents their actual breakthrough. Besides the large amount of calculations, iterative algorithms require the entire volume to be kept in memory during reconstruction, which quickly becomes cumbersome for large data sets. To overcome this obstacle, we present a novel multiresolution reconstruction, which greatly reduces the required amount of memory without significantly affecting the reconstructed image quality. It is shown that, combined with an efficient implementation on a graphical processing unit, the multiresolution approach enables the application of iterative algorithms in the reconstruction of large volumes at an acceptable speed using only limited resources.
ChemTS: an efficient python library for de novo molecular generation.
Yang, Xiufeng; Zhang, Jinzhe; Yoshizoe, Kazuki; Terayama, Kei; Tsuda, Koji
2017-01-01
Automatic design of organic materials requires black-box optimization in a vast chemical space. In conventional molecular design algorithms, a molecule is built as a combination of predetermined fragments. Recently, deep neural network models such as variational autoencoders and recurrent neural networks (RNNs) are shown to be effective in de novo design of molecules without any predetermined fragments. This paper presents a novel Python library ChemTS that explores the chemical space by combining Monte Carlo tree search and an RNN. In a benchmarking problem of optimizing the octanol-water partition coefficient and synthesizability, our algorithm showed superior efficiency in finding high-scoring molecules. ChemTS is available at https://github.com/tsudalab/ChemTS.
ChemTS: an efficient python library for de novo molecular generation
NASA Astrophysics Data System (ADS)
Yang, Xiufeng; Zhang, Jinzhe; Yoshizoe, Kazuki; Terayama, Kei; Tsuda, Koji
2017-12-01
Automatic design of organic materials requires black-box optimization in a vast chemical space. In conventional molecular design algorithms, a molecule is built as a combination of predetermined fragments. Recently, deep neural network models such as variational autoencoders and recurrent neural networks (RNNs) are shown to be effective in de novo design of molecules without any predetermined fragments. This paper presents a novel Python library ChemTS that explores the chemical space by combining Monte Carlo tree search and an RNN. In a benchmarking problem of optimizing the octanol-water partition coefficient and synthesizability, our algorithm showed superior efficiency in finding high-scoring molecules. ChemTS is available at https://github.com/tsudalab/ChemTS.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wollaber, Allan Benton; Park, HyeongKae; Lowrie, Robert Byron
Moment-based acceleration via the development of “high-order, low-order” (HO-LO) algorithms has provided substantial accuracy and efficiency enhancements for solutions of the nonlinear, thermal radiative transfer equations by CCS-2 and T-3 staff members. Accuracy enhancements over traditional, linearized methods are obtained by solving a nonlinear, timeimplicit HO-LO system via a Jacobian-free Newton Krylov procedure. This also prevents the appearance of non-physical maximum principle violations (“temperature spikes”) associated with linearization. Efficiency enhancements are obtained in part by removing “effective scattering” from the linearized system. In this highlight, we summarize recent work in which we formally extended the HO-LO radiation algorithm to includemore » operator-split radiation-hydrodynamics.« less
Parallel optoelectronic trinary signed-digit division
NASA Astrophysics Data System (ADS)
Alam, Mohammad S.
1999-03-01
The trinary signed-digit (TSD) number system has been found to be very useful for parallel addition and subtraction of any arbitrary length operands in constant time. Using the TSD addition and multiplication modules as the basic building blocks, we develop an efficient algorithm for performing parallel TSD division in constant time. The proposed division technique uses one TSD subtraction and two TSD multiplication steps. An optoelectronic correlator based architecture is suggested for implementation of the proposed TSD division algorithm, which fully exploits the parallelism and high processing speed of optics. An efficient spatial encoding scheme is used to ensure better utilization of space bandwidth product of the spatial light modulators used in the optoelectronic implementation.
A new efficient method for color image compression based on visual attention mechanism
NASA Astrophysics Data System (ADS)
Shao, Xiaoguang; Gao, Kun; Lv, Lily; Ni, Guoqiang
2010-11-01
One of the key procedures in color image compression is to extract its region of interests (ROIs) and evaluate different compression ratios. A new non-uniform color image compression algorithm with high efficiency is proposed in this paper by using a biology-motivated selective attention model for the effective extraction of ROIs in natural images. When the ROIs have been extracted and labeled in the image, the subsequent work is to encode the ROIs and other regions with different compression ratios via popular JPEG algorithm. Furthermore, experiment results and quantitative and qualitative analysis in the paper show perfect performance when comparing with other traditional color image compression approaches.
A Comparison of LBG and ADPCM Speech Compression Techniques
NASA Astrophysics Data System (ADS)
Bachu, Rajesh G.; Patel, Jignasa; Barkana, Buket D.
Speech compression is the technology of converting human speech into an efficiently encoded representation that can later be decoded to produce a close approximation of the original signal. In all speech there is a degree of predictability and speech coding techniques exploit this to reduce bit rates yet still maintain a suitable level of quality. This paper is a study and implementation of Linde-Buzo-Gray Algorithm (LBG) and Adaptive Differential Pulse Code Modulation (ADPCM) algorithms to compress speech signals. In here we implemented the methods using MATLAB 7.0. The methods we used in this study gave good results and performance in compressing the speech and listening tests showed that efficient and high quality coding is achieved.
Discrete Biogeography Based Optimization for Feature Selection in Molecular Signatures.
Liu, Bo; Tian, Meihong; Zhang, Chunhua; Li, Xiangtao
2015-04-01
Biomarker discovery from high-dimensional data is a complex task in the development of efficient cancer diagnoses and classification. However, these data are usually redundant and noisy, and only a subset of them present distinct profiles for different classes of samples. Thus, selecting high discriminative genes from gene expression data has become increasingly interesting in the field of bioinformatics. In this paper, a discrete biogeography based optimization is proposed to select the good subset of informative gene relevant to the classification. In the proposed algorithm, firstly, the fisher-markov selector is used to choose fixed number of gene data. Secondly, to make biogeography based optimization suitable for the feature selection problem; discrete migration model and discrete mutation model are proposed to balance the exploration and exploitation ability. Then, discrete biogeography based optimization, as we called DBBO, is proposed by integrating discrete migration model and discrete mutation model. Finally, the DBBO method is used for feature selection, and three classifiers are used as the classifier with the 10 fold cross-validation method. In order to show the effective and efficiency of the algorithm, the proposed algorithm is tested on four breast cancer dataset benchmarks. Comparison with genetic algorithm, particle swarm optimization, differential evolution algorithm and hybrid biogeography based optimization, experimental results demonstrate that the proposed method is better or at least comparable with previous method from literature when considering the quality of the solutions obtained. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
NASA Astrophysics Data System (ADS)
Zackay, Barak; Ofek, Eran O.
2017-01-01
Astronomical radio signals are subjected to phase dispersion while traveling through the interstellar medium. To optimally detect a short-duration signal within a frequency band, we have to precisely compensate for the unknown pulse dispersion, which is a computationally demanding task. We present the “fast dispersion measure transform” algorithm for optimal detection of such signals. Our algorithm has a low theoretical complexity of 2{N}f{N}t+{N}t{N}{{Δ }}{{log}}2({N}f), where Nf, Nt, and NΔ are the numbers of frequency bins, time bins, and dispersion measure bins, respectively. Unlike previously suggested fast algorithms, our algorithm conserves the sensitivity of brute-force dedispersion. Our tests indicate that this algorithm, running on a standard desktop computer and implemented in a high-level programming language, is already faster than the state-of-the-art dedispersion codes running on graphical processing units (GPUs). We also present a variant of the algorithm that can be efficiently implemented on GPUs. The latter algorithm’s computation and data-transport requirements are similar to those of a two-dimensional fast Fourier transform, indicating that incoherent dedispersion can now be considered a nonissue while planning future surveys. We further present a fast algorithm for sensitive detection of pulses shorter than the dispersive smearing limits of incoherent dedispersion. In typical cases, this algorithm is orders of magnitude faster than enumerating dispersion measures and coherently dedispersing by convolution. We analyze the computational complexity of pulsed signal searches by radio interferometers. We conclude that, using our suggested algorithms, maximally sensitive blind searches for dispersed pulses are feasible using existing facilities. We provide an implementation of these algorithms in Python and MATLAB.
Fast prediction of RNA-RNA interaction using heuristic algorithm.
Montaseri, Soheila
2015-01-01
Interaction between two RNA molecules plays a crucial role in many medical and biological processes such as gene expression regulation. In this process, an RNA molecule prohibits the translation of another RNA molecule by establishing stable interactions with it. Some algorithms have been formed to predict the structure of the RNA-RNA interaction. High computational time is a common challenge in most of the presented algorithms. In this context, a heuristic method is introduced to accurately predict the interaction between two RNAs based on minimum free energy (MFE). This algorithm uses a few dot matrices for finding the secondary structure of each RNA and binding sites between two RNAs. Furthermore, a parallel version of this method is presented. We describe the algorithm's concurrency and parallelism for a multicore chip. The proposed algorithm has been performed on some datasets including CopA-CopT, R1inv-R2inv, Tar-Tar*, DIS-DIS, and IncRNA54-RepZ in Escherichia coli bacteria. The method has high validity and efficiency, and it is run in low computational time in comparison to other approaches.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chen, Guangye; Chacon, Luis; Barnes, Daniel C
2012-01-01
Recently, a fully implicit, energy- and charge-conserving particle-in-cell method has been developed for multi-scale, full-f kinetic simulations [G. Chen, et al., J. Comput. Phys. 230, 18 (2011)]. The method employs a Jacobian-free Newton-Krylov (JFNK) solver and is capable of using very large timesteps without loss of numerical stability or accuracy. A fundamental feature of the method is the segregation of particle orbit integrations from the field solver, while remaining fully self-consistent. This provides great flexibility, and dramatically improves the solver efficiency by reducing the degrees of freedom of the associated nonlinear system. However, it requires a particle push per nonlinearmore » residual evaluation, which makes the particle push the most time-consuming operation in the algorithm. This paper describes a very efficient mixed-precision, hybrid CPU-GPU implementation of the implicit PIC algorithm. The JFNK solver is kept on the CPU (in double precision), while the inherent data parallelism of the particle mover is exploited by implementing it in single-precision on a graphics processing unit (GPU) using CUDA. Performance-oriented optimizations, with the aid of an analytical performance model, the roofline model, are employed. Despite being highly dynamic, the adaptive, charge-conserving particle mover algorithm achieves up to 300 400 GOp/s (including single-precision floating-point, integer, and logic operations) on a Nvidia GeForce GTX580, corresponding to 20 25% absolute GPU efficiency (against the peak theoretical performance) and 50-70% intrinsic efficiency (against the algorithm s maximum operational throughput, which neglects all latencies). This is about 200-300 times faster than an equivalent serial CPU implementation. When the single-precision GPU particle mover is combined with a double-precision CPU JFNK field solver, overall performance gains 100 vs. the double-precision CPU-only serial version are obtained, with no apparent loss of robustness or accuracy when applied to a challenging long-time scale ion acoustic wave simulation.« less
Improved Ant Colony Clustering Algorithm and Its Performance Study
Gao, Wei
2016-01-01
Clustering analysis is used in many disciplines and applications; it is an important tool that descriptively identifies homogeneous groups of objects based on attribute values. The ant colony clustering algorithm is a swarm-intelligent method used for clustering problems that is inspired by the behavior of ant colonies that cluster their corpses and sort their larvae. A new abstraction ant colony clustering algorithm using a data combination mechanism is proposed to improve the computational efficiency and accuracy of the ant colony clustering algorithm. The abstraction ant colony clustering algorithm is used to cluster benchmark problems, and its performance is compared with the ant colony clustering algorithm and other methods used in existing literature. Based on similar computational difficulties and complexities, the results show that the abstraction ant colony clustering algorithm produces results that are not only more accurate but also more efficiently determined than the ant colony clustering algorithm and the other methods. Thus, the abstraction ant colony clustering algorithm can be used for efficient multivariate data clustering. PMID:26839533
A low-power and high-quality implementation of the discrete cosine transformation
NASA Astrophysics Data System (ADS)
Heyne, B.; Götze, J.
2007-06-01
In this paper a computationally efficient and high-quality preserving DCT architecture is presented. It is obtained by optimizing the Loeffler DCT based on the Cordic algorithm. The computational complexity is reduced from 11 multiply and 29 add operations (Loeffler DCT) to 38 add and 16 shift operations (which is similar to the complexity of the binDCT). The experimental results show that the proposed DCT algorithm not only reduces the computational complexity significantly, but also retains the good transformation quality of the Loeffler DCT. Therefore, the proposed Cordic based Loeffler DCT is especially suited for low-power and high-quality CODECs in battery-based systems.
Parallel Multi-Step/Multi-Rate Integration of Two-Time Scale Dynamic Systems
NASA Technical Reports Server (NTRS)
Chang, Johnny T.; Ploen, Scott R.; Sohl, Garett. A,; Martin, Bryan J.
2004-01-01
Increasing demands on the fidelity of simulations for real-time and high-fidelity simulations are stressing the capacity of modern processors. New integration techniques are required that provide maximum efficiency for systems that are parallelizable. However many current techniques make assumptions that are at odds with non-cascadable systems. A new serial multi-step/multi-rate integration algorithm for dual-timescale continuous state systems is presented which applies to these systems, and is extended to a parallel multi-step/multi-rate algorithm. The superior performance of both algorithms is demonstrated through a representative example.
Application of Biased Metropolis Algorithms: From protons to proteins
Bazavov, Alexei; Berg, Bernd A.; Zhou, Huan-Xiang
2015-01-01
We show that sampling with a biased Metropolis scheme is essentially equivalent to using the heatbath algorithm. However, the biased Metropolis method can also be applied when an efficient heatbath algorithm does not exist. This is first illustrated with an example from high energy physics (lattice gauge theory simulations). We then illustrate the Rugged Metropolis method, which is based on a similar biased updating scheme, but aims at very different applications. The goal of such applications is to locate the most likely configurations in a rugged free energy landscape, which is most relevant for simulations of biomolecules. PMID:26612967
Azad, Ariful; Buluç, Aydın
2016-05-16
We describe parallel algorithms for computing maximal cardinality matching in a bipartite graph on distributed-memory systems. Unlike traditional algorithms that match one vertex at a time, our algorithms process many unmatched vertices simultaneously using a matrix-algebraic formulation of maximal matching. This generic matrix-algebraic framework is used to develop three efficient maximal matching algorithms with minimal changes. The newly developed algorithms have two benefits over existing graph-based algorithms. First, unlike existing parallel algorithms, cardinality of matching obtained by the new algorithms stays constant with increasing processor counts, which is important for predictable and reproducible performance. Second, relying on bulk-synchronous matrix operations,more » these algorithms expose a higher degree of parallelism on distributed-memory platforms than existing graph-based algorithms. We report high-performance implementations of three maximal matching algorithms using hybrid OpenMP-MPI and evaluate the performance of these algorithm using more than 35 real and randomly generated graphs. On real instances, our algorithms achieve up to 200 × speedup on 2048 cores of a Cray XC30 supercomputer. Even higher speedups are obtained on larger synthetically generated graphs where our algorithms show good scaling on up to 16,384 cores.« less
NASA Technical Reports Server (NTRS)
Hedgley, D. R.
1978-01-01
An efficient algorithm for selecting the degree of a polynomial that defines a curve that best approximates a data set was presented. This algorithm was applied to both oscillatory and nonoscillatory data without loss of generality.
Using High-Dimensional Image Models to Perform Highly Undetectable Steganography
NASA Astrophysics Data System (ADS)
Pevný, Tomáš; Filler, Tomáš; Bas, Patrick
This paper presents a complete methodology for designing practical and highly-undetectable stegosystems for real digital media. The main design principle is to minimize a suitably-defined distortion by means of efficient coding algorithm. The distortion is defined as a weighted difference of extended state-of-the-art feature vectors already used in steganalysis. This allows us to "preserve" the model used by steganalyst and thus be undetectable even for large payloads. This framework can be efficiently implemented even when the dimensionality of the feature set used by the embedder is larger than 107. The high dimensional model is necessary to avoid known security weaknesses. Although high-dimensional models might be problem in steganalysis, we explain, why they are acceptable in steganography. As an example, we introduce HUGO, a new embedding algorithm for spatial-domain digital images and we contrast its performance with LSB matching. On the BOWS2 image database and in contrast with LSB matching, HUGO allows the embedder to hide 7× longer message with the same level of security level.
Concepts and design of the CMS high granularity calorimeter Level-1 trigger
NASA Astrophysics Data System (ADS)
Sauvan, Jean-Baptiste; CMS Collaboration
2017-11-01
The CMS experiment has chosen a novel high granularity calorimeter for the forward region as part of its planned upgrade for the high luminosity LHC. The calorimeter will have a fine segmentation in both the transverse and longitudinal directions and will be the first such calorimeter specifically optimised for particle flow reconstruction to operate at a colliding beam experiment. The high granularity results in around six million readout channels in total and so presents a significant challenge in terms of data manipulation and processing for the trigger; the trigger data volumes will be an order of magnitude above those currently handled at CMS. In addition, the high luminosity will result in an average of 140 to 200 interactions per bunch crossing, giving a huge background rate in the forward region that needs to be efficiently reduced by the trigger algorithms. Efficient data reduction and reconstruction algorithms making use of the fine segmentation of the detector have been simulated and evaluated. They provide an increase of the trigger rates with the luminosity significantly smaller than would be expected with the current trigger system.
Research on virtual network load balancing based on OpenFlow
NASA Astrophysics Data System (ADS)
Peng, Rong; Ding, Lei
2017-08-01
The Network based on OpenFlow technology separate the control module and data forwarding module. Global deployment of load balancing strategy through network view of control plane is fast and of high efficiency. This paper proposes a Weighted Round-Robin Scheduling algorithm for virtual network and a load balancing plan for server load based on OpenFlow. Load of service nodes and load balancing tasks distribution algorithm will be taken into account.
A Semi-supervised Heat Kernel Pagerank MBO Algorithm for Data Classification
2016-07-01
financial predictions, etc. and is finding growing use in text mining studies. In this paper, we present an efficient algorithm for classification of high...video data, set of images, hyperspectral data, medical data, text data, etc. Moreover, the framework provides a way to analyze data whose different...also be incorporated. For text classification, one can use tfidf (term frequency inverse document frequency) to form feature vectors for each document
Graph Based Models for Unsupervised High Dimensional Data Clustering and Network Analysis
2015-01-01
ApprovedOMB No. 0704-0188 Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for...algorithms we proposed improve the time e ciency signi cantly for large scale datasets. In the last chapter, we also propose an incremental reseeding...plume detection in hyper-spectral video data. These graph based clustering algorithms we proposed improve the time efficiency significantly for large
Efficient architecture for spike sorting in reconfigurable hardware.
Hwang, Wen-Jyi; Lee, Wei-Hao; Lin, Shiow-Jyu; Lai, Sheng-Ying
2013-11-01
This paper presents a novel hardware architecture for fast spike sorting. The architecture is able to perform both the feature extraction and clustering in hardware. The generalized Hebbian algorithm (GHA) and fuzzy C-means (FCM) algorithm are used for feature extraction and clustering, respectively. The employment of GHA allows efficient computation of principal components for subsequent clustering operations. The FCM is able to achieve near optimal clustering for spike sorting. Its performance is insensitive to the selection of initial cluster centers. The hardware implementations of GHA and FCM feature low area costs and high throughput. In the GHA architecture, the computation of different weight vectors share the same circuit for lowering the area costs. Moreover, in the FCM hardware implementation, the usual iterative operations for updating the membership matrix and cluster centroid are merged into one single updating process to evade the large storage requirement. To show the effectiveness of the circuit, the proposed architecture is physically implemented by field programmable gate array (FPGA). It is embedded in a System-on-Chip (SOC) platform for performance measurement. Experimental results show that the proposed architecture is an efficient spike sorting design for attaining high classification correct rate and high speed computation.
High-efficiency wavefunction updates for large scale Quantum Monte Carlo
NASA Astrophysics Data System (ADS)
Kent, Paul; McDaniel, Tyler; Li, Ying Wai; D'Azevedo, Ed
Within ab intio Quantum Monte Carlo (QMC) simulations, the leading numerical cost for large systems is the computation of the values of the Slater determinants in the trial wavefunctions. The evaluation of each Monte Carlo move requires finding the determinant of a dense matrix, which is traditionally iteratively evaluated using a rank-1 Sherman-Morrison updating scheme to avoid repeated explicit calculation of the inverse. For calculations with thousands of electrons, this operation dominates the execution profile. We propose a novel rank- k delayed update scheme. This strategy enables probability evaluation for multiple successive Monte Carlo moves, with application of accepted moves to the matrices delayed until after a predetermined number of moves, k. Accepted events grouped in this manner are then applied to the matrices en bloc with enhanced arithmetic intensity and computational efficiency. This procedure does not change the underlying Monte Carlo sampling or the sampling efficiency. For large systems and algorithms such as diffusion Monte Carlo where the acceptance ratio is high, order of magnitude speedups can be obtained on both multi-core CPU and on GPUs, making this algorithm highly advantageous for current petascale and future exascale computations.
NASA Astrophysics Data System (ADS)
Eric, L.; Vrugt, J. A.
2010-12-01
Spatially distributed hydrologic models potentially contain hundreds of parameters that need to be derived by calibration against a historical record of input-output data. The quality of this calibration strongly determines the predictive capability of the model and thus its usefulness for science-based decision making and forecasting. Unfortunately, high-dimensional optimization problems are typically difficult to solve. Here we present our recent developments to the Differential Evolution Adaptive Metropolis (DREAM) algorithm (Vrugt et al., 2009) to warrant efficient solution of high-dimensional parameter estimation problems. The algorithm samples from an archive of past states (Ter Braak and Vrugt, 2008), and uses multiple-try Metropolis sampling (Liu et al., 2000) to decrease the required burn-in time for each individual chain and increase efficiency of posterior sampling. This approach is hereafter referred to as MT-DREAM. We present results for 2 synthetic mathematical case studies, and 2 real-world examples involving from 10 to 240 parameters. Results for those cases show that our multiple-try sampler, MT-DREAM, can consistently find better solutions than other Bayesian MCMC methods. Moreover, MT-DREAM is admirably suited to be implemented and ran on a parallel machine and is therefore a powerful method for posterior inference.
Bloom, Guillaume; Larat, Christian; Lallier, Eric; Lee-Bouhours, Mane-Si Laure; Loiseaux, Brigitte; Huignard, Jean-Pierre
2011-02-10
We have designed a high-efficiency array generator composed of subwavelength grooves etched in a GaAs substrate for operation at 4.5 μm. The method used combines rigorous coupled wave analysis with an optimization algorithm. The optimized beam splitter has both a high efficiency (∼96%) and a good intensity uniformity (∼0.2%). The fabrication error tolerances are numerically calculated, and it is shown that this subwavelength array generator could be fabricated with current electron beam writers and inductively coupled plasma etching. Finally, we studied the effect of a simple and realistic antireflection coating on the performance of the beam splitter.
Efficient Fingercode Classification
NASA Astrophysics Data System (ADS)
Sun, Hong-Wei; Law, Kwok-Yan; Gollmann, Dieter; Chung, Siu-Leung; Li, Jian-Bin; Sun, Jia-Guang
In this paper, we present an efficient fingerprint classification algorithm which is an essential component in many critical security application systems e. g. systems in the e-government and e-finance domains. Fingerprint identification is one of the most important security requirements in homeland security systems such as personnel screening and anti-money laundering. The problem of fingerprint identification involves searching (matching) the fingerprint of a person against each of the fingerprints of all registered persons. To enhance performance and reliability, a common approach is to reduce the search space by firstly classifying the fingerprints and then performing the search in the respective class. Jain et al. proposed a fingerprint classification algorithm based on a two-stage classifier, which uses a K-nearest neighbor classifier in its first stage. The fingerprint classification algorithm is based on the fingercode representation which is an encoding of fingerprints that has been demonstrated to be an effective fingerprint biometric scheme because of its ability to capture both local and global details in a fingerprint image. We enhance this approach by improving the efficiency of the K-nearest neighbor classifier for fingercode-based fingerprint classification. Our research firstly investigates the various fast search algorithms in vector quantization (VQ) and the potential application in fingerprint classification, and then proposes two efficient algorithms based on the pyramid-based search algorithms in VQ. Experimental results on DB1 of FVC 2004 demonstrate that our algorithms can outperform the full search algorithm and the original pyramid-based search algorithms in terms of computational efficiency without sacrificing accuracy.
Two efficient label-equivalence-based connected-component labeling algorithms for 3-D binary images.
He, Lifeng; Chao, Yuyan; Suzuki, Kenji
2011-08-01
Whenever one wants to distinguish, recognize, and/or measure objects (connected components) in binary images, labeling is required. This paper presents two efficient label-equivalence-based connected-component labeling algorithms for 3-D binary images. One is voxel based and the other is run based. For the voxel-based one, we present an efficient method of deciding the order for checking voxels in the mask. For the run-based one, instead of assigning each foreground voxel, we assign each run a provisional label. Moreover, we use run data to label foreground voxels without scanning any background voxel in the second scan. Experimental results have demonstrated that our voxel-based algorithm is efficient for 3-D binary images with complicated connected components, that our run-based one is efficient for those with simple connected components, and that both are much more efficient than conventional 3-D labeling algorithms.
NASA Astrophysics Data System (ADS)
Zhao, Jijun; Zhang, Nawa; Ren, Danping; Hu, Jinhua
2017-12-01
The recently proposed flexible optical network can provide more efficient accommodation of multiple data rates than the current wavelength-routed optical networks. Meanwhile, the energy efficiency has also been a hot topic because of the serious energy consumption problem. In this paper, the energy efficiency problem of flexible optical networks with physical-layer impairments constraint is studied. We propose a combined impairment-aware and energy-efficient routing and spectrum assignment (RSA) algorithm based on the link availability, in which the impact of power consumption minimization on signal quality is considered. By applying the proposed algorithm, the connection requests are established on a subset of network topology, reducing the number of transitions from sleep to active state. The simulation results demonstrate that our proposed algorithm can improve the energy efficiency and spectrum resources utilization with the acceptable blocking probability and average delay.
An Optimization Study of Hot Stamping Operation
NASA Astrophysics Data System (ADS)
Ghoo, Bonyoung; Umezu, Yasuyoshi; Watanabe, Yuko; Ma, Ninshu; Averill, Ron
2010-06-01
In the present study, 3-dimensional finite element analyses for hot-stamping processes of Audi B-pillar product are conducted using JSTAMP/NV and HEEDS. Special attention is paid to the optimization of simulation technology coupling with thermal-mechanical formulations. Numerical simulation based on FEM technology and optimization design using the hybrid adaptive SHERPA algorithm are applied to hot stamping operation to improve productivity. The robustness of the SHERPA algorithm is found through the results of the benchmark example. The SHERPA algorithm is shown to be far superior to the GA (Genetic Algorithm) in terms of efficiency, whose calculation time is about 7 times faster than that of the GA. The SHERPA algorithm could show high performance in a large scale problem having complicated design space and long calculation time.
An AK-LDMeans algorithm based on image clustering
NASA Astrophysics Data System (ADS)
Chen, Huimin; Li, Xingwei; Zhang, Yongbin; Chen, Nan
2018-03-01
Clustering is an effective analytical technique for handling unmarked data for value mining. Its ultimate goal is to mark unclassified data quickly and correctly. We use the roadmap for the current image processing as the experimental background. In this paper, we propose an AK-LDMeans algorithm to automatically lock the K value by designing the Kcost fold line, and then use the long-distance high-density method to select the clustering centers to further replace the traditional initial clustering center selection method, which further improves the efficiency and accuracy of the traditional K-Means Algorithm. And the experimental results are compared with the current clustering algorithm and the results are obtained. The algorithm can provide effective reference value in the fields of image processing, machine vision and data mining.
A new algorithm for agile satellite-based acquisition operations
NASA Astrophysics Data System (ADS)
Bunkheila, Federico; Ortore, Emiliano; Circi, Christian
2016-06-01
Taking advantage of the high manoeuvrability and the accurate pointing of the so-called agile satellites, an algorithm which allows efficient management of the operations concerning optical acquisitions is described. Fundamentally, this algorithm can be subdivided into two parts: in the first one the algorithm operates a geometric classification of the areas of interest and a partitioning of these areas into stripes which develop along the optimal scan directions; in the second one it computes the succession of the time windows in which the acquisition operations of the areas of interest are feasible, taking into consideration the potential restrictions associated with these operations and with the geometric and stereoscopic constraints. The results and the performances of the proposed algorithm have been determined and discussed considering the case of the Periodic Sun-Synchronous Orbits.
NASA Technical Reports Server (NTRS)
Jain, A.; Man, G. K.
1993-01-01
This paper describes the Dynamics Algorithms for Real-Time Simulation (DARTS) real-time hardware-in-the-loop dynamics simulator for the National Aeronautics and Space Administration's Cassini spacecraft. The spacecraft model consists of a central flexible body with a number of articulated rigid-body appendages. The demanding performance requirements from the spacecraft control system require the use of a high fidelity simulator for control system design and testing. The DARTS algorithm provides a new algorithmic and hardware approach to the solution of this hardware-in-the-loop simulation problem. It is based upon the efficient spatial algebra dynamics for flexible multibody systems. A parallel and vectorized version of this algorithm is implemented on a low-cost, multiprocessor computer to meet the simulation timing requirements.
Efficient data communication protocols for wireless networks
NASA Astrophysics Data System (ADS)
Zeydan, Engin
In this dissertation, efficient decentralized algorithms are investigated for cost minimization problems in wireless networks. For wireless sensor networks, we investigate both the reduction in the energy consumption and throughput maximization problems separately using multi-hop data aggregation for correlated data in wireless sensor networks. The proposed algorithms exploit data redundancy using a game theoretic framework. For energy minimization, routes are chosen to minimize the total energy expended by the network using best response dynamics to local data. The cost function used in routing takes into account distance, interference and in-network data aggregation. The proposed energy-efficient correlation-aware routing algorithm significantly reduces the energy consumption in the network and converges in a finite number of steps iteratively. For throughput maximization, we consider both the interference distribution across the network and correlation between forwarded data when establishing routes. Nodes along each route are chosen to minimize the interference impact in their neighborhood and to maximize the in-network data aggregation. The resulting network topology maximizes the global network throughput and the algorithm is guaranteed to converge with a finite number of steps using best response dynamics. For multiple antenna wireless ad-hoc networks, we present distributed cooperative and regret-matching based learning schemes for joint transmit beanformer and power level selection problem for nodes operating in multi-user interference environment. Total network transmit power is minimized while ensuring a constant received signal-to-interference and noise ratio at each receiver. In cooperative and regret-matching based power minimization algorithms, transmit beanformers are selected from a predefined codebook to minimize the total power. By selecting transmit beamformers judiciously and performing power adaptation, the cooperative algorithm is shown to converge to pure strategy Nash equilibrium with high probability throughout the iterations in the interference impaired network. On the other hand, the regret-matching learning algorithm is noncooperative and requires minimum amount of overhead. The proposed cooperative and regret-matching based distributed algorithms are also compared with centralized solutions through simulation results.
Score-Level Fusion of Phase-Based and Feature-Based Fingerprint Matching Algorithms
NASA Astrophysics Data System (ADS)
Ito, Koichi; Morita, Ayumi; Aoki, Takafumi; Nakajima, Hiroshi; Kobayashi, Koji; Higuchi, Tatsuo
This paper proposes an efficient fingerprint recognition algorithm combining phase-based image matching and feature-based matching. In our previous work, we have already proposed an efficient fingerprint recognition algorithm using Phase-Only Correlation (POC), and developed commercial fingerprint verification units for access control applications. The use of Fourier phase information of fingerprint images makes it possible to achieve robust recognition for weakly impressed, low-quality fingerprint images. This paper presents an idea of improving the performance of POC-based fingerprint matching by combining it with feature-based matching, where feature-based matching is introduced in order to improve recognition efficiency for images with nonlinear distortion. Experimental evaluation using two different types of fingerprint image databases demonstrates efficient recognition performance of the combination of the POC-based algorithm and the feature-based algorithm.
Use of parallel computing in mass processing of laser data
NASA Astrophysics Data System (ADS)
Będkowski, J.; Bratuś, R.; Prochaska, M.; Rzonca, A.
2015-12-01
The first part of the paper includes a description of the rules used to generate the algorithm needed for the purpose of parallel computing and also discusses the origins of the idea of research on the use of graphics processors in large scale processing of laser scanning data. The next part of the paper includes the results of an efficiency assessment performed for an array of different processing options, all of which were substantially accelerated with parallel computing. The processing options were divided into the generation of orthophotos using point clouds, coloring of point clouds, transformations, and the generation of a regular grid, as well as advanced processes such as the detection of planes and edges, point cloud classification, and the analysis of data for the purpose of quality control. Most algorithms had to be formulated from scratch in the context of the requirements of parallel computing. A few of the algorithms were based on existing technology developed by the Dephos Software Company and then adapted to parallel computing in the course of this research study. Processing time was determined for each process employed for a typical quantity of data processed, which helped confirm the high efficiency of the solutions proposed and the applicability of parallel computing to the processing of laser scanning data. The high efficiency of parallel computing yields new opportunities in the creation and organization of processing methods for laser scanning data.
A Metascalable Computing Framework for Large Spatiotemporal-Scale Atomistic Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nomura, K; Seymour, R; Wang, W
2009-02-17
A metascalable (or 'design once, scale on new architectures') parallel computing framework has been developed for large spatiotemporal-scale atomistic simulations of materials based on spatiotemporal data locality principles, which is expected to scale on emerging multipetaflops architectures. The framework consists of: (1) an embedded divide-and-conquer (EDC) algorithmic framework based on spatial locality to design linear-scaling algorithms for high complexity problems; (2) a space-time-ensemble parallel (STEP) approach based on temporal locality to predict long-time dynamics, while introducing multiple parallelization axes; and (3) a tunable hierarchical cellular decomposition (HCD) parallelization framework to map these O(N) algorithms onto a multicore cluster based onmore » hybrid implementation combining message passing and critical section-free multithreading. The EDC-STEP-HCD framework exposes maximal concurrency and data locality, thereby achieving: (1) inter-node parallel efficiency well over 0.95 for 218 billion-atom molecular-dynamics and 1.68 trillion electronic-degrees-of-freedom quantum-mechanical simulations on 212,992 IBM BlueGene/L processors (superscalability); (2) high intra-node, multithreading parallel efficiency (nanoscalability); and (3) nearly perfect time/ensemble parallel efficiency (eon-scalability). The spatiotemporal scale covered by MD simulation on a sustained petaflops computer per day (i.e. petaflops {center_dot} day of computing) is estimated as NT = 2.14 (e.g. N = 2.14 million atoms for T = 1 microseconds).« less
Optimization research of railway passenger transfer scheme based on ant colony algorithm
NASA Astrophysics Data System (ADS)
Ni, Xiang
2018-05-01
The optimization research of railway passenger transfer scheme can provide strong support for railway passenger transport system, and its essence is path search. This paper realized the calculation of passenger transfer scheme for high speed railway when giving the time and stations of departure and arrival. The specific method that used were generating a passenger transfer service network of high-speed railway, establishing optimization model and searching by Ant Colony Algorithm. Finally, making analysis on the scheme from LanZhouxi to BeiJingXi which were based on high-speed railway network of China in 2017. The results showed that the transfer network and model had relatively high practical value and operation efficiency.
High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis.
Simonyan, Vahan; Mazumder, Raja
2014-09-30
The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is composed of a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently, available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis.
High-Performance Integrated Virtual Environment (HIVE) Tools and Applications for Big Data Analysis
Simonyan, Vahan; Mazumder, Raja
2014-01-01
The High-performance Integrated Virtual Environment (HIVE) is a high-throughput cloud-based infrastructure developed for the storage and analysis of genomic and associated biological data. HIVE consists of a web-accessible interface for authorized users to deposit, retrieve, share, annotate, compute and visualize Next-generation Sequencing (NGS) data in a scalable and highly efficient fashion. The platform contains a distributed storage library and a distributed computational powerhouse linked seamlessly. Resources available through the interface include algorithms, tools and applications developed exclusively for the HIVE platform, as well as commonly used external tools adapted to operate within the parallel architecture of the system. HIVE is composed of a flexible infrastructure, which allows for simple implementation of new algorithms and tools. Currently, available HIVE tools include sequence alignment and nucleotide variation profiling tools, metagenomic analyzers, phylogenetic tree-building tools using NGS data, clone discovery algorithms, and recombination analysis algorithms. In addition to tools, HIVE also provides knowledgebases that can be used in conjunction with the tools for NGS sequence and metadata analysis. PMID:25271953
A Methodology for the Hybridization Based in Active Components: The Case of cGA and Scatter Search
Alba, Enrique; Leguizamón, Guillermo
2016-01-01
This work presents the results of a new methodology for hybridizing metaheuristics. By first locating the active components (parts) of one algorithm and then inserting them into second one, we can build efficient and accurate optimization, search, and learning algorithms. This gives a concrete way of constructing new techniques that contrasts the spread ad hoc way of hybridizing. In this paper, the enhanced algorithm is a Cellular Genetic Algorithm (cGA) which has been successfully used in the past to find solutions to such hard optimization problems. In order to extend and corroborate the use of active components as an emerging hybridization methodology, we propose here the use of active components taken from Scatter Search (SS) to improve cGA. The results obtained over a varied set of benchmarks are highly satisfactory in efficacy and efficiency when compared with a standard cGA. Moreover, the proposed hybrid approach (i.e., cGA+SS) has shown encouraging results with regard to earlier applications of our methodology. PMID:27403153
Orthogonalizing EM: A design-based least squares algorithm
Xiong, Shifeng; Dai, Bin; Huling, Jared; Qian, Peter Z. G.
2016-01-01
We introduce an efficient iterative algorithm, intended for various least squares problems, based on a design of experiments perspective. The algorithm, called orthogonalizing EM (OEM), works for ordinary least squares and can be easily extended to penalized least squares. The main idea of the procedure is to orthogonalize a design matrix by adding new rows and then solve the original problem by embedding the augmented design in a missing data framework. We establish several attractive theoretical properties concerning OEM. For the ordinary least squares with a singular regression matrix, an OEM sequence converges to the Moore-Penrose generalized inverse-based least squares estimator. For ordinary and penalized least squares with various penalties, it converges to a point having grouping coherence for fully aliased regression matrices. Convergence and the convergence rate of the algorithm are examined. Finally, we demonstrate that OEM is highly efficient for large-scale least squares and penalized least squares problems, and is considerably faster than competing methods when n is much larger than p. Supplementary materials for this article are available online. PMID:27499558
Efficient high density train operations
Gordon, Susanna P.; Evans, John A.
2001-01-01
The present invention provides methods for preventing low train voltages and managing interference, thereby improving the efficiency, reliability, and passenger comfort associated with commuter trains. An algorithm implementing neural network technology is used to predict low voltages before they occur. Once voltages are predicted, then multiple trains can be controlled to prevent low voltage events. Further, algorithms for managing inference are presented in the present invention. Different types of interference problems are addressed in the present invention such as "Interference. During Acceleration", "Interference Near Station Stops", and "Interference During Delay Recovery." Managing such interference avoids unnecessary brake/acceleration cycles during acceleration, immediately before station stops, and after substantial delays. Algorithms are demonstrated to avoid oscillatory brake/acceleration cycles due to interference and to smooth the trajectories of closely following trains. This is achieved by maintaining sufficient following distances to avoid unnecessary braking/accelerating. These methods generate smooth train trajectories, making for a more comfortable ride, and improve train motor reliability by avoiding unnecessary mode-changes between propulsion and braking. These algorithms can also have a favorable impact on traction power system requirements and energy consumption.
Controlling Tensegrity Robots Through Evolution
NASA Technical Reports Server (NTRS)
Iscen, Atil; Agogino, Adrian; SunSpiral, Vytas; Tumer, Kagan
2013-01-01
Tensegrity structures (built from interconnected rods and cables) have the potential to offer a revolutionary new robotic design that is light-weight, energy-efficient, robust to failures, capable of unique modes of locomotion, impact tolerant, and compliant (reducing damage between the robot and its environment). Unfortunately robots built from tensegrity structures are difficult to control with traditional methods due to their oscillatory nature, nonlinear coupling between components and overall complexity. Fortunately this formidable control challenge can be overcome through the use of evolutionary algorithms. In this paper we show that evolutionary algorithms can be used to efficiently control a ball-shaped tensegrity robot. Experimental results performed with a variety of evolutionary algorithms in a detailed soft-body physics simulator show that a centralized evolutionary algorithm performs 400 percent better than a hand-coded solution, while the multi-agent evolution performs 800 percent better. In addition, evolution is able to discover diverse control solutions (both crawling and rolling) that are robust against structural failures and can be adapted to a wide range of energy and actuation constraints. These successful controls will form the basis for building high-performance tensegrity robots in the near future.
Zeng, Xueqiang; Luo, Gang
2017-12-01
Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting algorithms and hyper-parameter values requires advanced machine learning knowledge and many labor-intensive manual iterations. To lower the bar to machine learning, miscellaneous automatic selection methods for algorithms and/or hyper-parameter values have been proposed. Existing automatic selection methods are inefficient on large data sets. This poses a challenge for using machine learning in the clinical big data era. To address the challenge, this paper presents progressive sampling-based Bayesian optimization, an efficient and automatic selection method for both algorithms and hyper-parameter values. We report an implementation of the method. We show that compared to a state of the art automatic selection method, our method can significantly reduce search time, classification error rate, and standard deviation of error rate due to randomization. This is major progress towards enabling fast turnaround in identifying high-quality solutions required by many machine learning-based clinical data analysis tasks.
Dong, Feihong; Li, Hongjun; Gong, Xiangwu; Liu, Quan; Wang, Jingchao
2015-01-01
A typical application scenario of remote wireless sensor networks (WSNs) is identified as an emergency scenario. One of the greatest design challenges for communications in emergency scenarios is energy-efficient transmission, due to scarce electrical energy in large-scale natural and man-made disasters. Integrated high altitude platform (HAP)/satellite networks are expected to optimally meet emergency communication requirements. In this paper, a novel integrated HAP/satellite (IHS) architecture is proposed, and three segments of the architecture are investigated in detail. The concept of link-state advertisement (LSA) is designed in a slow flat Rician fading channel. The LSA is received and processed by the terminal to estimate the link state information, which can significantly reduce the energy consumption at the terminal end. Furthermore, the transmission power requirements of the HAPs and terminals are derived using the gradient descent and differential equation methods. The energy consumption is modeled at both the source and system level. An innovative and adaptive algorithm is given for the energy-efficient path selection. The simulation results validate the effectiveness of the proposed adaptive algorithm. It is shown that the proposed adaptive algorithm can significantly improve energy efficiency when combined with the LSA and the energy consumption estimation. PMID:26404292
Dong, Feihong; Li, Hongjun; Gong, Xiangwu; Liu, Quan; Wang, Jingchao
2015-09-03
A typical application scenario of remote wireless sensor networks (WSNs) is identified as an emergency scenario. One of the greatest design challenges for communications in emergency scenarios is energy-efficient transmission, due to scarce electrical energy in large-scale natural and man-made disasters. Integrated high altitude platform (HAP)/satellite networks are expected to optimally meet emergency communication requirements. In this paper, a novel integrated HAP/satellite (IHS) architecture is proposed, and three segments of the architecture are investigated in detail. The concept of link-state advertisement (LSA) is designed in a slow flat Rician fading channel. The LSA is received and processed by the terminal to estimate the link state information, which can significantly reduce the energy consumption at the terminal end. Furthermore, the transmission power requirements of the HAPs and terminals are derived using the gradient descent and differential equation methods. The energy consumption is modeled at both the source and system level. An innovative and adaptive algorithm is given for the energy-efficient path selection. The simulation results validate the effectiveness of the proposed adaptive algorithm. It is shown that the proposed adaptive algorithm can significantly improve energy efficiency when combined with the LSA and the energy consumption estimation.
A robust embedded vision system feasible white balance algorithm
NASA Astrophysics Data System (ADS)
Wang, Yuan; Yu, Feihong
2018-01-01
White balance is a very important part of the color image processing pipeline. In order to meet the need of efficiency and accuracy in embedded machine vision processing system, an efficient and robust white balance algorithm combining several classical ones is proposed. The proposed algorithm mainly has three parts. Firstly, in order to guarantee higher efficiency, an initial parameter calculated from the statistics of R, G and B components from raw data is used to initialize the following iterative method. After that, the bilinear interpolation algorithm is utilized to implement demosaicing procedure. Finally, an adaptive step adjustable scheme is introduced to ensure the controllability and robustness of the algorithm. In order to verify the proposed algorithm's performance on embedded vision system, a smart camera based on IMX6 DualLite, IMX291 and XC6130 is designed. Extensive experiments on a large amount of images under different color temperatures and exposure conditions illustrate that the proposed white balance algorithm avoids color deviation problem effectively, achieves a good balance between efficiency and quality, and is suitable for embedded machine vision processing system.
A fast optimization approach for treatment planning of volumetric modulated arc therapy.
Yan, Hui; Dai, Jian-Rong; Li, Ye-Xiong
2018-05-30
Volumetric modulated arc therapy (VMAT) is widely used in clinical practice. It not only significantly reduces treatment time, but also produces high-quality treatment plans. Current optimization approaches heavily rely on stochastic algorithms which are time-consuming and less repeatable. In this study, a novel approach is proposed to provide a high-efficient optimization algorithm for VMAT treatment planning. A progressive sampling strategy is employed for beam arrangement of VMAT planning. The initial beams with equal-space are added to the plan in a coarse sampling resolution. Fluence-map optimization and leaf-sequencing are performed for these beams. Then, the coefficients of fluence-maps optimization algorithm are adjusted according to the known fluence maps of these beams. In the next round the sampling resolution is doubled and more beams are added. This process continues until the total number of beams arrived. The performance of VMAT optimization algorithm was evaluated using three clinical cases and compared to those of a commercial planning system. The dosimetric quality of VMAT plans is equal to or better than the corresponding IMRT plans for three clinical cases. The maximum dose to critical organs is reduced considerably for VMAT plans comparing to those of IMRT plans, especially in the head and neck case. The total number of segments and monitor units are reduced for VMAT plans. For three clinical cases, VMAT optimization takes < 5 min accomplished using proposed approach and is 3-4 times less than that of the commercial system. The proposed VMAT optimization algorithm is able to produce high-quality VMAT plans efficiently and consistently. It presents a new way to accelerate current optimization process of VMAT planning.
NASA Astrophysics Data System (ADS)
Tartakovsky, A.; Tong, M.; Brown, A. P.; Agh, C.
2013-09-01
We develop efficient spatiotemporal image processing algorithms for rejection of non-stationary clutter and tracking of multiple dim objects using non-linear track-before-detect methods. For clutter suppression, we include an innovative image alignment (registration) algorithm. The images are assumed to contain elements of the same scene, but taken at different angles, from different locations, and at different times, with substantial clutter non-stationarity. These challenges are typical for space-based and surface-based IR/EO moving sensors, e.g., highly elliptical orbit or low earth orbit scenarios. The algorithm assumes that the images are related via a planar homography, also known as the projective transformation. The parameters are estimated in an iterative manner, at each step adjusting the parameter vector so as to achieve improved alignment of the images. Operating in the parameter space rather than in the coordinate space is a new idea, which makes the algorithm more robust with respect to noise as well as to large inter-frame disturbances, while operating at real-time rates. For dim object tracking, we include new advancements to a particle non-linear filtering-based track-before-detect (TrbD) algorithm. The new TrbD algorithm includes both real-time full image search for resolved objects not yet in track and joint super-resolution and tracking of individual objects in closely spaced object (CSO) clusters. The real-time full image search provides near-optimal detection and tracking of multiple extremely dim, maneuvering objects/clusters. The super-resolution and tracking CSO TrbD algorithm provides efficient near-optimal estimation of the number of unresolved objects in a CSO cluster, as well as the locations, velocities, accelerations, and intensities of the individual objects. We demonstrate that the algorithm is able to accurately estimate the number of CSO objects and their locations when the initial uncertainty on the number of objects is large. We demonstrate performance of the TrbD algorithm both for satellite-based and surface-based EO/IR surveillance scenarios.
Mining algorithm for association rules in big data based on Hadoop
NASA Astrophysics Data System (ADS)
Fu, Chunhua; Wang, Xiaojing; Zhang, Lijun; Qiao, Liying
2018-04-01
In order to solve the problem that the traditional association rules mining algorithm has been unable to meet the mining needs of large amount of data in the aspect of efficiency and scalability, take FP-Growth as an example, the algorithm is realized in the parallelization based on Hadoop framework and Map Reduce model. On the basis, it is improved using the transaction reduce method for further enhancement of the algorithm's mining efficiency. The experiment, which consists of verification of parallel mining results, comparison on efficiency between serials and parallel, variable relationship between mining time and node number and between mining time and data amount, is carried out in the mining results and efficiency by Hadoop clustering. Experiments show that the paralleled FP-Growth algorithm implemented is able to accurately mine frequent item sets, with a better performance and scalability. It can be better to meet the requirements of big data mining and efficiently mine frequent item sets and association rules from large dataset.
Efficient generation of discontinuity-preserving adaptive triangulations from range images.
Garcia, Miguel Angel; Sappa, Angel Domingo
2004-10-01
This paper presents an efficient technique for generating adaptive triangular meshes from range images. The algorithm consists of two stages. First, a user-defined number of points is adaptively sampled from the given range image. Those points are chosen by taking into account the surface shapes represented in the range image in such a way that points tend to group in areas of high curvature and to disperse in low-variation regions. This selection process is done through a noniterative, inherently parallel algorithm in order to gain efficiency. Once the image has been subsampled, the second stage applies a two and one half-dimensional Delaunay triangulation to obtain an initial triangular mesh. To favor the preservation of surface and orientation discontinuities (jump and crease edges) present in the original range image, the aforementioned triangular mesh is iteratively modified by applying an efficient edge flipping technique. Results with real range images show accurate triangular approximations of the given range images with low processing times.
Research on personalized recommendation algorithm based on spark
NASA Astrophysics Data System (ADS)
Li, Zeng; Liu, Yu
2018-04-01
With the increasing amount of data in the past years, the traditional recommendation algorithm has been unable to meet people's needs. Therefore, how to better recommend their products to users of interest, become the opportunities and challenges of the era of big data development. At present, each platform enterprise has its own recommendation algorithm, but how to make efficient and accurate push information is still an urgent problem for personalized recommendation system. In this paper, a hybrid algorithm based on user collaborative filtering and content-based recommendation algorithm is proposed on Spark to improve the efficiency and accuracy of recommendation by weighted processing. The experiment shows that the recommendation under this scheme is more efficient and accurate.
Camera calibration method of binocular stereo vision based on OpenCV
NASA Astrophysics Data System (ADS)
Zhong, Wanzhen; Dong, Xiaona
2015-10-01
Camera calibration, an important part of the binocular stereo vision research, is the essential foundation of 3D reconstruction of the spatial object. In this paper, the camera calibration method based on OpenCV (open source computer vision library) is submitted to make the process better as a result of obtaining higher precision and efficiency. First, the camera model in OpenCV and an algorithm of camera calibration are presented, especially considering the influence of camera lens radial distortion and decentering distortion. Then, camera calibration procedure is designed to compute those parameters of camera and calculate calibration errors. High-accurate profile extraction algorithm and a checkboard with 48 corners have also been used in this part. Finally, results of calibration program are presented, demonstrating the high efficiency and accuracy of the proposed approach. The results can reach the requirement of robot binocular stereo vision.
Jiang, Qingan; Wu, Wenqi; Jiang, Mingming; Li, Yun
2017-01-01
High-accuracy railway track surveying is essential for railway construction and maintenance. The traditional approaches based on total station equipment are not efficient enough since high precision surveying frequently needs static measurements. This paper proposes a new filtering and smoothing algorithm based on the IMU/odometer and landmarks integration for the railway track surveying. In order to overcome the difficulty of estimating too many error parameters with too few landmark observations, a new model with completely observable error states is established by combining error terms of the system. Based on covariance analysis, the analytical relationship between the railway track surveying accuracy requirements and equivalent gyro drifts including bias instability and random walk noise are established. Experiment results show that the accuracy of the new filtering and smoothing algorithm for railway track surveying can reach 1 mm (1σ) when using a Ring Laser Gyroscope (RLG)-based Inertial Measurement Unit (IMU) with gyro bias instability of 0.03°/h and random walk noise of 0.005°/h while control points of the track control network (CPIII) position observations are provided by the optical total station in about every 60 m interval. The proposed approach can satisfy at the same time the demands of high accuracy and work efficiency for railway track surveying. PMID:28629191
NASA Astrophysics Data System (ADS)
Wang, Jianhua; Cheng, Lianglun; Wang, Tao; Peng, Xiaodong
2016-03-01
Table look-up operation plays a very important role during the decoding processing of context-based adaptive variable length decoding (CAVLD) in H.264/advanced video coding (AVC). However, frequent table look-up operation can result in big table memory access, and then lead to high table power consumption. Aiming to solve the problem of big table memory access of current methods, and then reduce high power consumption, a memory-efficient table look-up optimized algorithm is presented for CAVLD. The contribution of this paper lies that index search technology is introduced to reduce big memory access for table look-up, and then reduce high table power consumption. Specifically, in our schemes, we use index search technology to reduce memory access by reducing the searching and matching operations for code_word on the basis of taking advantage of the internal relationship among length of zero in code_prefix, value of code_suffix and code_lengh, thus saving the power consumption of table look-up. The experimental results show that our proposed table look-up algorithm based on index search can lower about 60% memory access consumption compared with table look-up by sequential search scheme, and then save much power consumption for CAVLD in H.264/AVC.
Solving SAT Problem Based on Hybrid Differential Evolution Algorithm
NASA Astrophysics Data System (ADS)
Liu, Kunqi; Zhang, Jingmin; Liu, Gang; Kang, Lishan
Satisfiability (SAT) problem is an NP-complete problem. Based on the analysis about it, SAT problem is translated equally into an optimization problem on the minimum of objective function. A hybrid differential evolution algorithm is proposed to solve the Satisfiability problem. It makes full use of strong local search capacity of hill-climbing algorithm and strong global search capability of differential evolution algorithm, which makes up their disadvantages, improves the efficiency of algorithm and avoids the stagnation phenomenon. The experiment results show that the hybrid algorithm is efficient in solving SAT problem.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Maurer, S. A.; Kussmann, J.; Ochsenfeld, C., E-mail: Christian.Ochsenfeld@cup.uni-muenchen.de
2014-08-07
We present a low-prefactor, cubically scaling scaled-opposite-spin second-order Møller-Plesset perturbation theory (SOS-MP2) method which is highly suitable for massively parallel architectures like graphics processing units (GPU). The scaling is reduced from O(N{sup 5}) to O(N{sup 3}) by a reformulation of the MP2-expression in the atomic orbital basis via Laplace transformation and the resolution-of-the-identity (RI) approximation of the integrals in combination with efficient sparse algebra for the 3-center integral transformation. In contrast to previous works that employ GPUs for post Hartree-Fock calculations, we do not simply employ GPU-based linear algebra libraries to accelerate the conventional algorithm. Instead, our reformulation allows tomore » replace the rate-determining contraction step with a modified J-engine algorithm, that has been proven to be highly efficient on GPUs. Thus, our SOS-MP2 scheme enables us to treat large molecular systems in an accurate and efficient manner on a single GPU-server.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bolding, Simon R.; Cleveland, Mathew Allen; Morel, Jim E.
In this paper, we have implemented a new high-order low-order (HOLO) algorithm for solving thermal radiative transfer problems. The low-order (LO) system is based on the spatial and angular moments of the transport equation and a linear-discontinuous finite-element spatial representation, producing equations similar to the standard S 2 equations. The LO solver is fully implicit in time and efficiently resolves the nonlinear temperature dependence at each time step. The high-order (HO) solver utilizes exponentially convergent Monte Carlo (ECMC) to give a globally accurate solution for the angular intensity to a fixed-source pure-absorber transport problem. This global solution is used tomore » compute consistency terms, which require the HO and LO solutions to converge toward the same solution. The use of ECMC allows for the efficient reduction of statistical noise in the Monte Carlo solution, reducing inaccuracies introduced through the LO consistency terms. Finally, we compare results with an implicit Monte Carlo code for one-dimensional gray test problems and demonstrate the efficiency of ECMC over standard Monte Carlo in this HOLO algorithm.« less
New algorithms for processing time-series big EEG data within mobile health monitoring systems.
Serhani, Mohamed Adel; Menshawy, Mohamed El; Benharref, Abdelghani; Harous, Saad; Navaz, Alramzana Nujum
2017-10-01
Recent advances in miniature biomedical sensors, mobile smartphones, wireless communications, and distributed computing technologies provide promising techniques for developing mobile health systems. Such systems are capable of monitoring epileptic seizures reliably, which are classified as chronic diseases. Three challenging issues raised in this context with regard to the transformation, compression, storage, and visualization of big data, which results from a continuous recording of epileptic seizures using mobile devices. In this paper, we address the above challenges by developing three new algorithms to process and analyze big electroencephalography data in a rigorous and efficient manner. The first algorithm is responsible for transforming the standard European Data Format (EDF) into the standard JavaScript Object Notation (JSON) and compressing the transformed JSON data to decrease the size and time through the transfer process and to increase the network transfer rate. The second algorithm focuses on collecting and storing the compressed files generated by the transformation and compression algorithm. The collection process is performed with respect to the on-the-fly technique after decompressing files. The third algorithm provides relevant real-time interaction with signal data by prospective users. It particularly features the following capabilities: visualization of single or multiple signal channels on a smartphone device and query data segments. We tested and evaluated the effectiveness of our approach through a software architecture model implementing a mobile health system to monitor epileptic seizures. The experimental findings from 45 experiments are promising and efficiently satisfy the approach's objectives in a price of linearity. Moreover, the size of compressed JSON files and transfer times are reduced by 10% and 20%, respectively, while the average total time is remarkably reduced by 67% through all performed experiments. Our approach successfully develops efficient algorithms in terms of processing time, memory usage, and energy consumption while maintaining a high scalability of the proposed solution. Our approach efficiently supports data partitioning and parallelism relying on the MapReduce platform, which can help in monitoring and automatic detection of epileptic seizures. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Lin, Y.; O'Malley, D.; Vesselinov, V. V.
2015-12-01
Inverse modeling seeks model parameters given a set of observed state variables. However, for many practical problems due to the facts that the observed data sets are often large and model parameters are often numerous, conventional methods for solving the inverse modeling can be computationally expensive. We have developed a new, computationally-efficient Levenberg-Marquardt method for solving large-scale inverse modeling. Levenberg-Marquardt methods require the solution of a dense linear system of equations which can be prohibitively expensive to compute for large-scale inverse problems. Our novel method projects the original large-scale linear problem down to a Krylov subspace, such that the dimensionality of the measurements can be significantly reduced. Furthermore, instead of solving the linear system for every Levenberg-Marquardt damping parameter, we store the Krylov subspace computed when solving the first damping parameter and recycle it for all the following damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved by using these computational techniques. We apply this new inverse modeling method to invert for a random transitivity field. Our algorithm is fast enough to solve for the distributed model parameters (transitivity) at each computational node in the model domain. The inversion is also aided by the use regularization techniques. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). Julia is an advanced high-level scientific programing language that allows for efficient memory management and utilization of high-performance computational resources. By comparing with a Levenberg-Marquardt method using standard linear inversion techniques, our Levenberg-Marquardt method yields speed-up ratio of 15 in a multi-core computational environment and a speed-up ratio of 45 in a single-core computational environment. Therefore, our new inverse modeling method is a powerful tool for large-scale applications.
High-Reproducibility and High-Accuracy Method for Automated Topic Classification
NASA Astrophysics Data System (ADS)
Lancichinetti, Andrea; Sirer, M. Irmak; Wang, Jane X.; Acuna, Daniel; Körding, Konrad; Amaral, Luís A. Nunes
2015-01-01
Much of human knowledge sits in large databases of unstructured text. Leveraging this knowledge requires algorithms that extract and record metadata on unstructured text documents. Assigning topics to documents will enable intelligent searching, statistical characterization, and meaningful classification. Latent Dirichlet allocation (LDA) is the state of the art in topic modeling. Here, we perform a systematic theoretical and numerical analysis that demonstrates that current optimization techniques for LDA often yield results that are not accurate in inferring the most suitable model parameters. Adapting approaches from community detection in networks, we propose a new algorithm that displays high reproducibility and high accuracy and also has high computational efficiency. We apply it to a large set of documents in the English Wikipedia and reveal its hierarchical structure.
Competitive Swarm Optimizer Based Gateway Deployment Algorithm in Cyber-Physical Systems
Huang, Shuqiang; Tao, Ming
2017-01-01
Wireless sensor network topology optimization is a highly important issue, and topology control through node selection can improve the efficiency of data forwarding, while saving energy and prolonging lifetime of the network. To address the problem of connecting a wireless sensor network to the Internet in cyber-physical systems, here we propose a geometric gateway deployment based on a competitive swarm optimizer algorithm. The particle swarm optimization (PSO) algorithm has a continuous search feature in the solution space, which makes it suitable for finding the geometric center of gateway deployment; however, its search mechanism is limited to the individual optimum (pbest) and the population optimum (gbest); thus, it easily falls into local optima. In order to improve the particle search mechanism and enhance the search efficiency of the algorithm, we introduce a new competitive swarm optimizer (CSO) algorithm. The CSO search algorithm is based on an inter-particle competition mechanism and can effectively avoid trapping of the population falling into a local optimum. With the improvement of an adaptive opposition-based search and its ability to dynamically parameter adjustments, this algorithm can maintain the diversity of the entire swarm to solve geometric K-center gateway deployment problems. The simulation results show that this CSO algorithm has a good global explorative ability as well as convergence speed and can improve the network quality of service (QoS) level of cyber-physical systems by obtaining a minimum network coverage radius. We also find that the CSO algorithm is more stable, robust and effective in solving the problem of geometric gateway deployment as compared to the PSO or Kmedoids algorithms. PMID:28117735
Accelerating DNA analysis applications on GPU clusters
DOE Office of Scientific and Technical Information (OSTI.GOV)
Tumeo, Antonino; Villa, Oreste
DNA analysis is an emerging application of high performance bioinformatic. Modern sequencing machinery are able to provide, in few hours, large input streams of data which needs to be matched against exponentially growing databases known fragments. The ability to recognize these patterns effectively and fastly may allow extending the scale and the reach of the investigations performed by biology scientists. Aho-Corasick is an exact, multiple pattern matching algorithm often at the base of this application. High performance systems are a promising platform to accelerate this algorithm, which is computationally intensive but also inherently parallel. Nowadays, high performance systems also includemore » heterogeneous processing elements, such as Graphic Processing Units (GPUs), to further accelerate parallel algorithms. Unfortunately, the Aho-Corasick algorithm exhibits large performance variabilities, depending on the size of the input streams, on the number of patterns to search and on the number of matches, and poses significant challenges on current high performance software and hardware implementations. An adequate mapping of the algorithm on the target architecture, coping with the limit of the underlining hardware, is required to reach the desired high throughputs. Load balancing also plays a crucial role when considering the limited bandwidth among the nodes of these systems. In this paper we present an efficient implementation of the Aho-Corasick algorithm for high performance clusters accelerated with GPUs. We discuss how we partitioned and adapted the algorithm to fit the Tesla C1060 GPU and then present a MPI based implementation for a heterogeneous high performance cluster. We compare this implementation to MPI and MPI with pthreads based implementations for a homogeneous cluster of x86 processors, discussing the stability vs. the performance and the scaling of the solutions, taking into consideration aspects such as the bandwidth among the different nodes.« less
Hamiltonian Monte Carlo acceleration using surrogate functions with random bases.
Zhang, Cheng; Shahbaba, Babak; Zhao, Hongkai
2017-11-01
For big data analysis, high computational cost for Bayesian methods often limits their applications in practice. In recent years, there have been many attempts to improve computational efficiency of Bayesian inference. Here we propose an efficient and scalable computational technique for a state-of-the-art Markov chain Monte Carlo methods, namely, Hamiltonian Monte Carlo. The key idea is to explore and exploit the structure and regularity in parameter space for the underlying probabilistic model to construct an effective approximation of its geometric properties. To this end, we build a surrogate function to approximate the target distribution using properly chosen random bases and an efficient optimization process. The resulting method provides a flexible, scalable, and efficient sampling algorithm, which converges to the correct target distribution. We show that by choosing the basis functions and optimization process differently, our method can be related to other approaches for the construction of surrogate functions such as generalized additive models or Gaussian process models. Experiments based on simulated and real data show that our approach leads to substantially more efficient sampling algorithms compared to existing state-of-the-art methods.
An Adaptive Evolutionary Algorithm for Traveling Salesman Problem with Precedence Constraints
Sung, Jinmo; Jeong, Bongju
2014-01-01
Traveling sales man problem with precedence constraints is one of the most notorious problems in terms of the efficiency of its solution approach, even though it has very wide range of industrial applications. We propose a new evolutionary algorithm to efficiently obtain good solutions by improving the search process. Our genetic operators guarantee the feasibility of solutions over the generations of population, which significantly improves the computational efficiency even when it is combined with our flexible adaptive searching strategy. The efficiency of the algorithm is investigated by computational experiments. PMID:24701158
An adaptive evolutionary algorithm for traveling salesman problem with precedence constraints.
Sung, Jinmo; Jeong, Bongju
2014-01-01
Traveling sales man problem with precedence constraints is one of the most notorious problems in terms of the efficiency of its solution approach, even though it has very wide range of industrial applications. We propose a new evolutionary algorithm to efficiently obtain good solutions by improving the search process. Our genetic operators guarantee the feasibility of solutions over the generations of population, which significantly improves the computational efficiency even when it is combined with our flexible adaptive searching strategy. The efficiency of the algorithm is investigated by computational experiments.
Multi-threaded Sparse Matrix Sparse Matrix Multiplication for Many-Core and GPU Architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deveci, Mehmet; Trott, Christian Robert; Rajamanickam, Sivasankaran
Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scientific computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix- matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depend on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and datamore » structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.« less
Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deveci, Mehmet; Rajamanickam, Sivasankaran; Trott, Christian Robert
Sparse Matrix-Matrix multiplication is a key kernel that has applications in several domains such as scienti c computing and graph analysis. Several algorithms have been studied in the past for this foundational kernel. In this paper, we develop parallel algorithms for sparse matrix-matrix multiplication with a focus on performance portability across different high performance computing architectures. The performance of these algorithms depend on the data structures used in them. We compare different types of accumulators in these algorithms and demonstrate the performance difference between these data structures. Furthermore, we develop a meta-algorithm, kkSpGEMM, to choose the right algorithm and datamore » structure based on the characteristics of the problem. We show performance comparisons on three architectures and demonstrate the need for the community to develop two phase sparse matrix-matrix multiplication implementations for efficient reuse of the data structures involved.« less
Information-based management mode based on value network analysis for livestock enterprises
NASA Astrophysics Data System (ADS)
Liu, Haoqi; Lee, Changhoon; Han, Mingming; Su, Zhongbin; Padigala, Varshinee Anu; Shen, Weizheng
2018-01-01
With the development of computer and IT technologies, enterprise management has gradually become information-based management. Moreover, due to poor technical competence and non-uniform management, most breeding enterprises show a lack of organisation in data collection and management. In addition, low levels of efficiency result in increasing production costs. This paper adopts 'struts2' in order to construct an information-based management system for standardised and normalised management within the process of production in beef cattle breeding enterprises. We present a radio-frequency identification system by studying multiple-tag anti-collision via a dynamic grouping ALOHA algorithm. This algorithm is based on the existing ALOHA algorithm and uses an improved packet dynamic of this algorithm, which is characterised by a high-throughput rate. This new algorithm can reach a throughput 42% higher than that of the general ALOHA algorithm. With a change in the number of tags, the system throughput is relatively stable.
A curvature-based weighted fuzzy c-means algorithm for point clouds de-noising
NASA Astrophysics Data System (ADS)
Cui, Xin; Li, Shipeng; Yan, Xiutian; He, Xinhua
2018-04-01
In order to remove the noise of three-dimensional scattered point cloud and smooth the data without damnify the sharp geometric feature simultaneity, a novel algorithm is proposed in this paper. The feature-preserving weight is added to fuzzy c-means algorithm which invented a curvature weighted fuzzy c-means clustering algorithm. Firstly, the large-scale outliers are removed by the statistics of r radius neighboring points. Then, the algorithm estimates the curvature of the point cloud data by using conicoid parabolic fitting method and calculates the curvature feature value. Finally, the proposed clustering algorithm is adapted to calculate the weighted cluster centers. The cluster centers are regarded as the new points. The experimental results show that this approach is efficient to different scale and intensities of noise in point cloud with a high precision, and perform a feature-preserving nature at the same time. Also it is robust enough to different noise model.
Improved collaborative filtering recommendation algorithm of similarity measure
NASA Astrophysics Data System (ADS)
Zhang, Baofu; Yuan, Baoping
2017-05-01
The Collaborative filtering recommendation algorithm is one of the most widely used recommendation algorithm in personalized recommender systems. The key is to find the nearest neighbor set of the active user by using similarity measure. However, the methods of traditional similarity measure mainly focus on the similarity of user common rating items, but ignore the relationship between the user common rating items and all items the user rates. And because rating matrix is very sparse, traditional collaborative filtering recommendation algorithm is not high efficiency. In order to obtain better accuracy, based on the consideration of common preference between users, the difference of rating scale and score of common items, this paper presents an improved similarity measure method, and based on this method, a collaborative filtering recommendation algorithm based on similarity improvement is proposed. Experimental results show that the algorithm can effectively improve the quality of recommendation, thus alleviate the impact of data sparseness.
Seismic noise attenuation using an online subspace tracking algorithm
NASA Astrophysics Data System (ADS)
Zhou, Yatong; Li, Shuhua; Zhang, Dong; Chen, Yangkang
2018-02-01
We propose a new low-rank based noise attenuation method using an efficient algorithm for tracking subspaces from highly corrupted seismic observations. The subspace tracking algorithm requires only basic linear algebraic manipulations. The algorithm is derived by analysing incremental gradient descent on the Grassmannian manifold of subspaces. When the multidimensional seismic data are mapped to a low-rank space, the subspace tracking algorithm can be directly applied to the input low-rank matrix to estimate the useful signals. Since the subspace tracking algorithm is an online algorithm, it is more robust to random noise than traditional truncated singular value decomposition (TSVD) based subspace tracking algorithm. Compared with the state-of-the-art algorithms, the proposed denoising method can obtain better performance. More specifically, the proposed method outperforms the TSVD-based singular spectrum analysis method in causing less residual noise and also in saving half of the computational cost. Several synthetic and field data examples with different levels of complexities demonstrate the effectiveness and robustness of the presented algorithm in rejecting different types of noise including random noise, spiky noise, blending noise, and coherent noise.
An efficient and accurate 3D displacements tracking strategy for digital volume correlation
NASA Astrophysics Data System (ADS)
Pan, Bing; Wang, Bo; Wu, Dafang; Lubineau, Gilles
2014-07-01
Owing to its inherent computational complexity, practical implementation of digital volume correlation (DVC) for internal displacement and strain mapping faces important challenges in improving its computational efficiency. In this work, an efficient and accurate 3D displacement tracking strategy is proposed for fast DVC calculation. The efficiency advantage is achieved by using three improvements. First, to eliminate the need of updating Hessian matrix in each iteration, an efficient 3D inverse compositional Gauss-Newton (3D IC-GN) algorithm is introduced to replace existing forward additive algorithms for accurate sub-voxel displacement registration. Second, to ensure the 3D IC-GN algorithm that converges accurately and rapidly and avoid time-consuming integer-voxel displacement searching, a generalized reliability-guided displacement tracking strategy is designed to transfer accurate and complete initial guess of deformation for each calculation point from its computed neighbors. Third, to avoid the repeated computation of sub-voxel intensity interpolation coefficients, an interpolation coefficient lookup table is established for tricubic interpolation. The computational complexity of the proposed fast DVC and the existing typical DVC algorithms are first analyzed quantitatively according to necessary arithmetic operations. Then, numerical tests are performed to verify the performance of the fast DVC algorithm in terms of measurement accuracy and computational efficiency. The experimental results indicate that, compared with the existing DVC algorithm, the presented fast DVC algorithm produces similar precision and slightly higher accuracy at a substantially reduced computational cost.
T-L Plane Abstraction-Based Energy-Efficient Real-Time Scheduling for Multi-Core Wireless Sensors.
Kim, Youngmin; Lee, Ki-Seong; Pham, Ngoc-Son; Lee, Sun-Ro; Lee, Chan-Gun
2016-07-08
Energy efficiency is considered as a critical requirement for wireless sensor networks. As more wireless sensor nodes are equipped with multi-cores, there are emerging needs for energy-efficient real-time scheduling algorithms. The T-L plane-based scheme is known to be an optimal global scheduling technique for periodic real-time tasks on multi-cores. Unfortunately, there has been a scarcity of studies on extending T-L plane-based scheduling algorithms to exploit energy-saving techniques. In this paper, we propose a new T-L plane-based algorithm enabling energy-efficient real-time scheduling on multi-core sensor nodes with dynamic power management (DPM). Our approach addresses the overhead of processor mode transitions and reduces fragmentations of the idle time, which are inherent in T-L plane-based algorithms. Our experimental results show the effectiveness of the proposed algorithm compared to other energy-aware scheduling methods on T-L plane abstraction.
NASA Astrophysics Data System (ADS)
Liao, Wei-Cheng; Hong, Mingyi; Liu, Ya-Feng; Luo, Zhi-Quan
2014-08-01
In a densely deployed heterogeneous network (HetNet), the number of pico/micro base stations (BS) can be comparable with the number of the users. To reduce the operational overhead of the HetNet, proper identification of the set of serving BSs becomes an important design issue. In this work, we show that by jointly optimizing the transceivers and determining the active set of BSs, high system resource utilization can be achieved with only a small number of BSs. In particular, we provide formulations and efficient algorithms for such joint optimization problem, under the following two common design criteria: i) minimization of the total power consumption at the BSs, and ii) maximization of the system spectrum efficiency. In both cases, we introduce a nonsmooth regularizer to facilitate the activation of the most appropriate BSs. We illustrate the efficiency and the efficacy of the proposed algorithms via extensive numerical simulations.
Extended reactance domain algorithms for DoA estimation onto an ESPAR antennas
NASA Astrophysics Data System (ADS)
Harabi, F.; Akkar, S.; Gharsallah, A.
2016-07-01
Based on an extended reactance domain (RD) covariance matrix, this article proposes new alternatives for directions of arrival (DoAs) estimation of narrowband sources through an electronically steerable parasitic array radiator (ESPAR) antennas. Because of the centro symmetry of the classic ESPAR antennas, an unitary transformation is applied to the collected data that allow an important reduction in both computational cost and processing time and, also, an enhancement of the resolution capabilities of the proposed algorithms. Moreover, this article proposes a new approach for eigenvalues estimation through only some linear operations. The developed DoAs estimation algorithms based on this new approach has illustrated a good behaviour with less calculation cost and processing time as compared to other schemes based on the classic eigenvalues approach. The conducted simulations demonstrate that high-precision and high-resolution DoAs estimation can be reached especially in very closely sources situation and low sources power as compared to the RD-MUSIC algorithm and the RD-PM algorithm. The asymptotic behaviours of the proposed DoAs estimators are analysed in various scenarios and compared with the Cramer-Rao bound (CRB). The conducted simulations testify the high-resolution of the developed algorithms and prove the efficiently of the proposed approach.
Application of composite dictionary multi-atom matching in gear fault diagnosis.
Cui, Lingli; Kang, Chenhui; Wang, Huaqing; Chen, Peng
2011-01-01
The sparse decomposition based on matching pursuit is an adaptive sparse expression method for signals. This paper proposes an idea concerning a composite dictionary multi-atom matching decomposition and reconstruction algorithm, and the introduction of threshold de-noising in the reconstruction algorithm. Based on the structural characteristics of gear fault signals, a composite dictionary combining the impulse time-frequency dictionary and the Fourier dictionary was constituted, and a genetic algorithm was applied to search for the best matching atom. The analysis results of gear fault simulation signals indicated the effectiveness of the hard threshold, and the impulse or harmonic characteristic components could be separately extracted. Meanwhile, the robustness of the composite dictionary multi-atom matching algorithm at different noise levels was investigated. Aiming at the effects of data lengths on the calculation efficiency of the algorithm, an improved segmented decomposition and reconstruction algorithm was proposed, and the calculation efficiency of the decomposition algorithm was significantly enhanced. In addition it is shown that the multi-atom matching algorithm was superior to the single-atom matching algorithm in both calculation efficiency and algorithm robustness. Finally, the above algorithm was applied to gear fault engineering signals, and achieved good results.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.
Inverse modeling seeks model parameters given a set of observations. However, for practical problems because the number of measurements is often large and the model parameters are also numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally-efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace, such that the dimensionality of themore » problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved using these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2D and a random hydraulic conductivity field in 3D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ~10 1 to ~10 2 in a multi-core computational environment. Furthermore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate- to large-scale problems.« less
Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.
2016-09-01
Inverse modeling seeks model parameters given a set of observations. However, for practical problems because the number of measurements is often large and the model parameters are also numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally-efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace, such that the dimensionality of themore » problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved using these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2D and a random hydraulic conductivity field in 3D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ~10 1 to ~10 2 in a multi-core computational environment. Furthermore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate- to large-scale problems.« less
Modifications to Improve Data Acquisition and Analysis for Camouflage Design
1983-01-01
terrains into facsimiles of the original scenes in 3, 4# or 5 colors in CIELAB notation. Tasks that were addressed included optimization of the...a histogram algorithm (HIST) was used as a first step In the clustering of the CIELAB values of the scene pixels. This algorithm Is highly efficient...however, an optimal process and the CIELAB coordinates of the final color domains can be Influenced by the color coordinate Increments used In the
Detection of Unknown LEO Satellite Using Radar Measurements
NASA Astrophysics Data System (ADS)
Kamensky, S.; Samotokhin, A.; Khutorovsky, Z.; Alfriend, T.
While processing of the radar information aimed at satellite catalog maintenance some measurements do not correlate with cataloged and tracked satellites. These non-correlated measurements participate in the detection (primary orbit determination) of new (not cataloged) satellites. The satellite is considered newly detected when it is missing in the catalog and the primary orbit determination on the basis of the non-correlated measurements provides the accuracy sufficient for reliable correlation of future measurements. We will call this the detection condition. One non-correlated measurement in real conditions does not have enough accuracy and thus does not satisfy the detection condition. Two measurements separated by a revolution or more normally provides orbit determination with accuracy sufficient for selection of other measurements. However, it is not always possible to say with high probability (close to 1) that two measurements belong to one satellite. Three measurements for different revolutions, which are included into one orbit, have significantly higher chances to belong to one satellite. Thus the suggested detection (primary orbit determination) algorithm looks for three uncorrelated measurements in different revolutions for which we can determine the orbit inscribing them. The detection procedure based on search for the triplets is rather laborious. Thus only relatively high efficiency can be the reason for its practical implementation. The work presents the detailed description of the suggested detection procedure based on the search for triplets of uncorrelated measurements (for radar measurements). The break-ups of the tracked satellites provide the most difficult conditions for the operation of the detection algorithm and reveal explicitly its characteristics. The characteristics of time efficiency and reliability of the detected orbits are of maximum interest. Within this work we suggest to determine these characteristics using simulation of break-ups with further acquisition of measurements generated by the fragments. In particular, using simulation we can not only evaluate the characteristics of the algorithm but adjust its parameters for certain conditions: the orbit of the fragmented satellite, the features of the break-up, capabilities of detection radars etc. We describe the algorithm performing the simulation of radar measurements produced by the fragments of the parent satellite. This algorithm accounts of the basic factors affecting the characteristics of time efficiency and reliability of the detection. The catalog maintenance algorithm includes two major components detection and tracking. These are two processes permanently interacting with each other. This is actually in place for the processing of real radar data. The simulation must take this into account since one cannot obtain reliable characteristics of detection procedure simulating only this process. Thus we simulated both processes in their interaction. The work presents the results of simulation for the simplest case of a break-up in near-circular orbit with insignificant atmospheric drag. The simulations show rather high efficiency. We demonstrate as well that the characteristics of time efficiency and reliability of determined orbits essentially depend on the density of the observed break-up fragments.
NASA Astrophysics Data System (ADS)
Liu, Qiong; Wang, Wen-xi; Zhu, Ke-ren; Zhang, Chao-yong; Rao, Yun-qing
2014-11-01
Mixed-model assembly line sequencing is significant in reducing the production time and overall cost of production. To improve production efficiency, a mathematical model aiming simultaneously to minimize overtime, idle time and total set-up costs is developed. To obtain high-quality and stable solutions, an advanced scatter search approach is proposed. In the proposed algorithm, a new diversification generation method based on a genetic algorithm is presented to generate a set of potentially diverse and high-quality initial solutions. Many methods, including reference set update, subset generation, solution combination and improvement methods, are designed to maintain the diversification of populations and to obtain high-quality ideal solutions. The proposed model and algorithm are applied and validated in a case company. The results indicate that the proposed advanced scatter search approach is significant for mixed-model assembly line sequencing in this company.